Grok TTS
Text-to-Speech • xAI
xAI's Grok text-to-speech model. Generates high-fidelity spoken audio in 5 expressive voices (eve, ara, rex, sal, leo) with 20+ supported languages. Supports inline speech tags for laughter, whispers, and pauses.
| Model Info | |
|---|---|
| Terms and License | link ↗ |
| More information | link ↗ |
| Pricing | View pricing in the Cloudflare dashboard ↗ |
Usage
```javascript
const response = await env.AI.run(
  'xai/grok-tts',
  { text: 'Hello! Welcome to the xAI Text to Speech API.', language: 'en' },
)
console.log(response)
```

```sh
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "xai/grok-tts",
    "input": {
      "text": "Hello! Welcome to the xAI Text to Speech API.",
      "language": "en"
    }
  }'
```

```json
{
  "state": "Completed",
  "result": {
    "audio": "https://examples.aig.cloudflare.com/xai/grok-tts/simple-generation.mp3"
  },
  "gatewayMetadata": {
    "keySource": "Unified"
  }
}
```

Examples
Different Voice: Use the warm, conversational `ara` voice
```javascript
const response = await env.AI.run(
  'xai/grok-tts',
  {
    text: 'Thank you for calling. How can I help you today?',
    voice_id: 'ara',
    language: 'en',
  },
)
console.log(response)
```

```sh
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "xai/grok-tts",
    "input": {
      "text": "Thank you for calling. How can I help you today?",
      "voice_id": "ara",
      "language": "en"
    }
  }'
```

```json
{
  "state": "Completed",
  "result": {
    "audio": "https://examples.aig.cloudflare.com/xai/grok-tts/different-voice.mp3"
  },
  "gatewayMetadata": {
    "keySource": "Unified"
  }
}
```

High-Fidelity MP3: 44.1 kHz / 192 kbps MP3 for production use
```javascript
const response = await env.AI.run(
  'xai/grok-tts',
  {
    text: 'Crystal clear audio at maximum quality.',
    voice_id: 'rex',
    language: 'en',
    output_format: { codec: 'mp3', sample_rate: 44100, bit_rate: 192000 },
  },
)
console.log(response)
```

```sh
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "xai/grok-tts",
    "input": {
      "text": "Crystal clear audio at maximum quality.",
      "voice_id": "rex",
      "language": "en",
      "output_format": {
        "codec": "mp3",
        "sample_rate": 44100,
        "bit_rate": 192000
      }
    }
  }'
```

```json
{
  "state": "Completed",
  "result": {
    "audio": "https://examples.aig.cloudflare.com/xai/grok-tts/high-fidelity-mp3.mp3"
  },
  "gatewayMetadata": {
    "keySource": "Unified"
  }
}
```

Telephony (mulaw): G.711 μ-law at 8 kHz for SIP / PSTN integration
```javascript
const response = await env.AI.run(
  'xai/grok-tts',
  {
    text: 'Hello, thank you for calling. How can I help you today?',
    voice_id: 'ara',
    language: 'en',
    output_format: { codec: 'mulaw', sample_rate: 8000 },
  },
)
console.log(response)
```

```sh
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "xai/grok-tts",
    "input": {
      "text": "Hello, thank you for calling. How can I help you today?",
      "voice_id": "ara",
      "language": "en",
      "output_format": {
        "codec": "mulaw",
        "sample_rate": 8000
      }
    }
  }'
```

```json
{
  "state": "Completed",
  "result": {
    "audio": "https://examples.aig.cloudflare.com/xai/grok-tts/telephony-law.mp3"
  },
  "gatewayMetadata": {
    "keySource": "Unified"
  }
}
```

Expressive Delivery: Inline speech tags for laughter, pauses, and whispers
```javascript
const response = await env.AI.run(
  'xai/grok-tts',
  {
    text: 'So I walked in and [pause] there it was. [laugh] I honestly could not believe it! <whisper>It was a secret the whole time.</whisper>',
    voice_id: 'eve',
    language: 'en',
  },
)
console.log(response)
```

```sh
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "xai/grok-tts",
    "input": {
      "text": "So I walked in and [pause] there it was. [laugh] I honestly could not believe it! <whisper>It was a secret the whole time.</whisper>",
      "voice_id": "eve",
      "language": "en"
    }
  }'
```

```json
{
  "state": "Completed",
  "result": {
    "audio": "https://examples.aig.cloudflare.com/xai/grok-tts/expressive-delivery.mp3"
  },
  "gatewayMetadata": {
    "keySource": "Unified"
  }
}
```

Text Normalization: Convert written numbers and abbreviations to spoken form
```javascript
const response = await env.AI.run(
  'xai/grok-tts',
  {
    text: 'The total is $1,234.56 and the meeting is at 3pm on Jan 15th.',
    voice_id: 'rex',
    language: 'en',
    text_normalization: true,
  },
)
console.log(response)
```

```sh
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "xai/grok-tts",
    "input": {
      "text": "The total is $1,234.56 and the meeting is at 3pm on Jan 15th.",
      "voice_id": "rex",
      "language": "en",
      "text_normalization": true
    }
  }'
```

```json
{
  "state": "Completed",
  "result": {
    "audio": "https://examples.aig.cloudflare.com/xai/grok-tts/text-normalization.mp3"
  },
  "gatewayMetadata": {
    "keySource": "Unified"
  }
}
```

Parameters
language
string, required
BCP-47 language code (e.g. "en", "zh", "pt-BR") or "auto" for automatic language detection. This field is required: xAI returns 400 if it is omitted. Supported codes: auto, en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi.

optimize_streaming_latency
one of

output_format
object
Output audio format. Defaults to MP3 at 24 kHz / 128 kbps when omitted.

text
string, required, maxLength: 15000, minLength: 1
Text to convert to speech. Maximum 15,000 characters. Supports inline speech tags: [pause], [laugh], <whisper>…</whisper>, etc.

text_normalization
boolean
When true, normalizes written-form text into spoken form before synthesis (e.g. "Dr." → "Doctor", "100" → "one hundred"). Defaults to false.

voice_id
string, minLength: 1
Voice for synthesis. Defaults to "eve". Built-in voices: eve (energetic), ara (warm), rex (confident), sal (balanced), leo (authoritative). Custom voice IDs from /v1/tts/voices are also accepted. Voice IDs are case-insensitive: "Eve", "EVE", and "eve" are equivalent.

audio
string
Presigned R2 URL for the generated audio file. MIME type reflects the requested codec (audio/mpeg for mp3, audio/wav for wav, etc.).
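
The constraints listed under Parameters can be checked on the client before a request is sent, which avoids a round trip that would end in a 400. The following is a minimal sketch, not part of the Cloudflare or xAI API: the names `buildTtsInput`, `audioUrlFrom`, `SUPPORTED_LANGUAGES`, and `MAX_TEXT_LENGTH` are hypothetical. It assumes language codes must match the listed spelling exactly and that the 15,000-character limit can be approximated with JavaScript string length (UTF-16 code units); the service may count differently.

```javascript
// Hypothetical helpers (assumptions, not an official API): they mirror the
// documented parameter constraints so obvious mistakes fail locally.

// Codes copied from the `language` parameter description.
const SUPPORTED_LANGUAGES = new Set([
  'auto', 'en', 'ar-EG', 'ar-SA', 'ar-AE', 'bn', 'zh', 'fr', 'de', 'hi', 'id',
  'it', 'ja', 'ko', 'pt-BR', 'pt-PT', 'ru', 'es-MX', 'es-ES', 'tr', 'vi',
])

// maxLength of `text`. Assumption: string length is a close enough proxy.
const MAX_TEXT_LENGTH = 15000

function buildTtsInput({ text, language, voice_id, text_normalization, output_format }) {
  if (typeof text !== 'string' || text.length < 1 || text.length > MAX_TEXT_LENGTH) {
    throw new RangeError(`text must be a string of 1 to ${MAX_TEXT_LENGTH} characters`)
  }
  // Assumption: exact-match check against the documented codes.
  if (!SUPPORTED_LANGUAGES.has(language)) {
    throw new RangeError(`unsupported language code: ${String(language)}`)
  }
  if (voice_id !== undefined && (typeof voice_id !== 'string' || voice_id.length < 1)) {
    throw new RangeError('voice_id must be a non-empty string when provided')
  }
  // Copy only the fields the caller set, so omitted ones fall back to the
  // documented defaults ("eve", MP3 at 24 kHz / 128 kbps, no normalization).
  const input = { text, language }
  if (voice_id !== undefined) input.voice_id = voice_id
  if (text_normalization !== undefined) input.text_normalization = Boolean(text_normalization)
  if (output_format !== undefined) input.output_format = output_format
  return input
}

// Pulls the audio URL out of a response shaped like the documented examples.
function audioUrlFrom(response) {
  const url = response && response.result && response.result.audio
  if (typeof url !== 'string') {
    throw new Error('response has no result.audio URL')
  }
  return url
}
```

Under these assumptions, a Worker could call `env.AI.run('xai/grok-tts', buildTtsInput({ text, language: 'en' }))` and pass the result to `audioUrlFrom` to get the presigned URL. The voice ID is passed through unchanged because the service treats it as case-insensitive.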