
Grok TTS

Text-to-Speech · xAI

xAI's Grok text-to-speech model. Generates high-fidelity spoken audio in 5 expressive voices (eve, ara, rex, sal, leo) with 20+ supported languages. Supports inline speech tags for laughter, whispers, and pauses.

Model Info
Terms and License: link
More information: link
Pricing: View pricing in the Cloudflare dashboard

Usage

TypeScript
const response = await env.AI.run(
  'xai/grok-tts',
  { text: 'Hello! Welcome to the xAI Text to Speech API.', language: 'en' },
)
console.log(response)

Examples

Different Voice — Use the warm, conversational `ara` voice
TypeScript
const response = await env.AI.run(
  'xai/grok-tts',
  { text: 'Thank you for calling. How can I help you today?', voice_id: 'ara', language: 'en' },
)
console.log(response)
High-Fidelity MP3 — 44.1 kHz / 192 kbps MP3 for production use
TypeScript
const response = await env.AI.run(
  'xai/grok-tts',
  {
    text: 'Crystal clear audio at maximum quality.',
    voice_id: 'rex',
    language: 'en',
    output_format: { codec: 'mp3', sample_rate: 44100, bit_rate: 192000 },
  },
)
console.log(response)
Telephony (mulaw) — G.711 μ-law at 8 kHz for SIP / PSTN integration
TypeScript
const response = await env.AI.run(
  'xai/grok-tts',
  {
    text: 'Hello, thank you for calling. How can I help you today?',
    voice_id: 'ara',
    language: 'en',
    output_format: { codec: 'mulaw', sample_rate: 8000 },
  },
)
console.log(response)
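When choosing between the two `output_format` settings shown above, it can help to estimate how large the audio will be. The sketch below is only arithmetic: it assumes mono audio, a constant bit rate for MP3, and no container or header overhead, none of which this page confirms. The helper names `mp3Bytes` and `mulawBytes` are hypothetical.

```typescript
// Rough payload-size estimates for the two output formats shown above.
// Assumptions (not confirmed by the docs): mono audio, constant-bit-rate MP3,
// and no header or container overhead.

// Constant-bit-rate MP3: bit_rate is in bits per second, so divide by 8 for bytes.
function mp3Bytes(durationSeconds: number, bitRate: number = 192000): number {
  return Math.round((durationSeconds * bitRate) / 8);
}

// G.711 mu-law stores one 8-bit sample per sampling instant,
// so bytes per second equals the sample rate (8000 Hz gives 64 kbit/s).
function mulawBytes(durationSeconds: number, sampleRate: number = 8000): number {
  return Math.round(durationSeconds * sampleRate);
}

const oneMinuteMp3 = mp3Bytes(60);     // 1,440,000 bytes
const oneMinuteMulaw = mulawBytes(60); //   480,000 bytes
```

Under these assumptions, one minute of telephony audio is a third the size of one minute of 192 kbps MP3.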
Expressive Delivery — Inline speech tags for laughter, pauses, and whispers
TypeScript
const response = await env.AI.run(
  'xai/grok-tts',
  {
    text: 'So I walked in and [pause] there it was. [laugh] I honestly could not believe it! <whisper>It was a secret the whole time.</whisper>',
    voice_id: 'eve',
    language: 'en',
  },
)
console.log(response)
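Hand-writing tags inside long strings makes it easy to leave a `<whisper>` unclosed. A minimal sketch of composing the `text` value from small pieces follows. The names `pause`, `laugh`, `whisper`, and `say` are hypothetical helpers, not part of any SDK, and only the three tags shown on this page are covered.

```typescript
// Hypothetical helpers for building tagged text. Only the tags documented on
// this page are included; the API may support others.
const pause = '[pause]';
const laugh = '[laugh]';

// Wraps a phrase in a paired whisper tag so the close tag is never forgotten.
const whisper = (phrase: string): string => `<whisper>${phrase}</whisper>`;

// Joins the parts with single spaces, dropping empty fragments.
function say(...parts: string[]): string {
  return parts
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .join(' ');
}

const taggedText = say(
  'So I walked in and',
  pause,
  'there it was.',
  laugh,
  whisper('It was a secret the whole time.'),
);
```

The resulting `taggedText` string can be passed as the `text` field of the request.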
Text Normalization — Convert written numbers and abbreviations to spoken form
TypeScript
const response = await env.AI.run(
  'xai/grok-tts',
  {
    text: 'The total is $1,234.56 and the meeting is at 3pm on Jan 15th.',
    voice_id: 'rex',
    language: 'en',
    text_normalization: true,
  },
)
console.log(response)

Parameters

language
string, required. BCP-47 language code (e.g. "en", "zh", "pt-BR") or "auto" for automatic language detection. xAI returns 400 if this parameter is omitted. Supported codes: auto, en, ar-EG, ar-SA, ar-AE, bn, zh, fr, de, hi, id, it, ja, ko, pt-BR, pt-PT, ru, es-MX, es-ES, tr, vi.
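Because a missing `language` produces a 400, it may be worth checking the code before sending a request. The sketch below copies the supported codes from the list above. `resolveLanguage` is a hypothetical helper, and its case-insensitive matching is a client-side convenience, since this page does not say whether the API accepts differently cased language codes.

```typescript
// Supported codes copied from the `language` parameter description.
const SUPPORTED_LANGUAGES = [
  'auto', 'en', 'ar-EG', 'ar-SA', 'ar-AE', 'bn', 'zh', 'fr', 'de', 'hi', 'id',
  'it', 'ja', 'ko', 'pt-BR', 'pt-PT', 'ru', 'es-MX', 'es-ES', 'tr', 'vi',
] as const;

type SupportedLanguage = (typeof SUPPORTED_LANGUAGES)[number];

// Hypothetical helper: returns the canonical spelling of a supported code,
// or null when the input is not in the list. Matching ignores case and
// surrounding whitespace (a local convenience, not documented API behavior).
function resolveLanguage(input: string): SupportedLanguage | null {
  const wanted = input.trim().toLowerCase();
  const match = SUPPORTED_LANGUAGES.find((code) => code.toLowerCase() === wanted);
  return match ?? null;
}
```

A caller could fall back to "auto" when `resolveLanguage` returns null, or reject the input outright, depending on how strict the application needs to be.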
text
string, required, minLength: 1, maxLength: 15000. Text to convert to speech, up to 15,000 characters. Supports inline speech tags: [pause], [laugh], <whisper>…</whisper>, etc.
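Input longer than 15,000 characters has to be split across several requests. The following is a minimal sketch of one way to do that, preferring sentence boundaries. `chunkText` is a hypothetical helper. It measures length in UTF-16 code units, as JavaScript strings do, which may differ from how the API counts characters, and it does not try to keep paired tags such as `<whisper>…</whisper>` within a single chunk.

```typescript
const MAX_TEXT_LENGTH = 15000; // from the `text` parameter's maxLength

// Hypothetical helper: split text into chunks of at most `limit` characters.
// Break preference: after sentence-ending punctuation, then at a space,
// then a hard cut at the limit.
function chunkText(text: string, limit: number = MAX_TEXT_LENGTH): string[] {
  if (limit < 1) throw new RangeError('limit must be at least 1');
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > limit) {
    const head = rest.slice(0, limit);
    // Index of the last punctuation mark that is followed by a space.
    const sentenceEnd = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? '),
    );
    const space = head.lastIndexOf(' ');
    // Keep the punctuation mark with the chunk it ends (+1).
    const cut = sentenceEnd > 0 ? sentenceEnd + 1 : space > 0 ? space : limit;
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk can then be sent as its own request with the same `voice_id` and `language`.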
text_normalization
boolean. When true, normalizes written-form text into spoken form before synthesis (e.g. "Dr." → "Doctor", "100" → "one hundred"). Defaults to false.
voice_id
string, minLength: 1. Voice for synthesis. Defaults to "eve". Built-in voices: eve (energetic), ara (warm), rex (confident), sal (balanced), leo (authoritative). Custom voice IDs from /v1/tts/voices are also accepted. Case-insensitive: "Eve", "EVE", and "eve" are equivalent.
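Since voice IDs are case-insensitive and default to "eve", an application that logs or caches by voice may want one canonical spelling. The sketch below is a hypothetical client-side helper (`normalizeVoiceId` is not an SDK function). It lowercases the five built-in names and passes anything else through unchanged, on the assumption that it is a custom voice ID.

```typescript
// Built-in voices listed in the `voice_id` parameter description.
const BUILT_IN_VOICES = ['eve', 'ara', 'rex', 'sal', 'leo'] as const;

// Hypothetical helper: canonicalize a voice ID for logging or cache keys.
// - Missing or blank input falls back to the documented default, "eve".
// - Built-in names are lowercased ("EVE" and "Eve" both become "eve").
// - Anything else is assumed to be a custom voice ID and is returned as given.
function normalizeVoiceId(voiceId?: string): string {
  if (voiceId === undefined || voiceId.trim() === '') return 'eve';
  const trimmed = voiceId.trim();
  const lower = trimmed.toLowerCase();
  const builtIns: readonly string[] = BUILT_IN_VOICES;
  return builtIns.indexOf(lower) !== -1 ? lower : trimmed;
}
```

Treating a blank string as "use the default" is a design choice of this sketch. The schema's minLength of 1 suggests the API itself would reject an empty `voice_id`.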
