MiniMax Speech 2.8 HD

Text-to-Speech | MiniMax

MiniMax Speech 2.8 HD focuses on studio-grade audio generation with emotion control, multilingual support (40+ languages), and voice cloning.

Model Info
Terms and License: link
More information: link
Pricing: View pricing in the Cloudflare dashboard

Usage

TypeScript
const response = await env.AI.run(
  'minimax/speech-2.8-hd',
  {
    format: 'mp3',
    pitch: 0,
    speed: 1,
    text: 'Hello! Welcome to Cloudflare AI Gateway. Let me show you what we can do.',
    voice_id: 'English_expressive_narrator',
    volume: 1,
  },
)
console.log(response)
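
The call above spells out every required field, even those left at their documented defaults. As a minimal sketch, a small helper can fill in those defaults so callers only pass what they want to change. The names `SpeechInput`, `SPEECH_DEFAULTS`, and `buildSpeechInput` are hypothetical and not part of any Cloudflare or MiniMax SDK; the default values are copied from the Parameters section of this page.

```typescript
// Hypothetical helper types and names, for illustration only.
type SpeechFormat = 'mp3' | 'flac' | 'wav';

interface SpeechInput {
  text: string;
  voice_id: string;
  format: SpeechFormat;
  pitch: number;
  speed: number;
  volume: number;
  emotion?: string;
}

// Defaults as listed under Parameters on this page.
const SPEECH_DEFAULTS = {
  format: 'mp3' as SpeechFormat,
  pitch: 0,
  speed: 1,
  voice_id: 'English_expressive_narrator',
  volume: 1,
};

// Merge documented defaults with caller overrides; `text` is always supplied by the caller.
function buildSpeechInput(
  text: string,
  overrides: Partial<Omit<SpeechInput, 'text'>> = {},
): SpeechInput {
  return { ...SPEECH_DEFAULTS, ...overrides, text };
}

// Intended use inside a Worker (not executed here, since `env` only exists at runtime):
//   const response = await env.AI.run(
//     'minimax/speech-2.8-hd',
//     buildSpeechInput('Hello!', { speed: 0.9 }),
//   )
```

Keeping the defaults in one object means a change to a default only has to be made in one place.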

Examples

Custom Voice: Use a specific voice and adjust speed
TypeScript
const response = await env.AI.run(
  'minimax/speech-2.8-hd',
  {
    format: 'mp3',
    pitch: 0,
    speed: 0.9,
    text: 'The weather today is sunny with a high of 72 degrees. Perfect for a walk in the park.',
    voice_id: 'English_expressive_narrator',
    volume: 1,
  },
)
console.log(response)

With Emotion: Apply emotional tone to speech
TypeScript
const response = await env.AI.run(
  'minimax/speech-2.8-hd',
  {
    emotion: 'happy',
    format: 'mp3',
    pitch: 0,
    speed: 1,
    text: "Congratulations! You've just won the grand prize! This is absolutely incredible news!",
    voice_id: 'English_expressive_narrator',
    volume: 1,
  },
)
console.log(response)

High Sample Rate: Studio quality at 44.1 kHz sample rate
TypeScript
const response = await env.AI.run(
  'minimax/speech-2.8-hd',
  {
    format: 'mp3',
    pitch: 0,
    sample_rate: 44100,
    speed: 1,
    text: 'This recording is generated at studio quality sample rate for the highest possible audio fidelity.',
    voice_id: 'English_expressive_narrator',
    volume: 1,
  },
)
console.log(response)
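
The examples above each send a short string. Longer scripts run into the 10,000-character limit on `text`, so the input has to be split before it is sent. The following is a rough sketch of one way to do that, preferring sentence boundaries. The function name `splitText` is hypothetical, and the sketch assumes the limit can be approximated with JavaScript's `string.length` (UTF-16 code units), which this page does not confirm.

```typescript
// Limit taken from the `text` parameter description on this page.
const MAX_TEXT_LENGTH = 10_000;

// Hypothetical helper: split text into chunks no longer than maxLength.
// Prefers to cut after a sentence end, then at a space, then falls back to a hard cut
// (a hard cut may split a word or a surrogate pair).
function splitText(text: string, maxLength: number = MAX_TEXT_LENGTH): string[] {
  if (maxLength < 1) {
    throw new RangeError('maxLength must be at least 1');
  }
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxLength) {
    const head = rest.slice(0, maxLength);
    const sentenceEnd = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? '),
    );
    const space = head.lastIndexOf(' ');
    // Keep the punctuation mark with the sentence it ends.
    const cut = sentenceEnd > 0 ? sentenceEnd + 1 : space > 0 ? space : maxLength;
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) {
    chunks.push(rest);
  }
  return chunks;
}
```

Each chunk could then be sent as the `text` field of its own request, with the resulting audio segments joined by the caller.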

Parameters

emotion
string. Enum: happy, sad, angry, fearful, disgusted, surprised, calm, fluent. Emotion control for synthesized speech.
format
string, required. Default: mp3. Enum: mp3, flac, wav. Output audio format.
pitch
integer, required. Default: 0. Minimum: -12. Maximum: 12. Pitch adjustment (-12 to 12).
speed
number, required. Default: 1. Minimum: 0.5. Maximum: 2. Speech speed (0.5 to 2).
text
string, required. Max length: 10000. The text to convert to speech. Maximum 10,000 characters.
voice_id
string, required. Default: English_expressive_narrator. The voice ID to use for synthesis.
volume
number, required. Default: 1. Minimum: 0. Maximum: 10. Speech volume (0 to 10).
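
The ranges and enums listed above can be checked on the client before a request is made. The sketch below is one possible way to do so; `SpeechParams` and `validateSpeechParams` are hypothetical names, and the checks cover only the constraints stated on this page. The service may enforce additional rules that are not documented here.

```typescript
// Hypothetical shape mirroring the parameters documented on this page.
interface SpeechParams {
  text: string;
  voice_id: string;
  format: string;
  pitch: number;
  speed: number;
  volume: number;
  emotion?: string;
}

const ALLOWED_FORMATS = ['mp3', 'flac', 'wav'];
const ALLOWED_EMOTIONS = [
  'happy', 'sad', 'angry', 'fearful', 'disgusted', 'surprised', 'calm', 'fluent',
];

// Returns a list of human-readable problems; an empty list means no documented rule is violated.
function validateSpeechParams(p: SpeechParams): string[] {
  const errors: string[] = [];
  if (p.text.length > 10000) {
    errors.push('text exceeds 10,000 characters');
  }
  if (!ALLOWED_FORMATS.includes(p.format)) {
    errors.push('format must be one of: ' + ALLOWED_FORMATS.join(', '));
  }
  if (!Number.isInteger(p.pitch) || p.pitch < -12 || p.pitch > 12) {
    errors.push('pitch must be an integer from -12 to 12');
  }
  if (!(p.speed >= 0.5 && p.speed <= 2)) {
    errors.push('speed must be between 0.5 and 2');
  }
  if (!(p.volume >= 0 && p.volume <= 10)) {
    errors.push('volume must be between 0 and 10');
  }
  if (p.emotion !== undefined && !ALLOWED_EMOTIONS.includes(p.emotion)) {
    errors.push('emotion must be one of: ' + ALLOWED_EMOTIONS.join(', '));
  }
  return errors;
}
```

Returning a list instead of throwing on the first problem lets a caller report every invalid field at once.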