# Speech-02-Turbo

> Text-to-Audio (T2A) model that offers voice synthesis, emotional expression, and multilingual capabilities. Designed for real-time applications with low latency.

- **Provider**: replicate
- **Model ID**: minimax/speech-02-turbo
- **Category**: tts_voice
- **Credits**: 108 per request
- **Pricing Type**: token_based

## API Endpoint

Base URL: https://api.core.today/v1

### Create Prediction

POST /predictions

### Get Status

GET /predictions/{job_id}

### Cancel

DELETE /predictions/{job_id}

## Authentication

Header: `X-API-Key: YOUR_API_KEY`

## Input Parameters

- `voice_id` (string, optional): Voice to synthesize. Pick any MiniMax system voice or a `voice_id` returned by https://replicate.com/minimax/voice-cloning.
- `channel` (string, optional): `mono` for 1 channel (default), `stereo` for 2 channels.
- `audio_format` (string, optional): File format for the generated audio. Choose `mp3` for general use, `wav` or `flac` for lossless output, or `pcm` for raw bytes.
- `english_normalization` (boolean, optional): Improves how numbers and dates in English text are read aloud (adds a small amount of latency).
- `bitrate` (integer, optional): MP3 bitrate in bits per second. Only used when `audio_format` is `mp3`.
- `speed` (number, optional): Speech speed multiplier. Lower is slower, higher is faster. (Range: min: 0.5, max: 2)
- `language_boost` (string, optional): Optional language hint. Choose `Automatic` to let MiniMax detect the language, or pick a specific locale.
- `subtitle_enable` (boolean, optional): Return MiniMax subtitle metadata with sentence timestamps (non-streaming only).
- `volume` (number, optional): Relative loudness; 1.0 is the default MiniMax gain. (Range: min: 0, max: 10)
- `emotion` (string, optional): Desired delivery style. Use `auto` to let MiniMax choose, or pick a specific emotion.
- `sample_rate` (integer, optional): Audio sample rate in Hz.
- `text` (string, **required**): Text to narrate (max 10,000 characters).
  Use markers such as `<#0.5#>` to insert a pause; the number is the pause length in seconds.
- `pitch` (integer, optional): Semitone offset applied to the voice. (Range: min: -12, max: 12)

## Example Request

```json
{
  "model": "minimax/speech-02-turbo",
  "input": {
    "volume": 1,
    "emotion": "angry",
    "sample_rate": 32000,
    "voice_id": "Deep_Voice_Man",
    "channel": "mono",
    "english_normalization": true,
    "bitrate": 128000,
    "text": "Speech-02-series is a Text-to-Audio and voice cloning technology that offers voice synthesis, emotional expression, and multilingual capabilities.\n\nThe HD version is optimized for high-fidelity applications like voiceovers and audiobooks. While the turbo one is designed for real-time applications with low latency.\n\nWhen using this model on Replicate, each character represents 1 token.",
    "pitch": 0,
    "speed": 1,
    "language_boost": "English"
  }
}
```

## Response Format

```json
{
  "job_id": "abc123",
  "status": "pending",
  "provider": "replicate",
  "model": "minimax/speech-02-turbo",
  "created_at": "2026-01-01T00:00:00Z",
  "result": null,
  "error": null
}
```

Status values: `pending`, `processing`, `completed`, `failed`, `cancelled`

## Usage Flow

1. POST /predictions with `model` and `input` → receive `job_id`
2. Poll GET /predictions/{job_id} until `status` is `completed`, `failed`, or `cancelled`
3. On `completed`, `result` contains the output URL(s) or data

## Output Type

url

## Tags

text-to-speech, tts, voice-synthesis, voice-cloning, multilingual, emotion-control, real-time, low-latency, minimax

## Documentation

https://replicate.com/minimax/speech-02-turbo

## Token Pricing

- Input: 0.108 credits/token
- Output: 0.108 credits/token
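Because out-of-range values are easiest to catch before a job is queued, the sketch below shows one way to build a Create Prediction body that checks the numeric ranges and the 10,000-character `text` limit from the Input Parameters list. The helper name `build_request` and the choice to raise `ValueError`/`TypeError` are illustrative assumptions, not part of the documented API; the service may apply its own (possibly different) validation.

```python
from typing import Any, Dict

MODEL_ID = "minimax/speech-02-turbo"
MAX_TEXT_CHARS = 10_000

# Inclusive ranges copied from the Input Parameters list.
RANGES = {
    "speed": (0.5, 2.0),
    "volume": (0.0, 10.0),
    "pitch": (-12, 12),
}


def build_request(text: str, **options: Any) -> Dict[str, Any]:
    """Hypothetical helper: return a POST /predictions body for this model,
    rejecting values outside the documented limits before any network call."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    # pitch is documented as an integer semitone offset.
    if "pitch" in options and not isinstance(options["pitch"], int):
        raise TypeError("pitch must be an integer number of semitones")
    for name, (low, high) in RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name}={options[name]} is outside [{low}, {high}]")
    # Optional parameters are passed through unchanged alongside the required text.
    return {"model": MODEL_ID, "input": {"text": text, **options}}
```

For example, `build_request("Hello <#0.5#> world", voice_id="Deep_Voice_Man", speed=1.2)` returns a body shaped like the Example Request, while `speed=3` is rejected locally instead of producing a failed job.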
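The Usage Flow (create a job, poll its status, read `result`) can be sketched with only the Python standard library. This is a minimal sketch under stated assumptions: it assumes every endpoint, including DELETE, returns a JSON body (or an empty one), and the function names, polling interval, and timeout are illustrative choices rather than documented values. The status fetcher is passed in as a callable so the polling loop can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE_URL = "https://api.core.today/v1"
# Statuses after which the job will not change (from the Status values list).
TERMINAL = {"completed", "failed", "cancelled"}


def call_api(method: str, path: str, api_key: str,
             body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Send one request; assumes the response body is JSON or empty."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def wait_for_job(get_status: Callable[[str], Dict[str, Any]], job_id: str,
                 interval: float = 1.0, timeout: float = 120.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll get_status(job_id) until a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status(job_id)
        if job["status"] in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)


def synthesize(api_key: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Create a prediction, wait for it, and cancel it if polling times out."""
    job = call_api("POST", "/predictions", api_key, payload)
    job_id = job["job_id"]
    try:
        return wait_for_job(
            lambda jid: call_api("GET", f"/predictions/{jid}", api_key), job_id)
    except TimeoutError:
        call_api("DELETE", f"/predictions/{job_id}", api_key)
        raise
```

A caller would check `job["status"] == "completed"` and then read the audio URL from `job["result"]`, or inspect `job["error"]` on `failed`. Cancelling on timeout is a design choice that avoids leaving abandoned jobs running.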