
Text to speech

Generate natural speech from text with any TTS model on the network. The endpoint mirrors OpenAI's speech API, so the official SDKs and curl work unchanged.

POST /v1/audio/speech

Requests are routed only to services that a provider has declared as type: tts. The response body is the raw audio, just as with OpenAI, and a copy is stored on inference.club so the generation appears in your history.

Request

Request body (application/json):

| Field | Required | Description |
|---|---|---|
| model | yes | A tts model id from GET /v1/models. |
| input | yes | The text to synthesize. |
| voice | no | A voice name (see Voices). Defaults to the provider's default. |
| response_format | no | wav (default) or opus. |
| language | no | Language hint, e.g. en-US (useful with multilingual models). |
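If you are not using an SDK, the request described in the table can be assembled with the Python standard library. This is a minimal sketch, not an official client: build_speech_request and synthesize_to_file are hypothetical helpers, and reading the key from INFERENCE_CLUB_API_KEY mirrors the curl example below. Optional fields are left out entirely when unset so the server-side defaults apply.

```python
import json
import os
import urllib.request

# Endpoint from this page; the helpers below are illustrative only.
SPEECH_URL = "https://api.inference.club/v1/audio/speech"


def build_speech_request(model, text, voice=None, response_format=None,
                         language=None, api_key=None):
    """Build (but do not send) a POST /v1/audio/speech request.

    Hypothetical helper, not part of any SDK.
    """
    if not model:
        raise ValueError("model is required")
    if not text:
        # The server would reply 400 missing_input; fail before sending.
        raise ValueError("input is required")

    body = {"model": model, "input": text}
    # Optional fields are sent only when set, so defaults apply otherwise
    # (wav format, the provider's default voice).
    optional = {"voice": voice, "response_format": response_format,
                "language": language}
    body.update({name: value for name, value in optional.items()
                 if value is not None})

    key = api_key or os.environ.get("INFERENCE_CLUB_API_KEY", "")
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the raw audio bytes to path."""
    with urllib.request.urlopen(request) as resp:
        with open(path, "wb") as f:
            f.write(resp.read())
```

For example, synthesize_to_file(build_speech_request("<your-tts-model>", "Hello from inference club"), "speech.wav") sends roughly the same request as the curl example, without a voice.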

curl

curl https://api.inference.club/v1/audio/speech \
  -H "Authorization: Bearer $INFERENCE_CLUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "<your-tts-model>", "input": "Hello from inference club", "voice": "en-US-female-1" }' \
  --output speech.wav

Python (openai SDK)

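import os

from openai import OpenAI

# Point the official SDK at inference.club instead of api.openai.com.
client = OpenAI(
    base_url="https://api.inference.club/v1",
    api_key=os.environ["INFERENCE_CLUB_API_KEY"],
)

# Stream the raw audio bytes straight to disk (WAV is the default format).
with client.audio.speech.with_streaming_response.create(
    model="<your-tts-model>",
    voice="en-US-female-1",
    input="Hello from inference club",
) as response:
    response.stream_to_file("speech.wav")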

Response

The response body is the raw audio bytes, with Content-Type: audio/wav (or audio/ogg for Opus). Usage is metered by the duration of the generated audio.
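Because the body is raw audio rather than JSON, a client can choose the file extension from the Content-Type header instead of assuming .wav. A minimal sketch, assuming you already have the body bytes and the header value (with urllib, resp.read() and resp.headers.get("Content-Type")); extension_for and save_audio are hypothetical helpers.

```python
# Content types this page documents for POST /v1/audio/speech.
EXTENSIONS = {"audio/wav": ".wav", "audio/ogg": ".ogg"}


def extension_for(content_type):
    """Map a Content-Type header value to a file extension."""
    # Ignore any parameters after ";" and compare case-insensitively.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime not in EXTENSIONS:
        raise ValueError(f"unexpected Content-Type: {content_type!r}")
    return EXTENSIONS[mime]


def save_audio(audio_bytes, content_type, stem="speech"):
    """Write the raw response body to <stem><ext> and return the path."""
    path = stem + extension_for(content_type)
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return path
```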

Voices

Voices are model-specific. List what a model offers:

GET /v1/audio/voices?model=<model-id>
{ "voices": ["Magpie-Multilingual.EN-US.Mia", "Magpie-Multilingual.EN-US.Jason", "…"] }

/v1/audio/voices is an inference.club extension and is not part of OpenAI's API. The Speech playground in the dashboard uses it to populate its voice dropdown.
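A sketch of calling the voices endpoint from Python with the standard library. The page does not say how this endpoint is authenticated; this sketch assumes the same Bearer key as /v1/audio/speech. voices_url, parse_voices, and list_voices are hypothetical helpers; the parser expects the {"voices": [...]} shape shown above.

```python
import json
import os
import urllib.parse
import urllib.request

VOICES_URL = "https://api.inference.club/v1/audio/voices"


def voices_url(model_id):
    """Build GET /v1/audio/voices?model=<model-id>, URL-encoding the id."""
    return VOICES_URL + "?" + urllib.parse.urlencode({"model": model_id})


def parse_voices(payload):
    """Extract the voice names from a {"voices": [...]} response body."""
    data = json.loads(payload)
    return list(data.get("voices", []))


def list_voices(model_id, api_key=None):
    """Fetch the voices a model offers (performs a network request).

    Assumption: the endpoint accepts the same Bearer key as /v1/audio/speech.
    """
    key = api_key or os.environ.get("INFERENCE_CLUB_API_KEY", "")
    req = urllib.request.Request(voices_url(model_id),
                                 headers={"Authorization": f"Bearer {key}"})
    with urllib.request.urlopen(req) as resp:
        return parse_voices(resp.read())
```

URL-encoding the model id matters if an id contains characters such as "/" or spaces.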

Voice cloning

For expressive multi-speaker dialogue with voice cloning, see POST /v1/voice/generations. That endpoint routes to providers running Dia and accepts [S1]/[S2] speaker-tagged scripts with optional voice samples from your library.
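The request body for POST /v1/voice/generations is not documented on this page, so the sketch below only assembles the speaker-tagged script text. build_dialogue_script is a hypothetical helper, and restricting speakers to 1 and 2 simply mirrors the [S1]/[S2] tags named above.

```python
def build_dialogue_script(turns):
    """Join (speaker_number, text) turns into an [S1]/[S2]-tagged script.

    Hypothetical helper; only speakers 1 and 2 are accepted.
    """
    parts = []
    for speaker, text in turns:
        if speaker not in (1, 2):
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        # Each turn is prefixed with its speaker tag.
        parts.append(f"[S{speaker}] {text.strip()}")
    return " ".join(parts)
```

For example, build_dialogue_script([(1, "Hi there."), (2, "Hello!")]) produces "[S1] Hi there. [S2] Hello!".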

Notes

  • Formats: providers typically return WAV natively, and opus is also accepted as a response_format. mp3, aac, and flac aren't transcoded; a request for any of them returns WAV.
  • Speed and any other OpenAI parameters that the provider doesn't support are ignored.
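The format fallback in these notes can be expressed as a small lookup, for example to name the output file before the response arrives. This is a hypothetical helper (expected_format), not part of any SDK; since providers only "typically" return WAV, treat its result as a hint and still check the Content-Type of the actual response.

```python
# What each response_format value yields, per the notes above: WAV by
# default, opus passed through, mp3/aac/flac not transcoded (WAV instead).
RETURNED_FORMAT = {"wav": "wav", "opus": "opus",
                   "mp3": "wav", "aac": "wav", "flac": "wav"}


def expected_format(requested=None):
    """Predict the audio format returned for a given response_format."""
    if requested is None:
        return "wav"
    try:
        return RETURNED_FORMAT[requested.lower()]
    except KeyError:
        # The page does not say what happens for other values.
        raise ValueError(f"undocumented response_format: {requested!r}") from None
```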

Errors

| type | When | HTTP status |
|---|---|---|
| missing_input | No input text | 400 |
| request_too_large | Input text over the limit | 413 |
| no_provider | No online TTS provider serves the model for you | 404 |
| upstream_error | The provider's speech server failed | 502 |
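A sketch of client-side error handling keyed on the error types in the table. Two things here are assumptions, not documented behavior: that the error body is OpenAI-style ({"error": {"type": ...}}), and the retry policy itself, which treats provider-side failures (upstream_error, and no_provider since providers may come online later) as worth retrying and request problems as not. error_type and should_retry are hypothetical helpers.

```python
import json

# Hypothetical retry policy: provider-side failures may succeed later,
# while request problems will fail again until the request changes.
RETRYABLE_TYPES = {"upstream_error", "no_provider"}
# Fallback when the body can't be parsed: the status codes from the table.
STATUS_TO_TYPE = {400: "missing_input", 413: "request_too_large",
                  404: "no_provider", 502: "upstream_error"}


def error_type(status, body):
    """Return the error type from an error response.

    Assumes an OpenAI-style body {"error": {"type": ...}}; falls back to the
    documented status code when the body is missing or shaped differently.
    """
    try:
        return json.loads(body)["error"]["type"]
    except (ValueError, KeyError, TypeError):
        return STATUS_TO_TYPE.get(status, "unknown")


def should_retry(status, body):
    """True if the error is one this sketch treats as worth retrying."""
    return error_type(status, body) in RETRYABLE_TYPES
```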