Text to speech

Generate natural speech from text with any TTS model on the network. The endpoint mirrors OpenAI's speech API, so the official SDKs and curl work unchanged.

POST /v1/audio/speech

Requests are routed only to services that a provider has declared as type: tts. The response is the raw audio, as with OpenAI's endpoint, and a copy is stored on inference.club so it appears in your history.

Request

application/json:

Field            Required  Description
model            yes       A tts model id from GET /v1/models.
input            yes       The text to synthesize.
voice            no        A voice name (see Voices). Defaults to the provider's default.
response_format  no        wav (default) or opus.
language         no        Language hint, e.g. en-US (the model is multilingual).
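
As a rough illustration of the fields above, the sketch below assembles a request body and leaves out optional fields that were not set, so the provider defaults apply. The helper name build_speech_request is hypothetical and not part of any SDK; only the field names come from the table.

```python
import json


def build_speech_request(model, text, voice=None, response_format=None, language=None):
    """Build the JSON body for POST /v1/audio/speech (hypothetical helper).

    Required fields are checked locally; optional fields are included only
    when the caller supplies them.
    """
    if not model:
        raise ValueError("model is required")
    if not text:
        raise ValueError("input is required")

    body = {"model": model, "input": text}
    optional = {
        "voice": voice,
        "response_format": response_format,
        "language": language,
    }
    # Drop unset optional fields so the server falls back to its defaults.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


if __name__ == "__main__":
    payload = build_speech_request(
        "<your-tts-model>", "Hello from inference club", response_format="opus"
    )
    print(json.dumps(payload, indent=2))
```

Checking the required fields before sending avoids a round trip that would otherwise end in a 400 missing_input error.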

curl

curl https://api.inference.club/v1/audio/speech \
  -H "Authorization: Bearer $INFERENCE_CLUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "<your-tts-model>", "input": "Hello from inference club", "voice": "en-US-female-1" }' \
  --output speech.wav

Python (openai SDK)

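import os

from openai import OpenAI

# Point the official SDK at inference.club. The key is read from the same
# environment variable the curl example uses.
client = OpenAI(
    base_url="https://api.inference.club/v1",
    api_key=os.environ["INFERENCE_CLUB_API_KEY"],
)

# Stream the audio to disk as it arrives.
with client.audio.speech.with_streaming_response.create(
    model="<your-tts-model>",
    voice="en-US-female-1",
    input="Hello from inference club",
) as response:
    response.stream_to_file("speech.wav")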

Response

The response body is the raw audio bytes, with Content-Type: audio/wav (or audio/ogg for Opus). Usage is metered by the duration of the generated audio.
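
Because usage is metered by audio duration, it can be handy to estimate the length of a WAV response locally. The sketch below uses Python's standard wave module; the function name wav_duration_seconds is hypothetical, and the result is only a local estimate that may not match the billed figure exactly. It also assumes the WAV header carries a correct frame count, which may not hold for every streamed response.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Return the playback length of in-memory WAV data, in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        # Duration is the number of sample frames divided by the sample rate.
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    with open("speech.wav", "rb") as f:
        print(f"{wav_duration_seconds(f.read()):.2f} s")
```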

Voices

Voices are model-specific. List what a model offers:

GET /v1/audio/voices?model=<model-id>
{ "voices": ["Magpie-Multilingual.EN-US.Mia", "Magpie-Multilingual.EN-US.Jason", "…"] }

(/v1/audio/voices is an inference.club extension, not part of OpenAI's API.) The in-dashboard Speech playground populates a voice dropdown from this.
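
A minimal sketch of calling the voices endpoint from Python with only the standard library. It assumes the endpoint accepts the same Bearer token as /v1/audio/speech, which the page does not state explicitly. The helper names (voices_url, fetch_voices, pick_voice) are hypothetical, and matching a voice by a locale substring relies on the naming pattern seen in the sample response, which other models may not follow.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.inference.club/v1"


def voices_url(model_id):
    """Build the GET /v1/audio/voices URL for a model id."""
    return f"{BASE_URL}/audio/voices?{urlencode({'model': model_id})}"


def fetch_voices(model_id, api_key):
    """Fetch the voice list (assumes Bearer auth, as on the speech endpoint)."""
    request = urllib.request.Request(
        voices_url(model_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["voices"]


def pick_voice(voices, locale):
    """Return the first voice whose name contains the locale tag, else None."""
    needle = locale.lower()
    for name in voices:
        if needle in name.lower():
            return name
    return None
```

A caller might pass the result of pick_voice as the voice field of a speech request, and omit the field when it returns None so the provider default is used.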

Voice cloning

For expressive multi-speaker dialogue with voice cloning, see POST /v1/voice/generations. That endpoint routes to providers running Dia and accepts [S1]/[S2] speaker-tagged scripts with optional voice samples from your library.

Notes

  • Formats: providers typically return WAV natively; opus is also accepted as a response_format. mp3, aac, and flac aren't transcoded, so a request for any of those returns WAV.
  • Speed and other OpenAI parameters not supported by the provider are ignored.
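
Since a request for mp3, aac, or flac comes back as WAV, a client may want to check what it actually received before choosing a file extension. The sketch below inspects the leading bytes: WAV files begin with a RIFF header tagged WAVE, and Ogg streams (the container indicated by audio/ogg) begin with OggS. The function name sniff_audio_container is hypothetical; checking the Content-Type response header is an equally reasonable approach.

```python
def sniff_audio_container(data: bytes) -> str:
    """Guess the container of an audio payload from its magic bytes."""
    # RIFF/WAVE: bytes 0-3 are "RIFF", bytes 8-11 are "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # Every Ogg page starts with the capture pattern "OggS".
    if data[:4] == b"OggS":
        return "ogg"
    return "unknown"


if __name__ == "__main__":
    with open("speech.wav", "rb") as f:
        print(sniff_audio_container(f.read(12)))
```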

Errors

type               When                                             HTTP
missing_input      No input text                                    400
request_too_large  Input text over the limit                        413
no_provider        No online TTS provider serves the model for you  404
upstream_error     The provider's speech server failed              502
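
One way a client might act on these error types is sketched below. The type names and status codes are taken from the table; the retry policy is an assumption rather than documented behaviour, on the reasoning that a failed or offline provider may recover while a malformed or oversized request will not succeed unchanged. How the type is read out of the error response is left to the caller, since the response shape is not described here. The names ERROR_STATUS, RETRY_LATER, and next_step are hypothetical.

```python
# Error types and HTTP statuses as listed in the table above.
ERROR_STATUS = {
    "missing_input": 400,
    "request_too_large": 413,
    "no_provider": 404,
    "upstream_error": 502,
}

# Assumption (not from the docs): these conditions may be transient.
RETRY_LATER = {"no_provider", "upstream_error"}


def next_step(error_type):
    """Suggest a client action: 'retry', 'fix_request', or 'unknown'."""
    if error_type not in ERROR_STATUS:
        return "unknown"
    if error_type in RETRY_LATER:
        return "retry"
    return "fix_request"


if __name__ == "__main__":
    for name, status in ERROR_STATUS.items():
        print(f"{status} {name}: {next_step(name)}")
```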