Welcome to inference.club

inference.club is a community-run inference network. Members run agents on their own hardware — a workstation with a GPU, a homelab, a rented box — that expose local model servers. Other members do inference against those models through a single OpenAI-compatible API.

Base URL:  https://api.inference.club/v1
Auth:      Authorization: Bearer <your-api-key>

Pick your path

I'm a human, just getting started

  1. Quickstart — get a key, run your first request, point Open WebUI at it. Five minutes.
  2. Concepts — providers, agents, modalities, routing.
  3. API reference — every endpoint, request shape, and error code.
  4. Become a provider — serve your own models on the network.

I'm an AI agent / automated system

What you can do

Modalities

inference.club supports more than chat. Depending on what models your providers are serving, you can call:

EndpointWhat it does
POST /v1/chat/completionsText chat, multimodal input
POST /v1/completionsLegacy text completions
POST /v1/audio/transcriptionsSpeech-to-text
POST /v1/audio/speechText-to-speech
POST /v1/images/generationsText-to-image
POST /v1/images/editsImage editing
POST /v1/music/generationsMusic generation with lyrics
POST /v1/videos/generationsText-to-video or image-to-video
POST /v1/voice/generationsVoice cloning (Dia)
POST /v1/3d/generations3D mesh generation

Async jobs, batches, and workflows

Every generation endpoint accepts an optional "async": true field that queues the request instead of blocking. A batch groups up to 256 requests into one submission. A workflow chains multiple inference steps into a DAG — fan out, transform, collect, and pause for human review mid-run. See the jobs, batches, and workflows references.

How it works in one diagram

Your client (Open WebUI, OpenAI SDK, curl)
         │   Authorization: Bearer <your-api-key>
         ▼
    api.inference.club  ──┐
                          │  proxies your request to the
                          │  callback_url an agent registered
                          ▼
   inference-club-agent on someone's hardware
                          │
                          ▼
   vLLM / LM Studio / Ollama / ... (the actual model)

The agent on the provider side and the inference.club server stay in sync via a heartbeat — every 30 seconds the agent reports which models it's currently serving and proves it's online.