Welcome to inference.club
inference.club is a community-run inference network. Members run agents on their own hardware (a workstation with a GPU, a homelab, a rented box) that expose local LLM servers — vLLM, LM Studio, Ollama, anything OpenAI-compatible. Other members do inference against those agents through a single OpenAI-compatible API.
If you have an OpenAI client (the official SDK, Open WebUI, your own scripts, an agent harness), pointing it at https://api.inference.club/v1 with your API key is all you need to start.
Where to go from here
- Quickstart — get an API key, point a client at the API, run your first request. Five minutes.
- Concepts — the model in your head: users, providers, agents, models, API keys.
- API reference — the OpenAI-compatible endpoints, request shapes, error codes.
- Interactive API explorer — the OpenAPI/Swagger reference; authorize with your key and try requests live.
- Run an agent — install
inference-club-agent, point it at your local LLM server, register it with inference.club. - FAQ — common questions and gotchas.
How it works in one diagram
Your client (Open WebUI, OpenAI SDK, curl)
│ Authorization: Bearer <your-api-key>
▼
api.inference.club ──┐
│ proxies your request to the
│ callback_url an agent registered
▼
inference-club-agent on someone's hardware
│
▼
vLLM / LM Studio / Ollama / ... (the actual model)
The agent on the provider side and the inference.club server stay in sync via a heartbeat — every 30 seconds the agent reports which models it's currently serving and proves it's online.