Documentation

inference.club is a community-run inference network. Members run GPUs at home and expose their local model servers; everyone calls those models through one OpenAI-compatible API. One key works in both directions — to consume inference and to provide it.

Base URL   https://api.inference.club/v1
Auth       Authorization: Bearer <your-api-key>

Start here

Quickstart

Get a key and make your first request in five minutes.

For AI agents

Model discovery, routing, async, and workflows — written for automated callers.

Concepts

The vocabulary: users, keys, providers, agents, models, routing.

Become a provider

Share your own GPUs on the network with the Kubernetes agent.

What you can build with

The network serves far more than chat. Which modalities are live depends on what models the providers are running right now.

The playground

A browser tool for every modality — chat, agents, images, video, music, voice, and more.

Services & modalities

Each model type, the endpoint it answers, and the model behind it.

API reference

Every endpoint, request shape, and error code.

Async, batches & workflows

Fire-and-forget jobs, 256-item batches, and multi-step DAGs.

How it fits together

A small always-on cloud box is the control plane — the website, the API, auth, routing, billing, and async orchestration. The heavy GPU compute lives on home hardware that is never exposed to the internet. A private Tailscale tunnel bridges the two, and generated media is offloaded to object storage so the small box stays off the hot path.

Consumers

OpenAI SDKs · curl · browser · agents

HTTPS

Hetzner VPS · control plane

CaddyTLS · SSE

Nuxtfrontend

DjangoAPI · routing

Celeryasync jobs

Postgressource of truth

Rediscache · broker

Tailscale sidecarSOCKS5 · WireGuard out

Tailscale tailnet · MagicDNS · no port-forward

Home k3s cluster · GPU compute

inference-club-agent (kubernetes discovery) routes /v1/* to in-cluster services across 3× RTX 4090 + DGX Spark

vLLMLM StudioFlux.2LTX-2DiaACE-StepMagpie TTSNemotron ASRTRELLIS

Generated media (images · audio · video · 3D) is written to Google Cloud Storage and served straight from Google's edge — off the VPS hot path.

The agent reports what it serves every ~30 seconds, so the platform's model list and routing always reflect what is actually online. For the full story — request path, the home k3s cluster, the Hetzner deployment, and storage — see Architecture.