live at api.inference.club

A distributed inference network
powered by consumer GPUs and Tailscale

One OpenAI-compatible endpoint backed by GPUs that members bring to the network. Run an agent on your own hardware. Use the whole pool through one key.

GitHub sign-in · Free during beta · No credit card

How it fits together

Home GPUs, one private network.

For consumers

Drop-in for the OpenAI SDK.

Sign up, mint a token, point your client at api.inference.club/v1. Every model on the network is one request away.

export OPENAI_API_KEY=ic_xxxxxxxxxxxxxxxxxxxx
export OPENAI_BASE_URL=https://api.inference.club/v1

curl "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.6-27b",
    "messages": [
      {"role": "user", "content": "explain MoE in one sentence"}
    ]
  }'
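The same call from Python, as a minimal sketch using only the standard library. It assumes the endpoint accepts the standard OpenAI chat-completions request and response shapes; `build_chat_request` and `chat` are illustrative helper names, not part of any SDK, and the key shown is the placeholder from the shell example.

```python
import json
import os
import urllib.request

# Defaults mirror the shell example above; override them via the environment.
BASE_URL = os.environ.get("OPENAI_BASE_URL", "https://api.inference.club/v1")
API_KEY = os.environ.get("OPENAI_API_KEY", "ic_xxxxxxxxxxxxxxxxxxxx")  # placeholder


def build_chat_request(prompt, model="qwen/qwen3.6-27b",
                       base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) a POST to {base_url}/chat/completions."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]

# Usage (performs a real network call):
#   print(chat("explain MoE in one sentence"))
```

Clients that read `OPENAI_BASE_URL` and `OPENAI_API_KEY` from the environment should need no code changes beyond the two exports.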
For providers

Wrap any OpenAI-compatible server.

Already running vLLM, llama.cpp, or Ollama? Point the agent at it and you're a node on the network.

# Already running vLLM, llama.cpp, or Ollama on your GPU?
# Point the agent at it and join the network.

export INFERENCE_CLUB_API_KEY=ic_xxxxxxxxxxxxxxxxxxxx
export OPENAI_BASE_URL=http://localhost:8000/v1
export OPENAI_API_KEY=local-key  # whatever your local server expects

docker run --rm -d --name club-agent --network host \
  -e INFERENCE_CLUB_API_KEY \
  -e OPENAI_BASE_URL \
  -e OPENAI_API_KEY \
  ghcr.io/inference-club/inference-club-agent:latest

Architecture

Three pieces. Nothing magic.

Step 01

Operators run agents

Members run inference-club-agent next to their local LLM server. The agent advertises whatever models the server is hosting.
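To preview what a node would likely advertise, you can ask the local server which models it is hosting. This is a sketch, not the agent's actual discovery logic, which isn't documented here. It assumes the server exposes the OpenAI-style `GET /v1/models` route; `model_ids` and `fetch_model_ids` are hypothetical helper names, and the URL and key are the defaults from the provider example.

```python
import json
import urllib.request


def model_ids(models_payload):
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_payload.get("data", [])]


def fetch_model_ids(base_url="http://localhost:8000/v1", api_key="local-key"):
    """Query a local OpenAI-compatible server for the models it hosts."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return model_ids(json.load(resp))

# Usage (requires a local server to be running):
#   print(fetch_model_ids())
```

Running this before starting the agent is a quick way to confirm the local server is reachable at the URL you plan to export.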

Step 02

Agents join the tailnet

Each agent receives a short-lived Tailscale key and joins our private mesh. No public endpoints. No port forwarding. Just WireGuard.

Step 03

Consumers send requests

Calls to api.inference.club route to an online agent serving the requested model. Streaming works. Requests travel straight to the agent over the tailnet, which keeps latency low.
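Streaming can be consumed as in the sketch below. It assumes the endpoint follows the OpenAI streaming convention: with `"stream": true` in the request body, the response arrives as server-sent events, one `data: {...}` line per chunk, ending with `data: [DONE]`. `iter_stream_text` is an illustrative helper name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style server-sent event lines.

    `lines` is any iterable of str or bytes lines, such as the response
    object returned by urllib.request.urlopen.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text

# Usage: iterate over an open streaming response and print as text arrives.
#   for piece in iter_stream_text(response):
#       print(piece, end="", flush=True)
```

Because the parser only needs an iterable of lines, it can be exercised against recorded output before pointing it at the live endpoint.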

[Diagram: provider nodes (brian's 4090, m3 ultra · 192gb, 2× 3090 rig), the inference.club private tailnet, api.inference.club, your code]

Why inference.club

Built for the way open models actually run.

OpenAI-compatible

Drop-in replacement. Swap the base URL and key — your existing SDKs and prompts just work.

Real GPUs, real models

Members serve open-weight models on their own hardware: Qwen, Llama, DeepSeek, Mistral, Gemma.

Private by default

Requests reach providers over Tailscale, end-to-end encrypted. No public endpoints to scrape.

A club, not a vendor

Pool compute with people you trust. Bring a node when you have spare cycles. Use the network when you need them.

Ready to plug in?

Sign in with GitHub, mint a key, and you're live in under a minute. Bring a node whenever you have spare cycles.