How it all fits together
Home GPUs, one private network.
inference.club is a Tailscale tailnet that joins consumer hardware — RTX PCs, the DGX Spark, Apple silicon — so members can safely expose their inference through one unified API, across the whole range of AI modalities: chat, images, video, speech, music, 3D.
Featured generations
Made on the network
Real requests from members, generated on hardware the community contributes to the network. Click any card to see the full request.
“Type text, pick a voice, and synthesize natural speech.”
summarize this article: AI Outperforms Law Professors in Stanford Law Study In a rigorous blind study, law professors overwhelmingly preferred AI-generated answers to student legal questions over answers written by fellow law professors—and flagged the AI answers as potentially m…
**Summary** A blind study led by Stanford Law professor Julian Nyarko found that law professors overwhelmingly preferred AI‑generated answers to contract‑law questions over answers written by their fellow professors. In a head‑to‑head comparison of nearly 3,000 anonymized respon…



disco funk vibes party
Drop-in replacement for the OpenAI SDK.
Sign up, generate a token, and point your client at api.inference.club/v1. Every model on the network is a single request away.
export OPENAI_API_KEY=ic_xxxxxxxxxxxxxxxxxxxx
export OPENAI_BASE_URL=https://api.inference.club/v1
curl "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.6-27b",
    "messages": [
      {"role": "user", "content": "explain MoE in one sentence"}
    ]
  }'
Wrap any OpenAI-compatible server.
Already running vLLM, llama.cpp, or Ollama? Point the agent at it and you are a node on the network.
# Already running vLLM, llama.cpp, or Ollama on your GPU?
# Point the agent at it and join the network.
export INFERENCE_CLUB_API_KEY=ic_xxxxxxxxxxxxxxxxxxxx
export OPENAI_BASE_URL=http://localhost:8000/v1
export OPENAI_API_KEY=local-key # whatever your local server expects
docker run --rm -d --name club-agent --network host \
  -e INFERENCE_CLUB_API_KEY \
  -e OPENAI_BASE_URL \
  -e OPENAI_API_KEY \
  ghcr.io/inference-club/inference-club-agent:latest
Architecture
Three pieces. No magic.
Follow a single request from your code to a GPU and back. The cloud control plane authenticates, applies your privacy rules, and routes, but the model runs on hardware you own.
Operators run agents
Members run inference-club-agent alongside their local LLM server. The agent advertises whichever models that server is hosting.
Agents join the tailnet
Each agent receives a short-lived Tailscale key and joins our private mesh. No public endpoints. No port forwarding. Just WireGuard.
Consumers send requests
Calls to api.inference.club are routed to an online agent that serves the requested model. Streaming works, and requests take a direct path over the tailnet.
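Because the API follows the OpenAI chat completions shape, a streaming call can be sketched with nothing but the Python standard library. The sketch below builds the request and parses the "data:" lines that OpenAI-compatible servers emit when "stream" is true. The helper names (build_chat_request, iter_text_deltas, stream_chat) are illustrative and not part of any inference.club SDK, and the model name is simply the one used in the curl example.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a streaming chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask the server for incremental chunks
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(request: urllib.request.Request) -> Iterator[str]:
    """Send the request and yield text as it arrives (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        yield from iter_text_deltas(raw.decode("utf-8") for raw in response)
```

To try it against the network, pass the values of OPENAI_BASE_URL and OPENAI_API_KEY from the shell example to build_chat_request and print each fragment yielded by stream_chat.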
Your application
curl · OpenAI SDK · the Playground · your agents
api_key = "ic_xxxxxxxxxxxxxxxxxxxx"
api.inference.club — the control plane
one small cloud VPS (Hetzner). It routes; it never runs the model.
Caddy
TLS · reverse proxy
Django + DRF
OpenAI-compatible /v1 router · auth · routing
Access control
visibility · per-service ACLs · kill switch
Celery workers
async jobs · batches · workflow DAG
Postgres + Redis
state · queue · throttling
GCS
images · video · voice · music
The inference.club tailnet
a private Tailscale mesh — pure WireGuard
Your rig — where inference actually happens
a GPU you own, at home, on hardware you trust
inference-club-agent
container · --network host
Joins the tailnet with its minted key, advertises models from agent.yaml, and forwards each request to whatever you already run locally:
→ http://localhost:1234/v1
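Since the agent advertises models from a manifest and forwards to a local OpenAI-compatible server, an operator may want to check that every advertised model is really being served. The sketch below is one possible check, not the agent's actual behavior. It assumes the local server answers GET /v1/models with the standard OpenAI list shape, and it represents the manifest as a plain list of names because the agent.yaml schema is not shown here. The function names and the sample model names are hypothetical.

```python
import json
from typing import List


def served_model_ids(models_response: str) -> List[str]:
    """Extract model ids from an OpenAI-style GET /v1/models JSON body."""
    payload = json.loads(models_response)
    return [entry["id"] for entry in payload.get("data", [])]


def stale_manifest_entries(manifest_models: List[str], models_response: str) -> List[str]:
    """Return manifest entries that the local server does not report serving."""
    served = set(served_model_ids(models_response))
    return [name for name in manifest_models if name not in served]


# Sample body in the shape an OpenAI-compatible server returns for /v1/models.
SAMPLE_RESPONSE = json.dumps({
    "object": "list",
    "data": [
        {"id": "qwen/qwen3.6-27b", "object": "model"},
    ],
})

# Hypothetical manifest: one model that is served, one that is not.
SAMPLE_MANIFEST = ["qwen/qwen3.6-27b", "example/mesh-generation"]
```

Calling stale_manifest_entries(SAMPLE_MANIFEST, SAMPLE_RESPONSE) flags the entry the server never reported, which is the kind of drift between config and reality that a manifest alone cannot reveal.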
Follow one request
1. Your code calls api.inference.club/v1 with your ic_ key, the same request you’d send to OpenAI.
2. Caddy terminates TLS; Django authenticates the key and applies your privacy + access rules.
3. The router picks a healthy, online node that actually serves the requested model.
4. Django (via a Tailscale SOCKS5 sidecar) dials the node by MagicDNS over WireGuard: no ports, no tunnels.
5. The agent container hands the request to your local LLM server on localhost.
6. Tokens (or images / video / audio) stream back along the exact same path.
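The node selection step in the walkthrough above can be sketched as a filter over known nodes. This is a minimal illustration, not the project's router: the Node fields, the pick_node name, and the random tie-break among eligible nodes are all assumptions, since the page only says the router picks a healthy, online node that serves the requested model.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    hostname: str        # MagicDNS name of the rig (hypothetical field)
    models: Set[str]     # model ids the node's agent advertises
    online: bool = True  # agent currently connected to the tailnet
    healthy: bool = True # last health check passed


def pick_node(nodes: List[Node], model: str) -> Optional[Node]:
    """Return an online, healthy node serving `model`, or None if there is none."""
    candidates = [
        node for node in nodes
        if node.online and node.healthy and model in node.models
    ]
    if not candidates:
        return None
    # Assumed policy: spread load by choosing uniformly among eligible nodes.
    return random.choice(candidates)
```

A real router would likely also weigh queue depth or recent latency; returning None here leaves it to the caller to decide how to report that no node can serve the model.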
In one breath
“inference.club is what happens when you point an OpenAI-compatible API at a pile of consumer GPUs you actually own and trust: a private Tailscale tailnet quietly stitching a 4090 here, an M3 Ultra there, a DGX Spark and a couple of 3090s into one WireGuard mesh with no ports forwarded and no firewall holes, where a little inference-club-agent container sits next to whatever you’re already running (vLLM, llama.cpp, Ollama, LM Studio) and advertises it through a manifest, while back in the cloud a Django + Celery server behind Caddy authenticates your ic_ key, enforces your privacy and per-service access controls, and routes the call over the tailnet by MagicDNS to a healthy online node, with Redis and Postgres driving async jobs, batches and a whole workflow DAG engine, GCS holding the images, video, voice and music that come back, a Nuxt playground and dashboard to poke at all of it, the home fleet itself migrating from Docker to k3s, and the entire thing (chat, images, LTX-2 video, Dia voice cloning, speech, the works) sitting behind one base URL you can curl, so go ahead and build something, and, as the prompt says: Make no mistakes.”
Why inference.club
Designed for the way open models actually run.
OpenAI-compatible
Drop-in replacement. Change the base URL and the key, and your existing SDKs and prompts just work.
Real GPUs, real models
Members serve open-weight models on their own hardware: Qwen, Llama, DeepSeek, Mistral, Gemma.
Private by default
Requests reach providers over Tailscale, encrypted end to end. No public endpoints to scan.
A club, not a provider
Share compute with people you trust. Contribute a node when you have spare cycles. Use the network when you need them.
From the blog
All posts
Featured
From docker sprawl to k3s: rebuilding my home inference fleet
A 'healthy' mesh-generation service sat wedged for three days while my agent.yaml described services that didn't exist. So I moved four GPU boxes — three RTX 4090s and a DGX Spark — onto k3s and taught the inference-club-agent to discover services from the Kubernetes API instead of a config file. Health checks lie; queues don't. Config is fiction; clusters are testimony.