How it all fits together
Home GPUs, one private network.
inference.club is a Tailscale tailnet that joins consumer hardware — RTX PCs, the DGX Spark, Apple silicon — so members can safely expose their inference through one unified API, across the whole range of AI modalities: chat, images, video, speech, music, 3D.
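Featured generations
Born on the network
Real requests from members, generated on community-contributed hardware. Click any card to see the full request.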
“Type text, pick a voice, and synthesize natural speech.”
summarize this article: AI Outperforms Law Professors in Stanford Law Study In a rigorous blind study, law professors overwhelmingly preferred AI-generated answers to student legal questions over answers written by fellow law professors—and flagged the AI answers as potentially m…
**Summary** A blind study led by Stanford Law professor Julian Nyarko found that law professors overwhelmingly preferred AI‑generated answers to contract‑law questions over answers written by their fellow professors. In a head‑to‑head comparison of nearly 3,000 anonymized respon…



disco funk vibes party
Drop-in with the OpenAI SDK.
Sign up, generate a token, and point your client at api.inference.club/v1. Every model on the network is one request away.
export OPENAI_API_KEY=ic_xxxxxxxxxxxxxxxxxxxx
export OPENAI_BASE_URL=https://api.inference.club/v1
curl "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.6-27b",
    "messages": [
      {"role": "user", "content": "explain MoE in one sentence"}
    ]
  }'
Wrap any OpenAI-compatible server.
Already running vLLM, llama.cpp, or Ollama? Point the agent at it and you become a node on the network.
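Before starting the agent, it can be worth confirming that the local server actually answers. vLLM, llama.cpp's server, Ollama and LM Studio all expose the OpenAI-style `GET /v1/models` listing, so a minimal pre-flight check might look like this. It assumes `OPENAI_BASE_URL` and `OPENAI_API_KEY` are exported for your local server as in the snippet that follows; the function names `local_model_ids` and `preflight` are hypothetical, and the JSON is scraped with `grep`/`sed` rather than parsed, to avoid extra dependencies.

```shell
#!/usr/bin/env bash
# Pre-flight sketch (hypothetical helper names): check that the local
# OpenAI-compatible server lists at least one model before the agent starts.
# Assumes OPENAI_BASE_URL and OPENAI_API_KEY are already exported.

# Print one model id per line from GET $OPENAI_BASE_URL/models.
# Uses grep/sed instead of a JSON parser, so it only handles plain ids.
local_model_ids() {
  curl -fsS "$OPENAI_BASE_URL/models" \
    -H "Authorization: Bearer $OPENAI_API_KEY" |
    grep -o '"id": *"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# List the models and succeed if any were found; fail loudly otherwise.
preflight() {
  local ids
  ids=$(local_model_ids)
  if [ -z "$ids" ]; then
    echo "no models reachable at $OPENAI_BASE_URL" >&2
    return 1
  fi
  printf '%s\n' "$ids" | sed 's/^/serving: /'
}
```

Run `preflight` after the `export` lines and before `docker run`; a non-zero exit means the agent would have nothing to advertise.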
# Already running vLLM, llama.cpp, or Ollama on your GPU?
# Point the agent at it and join the network.
export INFERENCE_CLUB_API_KEY=ic_xxxxxxxxxxxxxxxxxxxx
export OPENAI_BASE_URL=http://localhost:8000/v1
export OPENAI_API_KEY=local-key # whatever your local server expects
docker run --rm -d --name club-agent --network host \
  -e INFERENCE_CLUB_API_KEY \
  -e OPENAI_BASE_URL \
  -e OPENAI_API_KEY \
  ghcr.io/inference-club/inference-club-agent:latest
Architecture
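Three parts. No magic.
Follow a request from your code to a GPU and back. A cloud control plane authenticates you, applies your privacy rules, and routes the call, but the model itself runs on hardware you own.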
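Operators run the agent
Members run inference-club-agent next to their local LLM server. The agent advertises every model that server hosts.
The agent joins the tailnet
Each agent gets a short-lived Tailscale key and joins our private mesh network. No public endpoints, no port forwarding, just WireGuard.
Users send requests
Calls to api.inference.club are routed to an online agent that serves the requested model. Streaming is supported, over a direct, low-latency path.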
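Streaming in OpenAI-compatible APIs usually works by setting `"stream": true`; the server then replies with server-sent events, one `data: {...}` line per chunk, ending with `data: [DONE]`. A minimal sketch, assuming api.inference.club passes that flag through unchanged (the page only says streaming is supported). The helpers `json_escape`, `chat_body` and `ic_chat_stream` are hypothetical, and the hand-rolled escaping covers only backslashes and double quotes.

```shell
#!/usr/bin/env bash
# Streaming sketch (hypothetical helpers). Assumes the endpoint honors the
# standard OpenAI "stream": true flag and answers with server-sent events.
: "${OPENAI_BASE_URL:=https://api.inference.club/v1}"

# Escape backslashes and double quotes for use inside a JSON string.
# Control characters such as newlines are NOT handled.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$q"/"$bs$q"}
  printf '%s' "$s"
}

# Build a single-turn chat request body with streaming enabled.
chat_body() {
  printf '{"model": "%s", "stream": true, "messages": [{"role": "user", "content": "%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# POST the request; -N turns off curl's output buffering so chunks print
# as they arrive instead of when the response completes.
ic_chat_stream() {
  curl -sS -N "$OPENAI_BASE_URL/chat/completions" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(chat_body "$1" "$2")"
}
```

For example, `ic_chat_stream qwen/qwen3.6-27b "explain MoE in one sentence"` should print `data:` lines as tokens are generated, if the assumption about the `stream` flag holds.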
Your application
curl · OpenAI SDK · the Playground · your agents
api_key = "ic_xxxxxxxxxxxxxxxxxxxx"
api.inference.club — the control plane
one small cloud VPS (Hetzner). It routes; it never runs the model.
Caddy
TLS · reverse proxy
Django + DRF
OpenAI-compatible /v1 router · auth · routing
Access control
visibility · per-service ACLs · kill switch
Celery workers
async jobs · batches · workflow DAG
Postgres + Redis
state · queue · throttling
GCS
images · video · voice · music
The inference.club tailnet
a private Tailscale mesh — pure WireGuard
Your rig — where inference actually happens
a GPU you own, at home, on hardware you trust
inference-club-agent
container · --network host
Joins the tailnet with its minted key, advertises models from agent.yaml, and forwards each request to whatever you already run locally:
→ http://localhost:1234/v1
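The "Access control" box lists visibility, per-service ACLs and a kill switch, but the page does not say how they combine. The sketch below is purely hypothetical: it assumes the kill switch is checked first, public services are open to any authenticated member, and private services require an explicit ACL entry. The flag file `KILL_SWITCH_FILE`, the `<service> <member>` ACL format and the `authorize` function are invented for illustration; the real rules live in the Django control plane.

```shell
#!/usr/bin/env bash
# Hypothetical access-control sketch; not the control plane's actual logic.
# Assumed order: kill switch, then visibility, then per-service ACL.
# KILL_SWITCH_FILE and the "<service> <member>" ACL format are made up.
KILL_SWITCH_FILE=${KILL_SWITCH_FILE:-/tmp/ic-kill-switch}

# authorize SERVICE MEMBER VISIBILITY ACL_LINES
# Prints the decision and returns 0 (allow) or 1 (deny).
authorize() {
  local service=$1 member=$2 visibility=$3 acl=$4

  # 1. A present kill-switch file blocks every request.
  if [ -e "$KILL_SWITCH_FILE" ]; then
    echo "deny: kill switch"
    return 1
  fi

  # 2. Public services are open to any authenticated member.
  if [ "$visibility" = "public" ]; then
    echo "allow: public"
    return 0
  fi

  # 3. Private services need an exact "<service> <member>" ACL line.
  if printf '%s\n' "$acl" |
    awk -v s="$service" -v m="$member" '$1 == s && $2 == m { found = 1 } END { exit !found }'; then
    echo "allow: acl"
    return 0
  fi

  echo "deny: not on ACL"
  return 1
}
```

Checking the kill switch before anything else means a single flag can halt all traffic without touching per-service rules, which is the usual reason to order such checks this way.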
Follow one request
1. Your code calls api.inference.club/v1 with your ic_ key, exactly the request you would send to OpenAI.
2. Caddy terminates TLS; Django authenticates the key and applies your privacy and access rules.
3. The router picks a healthy, online node that actually serves the requested model.
4. Django (via a Tailscale SOCKS5 sidecar) dials the node by its MagicDNS name over WireGuard: no forwarded ports, no tunnels.
5. The agent container hands the request to your local LLM server on localhost.
6. Tokens (or images, video, audio) stream back along the exact same path.
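Steps 3 and 4 can be sketched together. This is a hypothetical illustration, not the control plane's code: it assumes a plain-text registry file (`$REGISTRY`) with one `<magicdns-name> <model> <status> <queue-depth>` line per advertised model, prefers the online node with the shortest queue (a tie-breaking policy the page does not specify), and reaches the node through a Tailscale SOCKS5 proxy such as one started with `tailscaled --tun=userspace-networking --socks5-server=localhost:1055`. The proxy address, the agent port 8080 and the functions `pick_node`, `dial_node` and `route` are all assumptions.

```shell
#!/usr/bin/env bash
# Routing + dialing sketch (hypothetical registry format, port, and names).
# Registry lines: <magicdns-name> <model> <status> <queue-depth>
TS_SOCKS=${TS_SOCKS:-localhost:1055}   # assumed Tailscale SOCKS5 sidecar
AGENT_PORT=${AGENT_PORT:-8080}         # assumed agent listen port

# pick_node MODEL < registry
# Prints the online node serving MODEL with the smallest queue; exits 1 if none.
pick_node() {
  awk -v m="$1" '
    $2 == m && $3 == "online" && (best == "" || $4 + 0 < bestq) {
      best = $1; bestq = $4 + 0
    }
    END { if (best == "") exit 1; print best }'
}

# dial_node NODE PATH [CURL_ARGS...]
# --socks5-hostname lets the proxy resolve the name, so MagicDNS names work
# without changing the host's own resolver.
dial_node() {
  local node=$1 path=$2
  shift 2
  curl -fsS --socks5-hostname "$TS_SOCKS" \
    "http://$node:$AGENT_PORT/v1$path" "$@"
}

# route MODEL PATH [CURL_ARGS...]: pick a node from $REGISTRY, then dial it.
route() {
  local model=$1 node
  node=$(pick_node "$model" < "$REGISTRY") || {
    echo "no online node serves $model" >&2
    return 1
  }
  shift
  dial_node "$node" "$@"
}
```

Extra arguments are handed to curl unchanged, so something like `route qwen/qwen3.6-27b /chat/completions -H 'Content-Type: application/json' -d @body.json` would forward a request body to whichever node was picked.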
In one breath
“inference.club is what happens when you point an OpenAI-compatible API at a pile of consumer GPUs you actually own and trust: a private Tailscale tailnet quietly stitching a 4090 here, an M3 Ultra there, a DGX Spark and a couple of 3090s into one WireGuard mesh with no ports forwarded and no firewall holes, where a little inference-club-agent container sits next to whatever you’re already running (vLLM, llama.cpp, Ollama, LM Studio) and advertises it through a manifest, while back in the cloud a Django + Celery server behind Caddy authenticates your ic_ key, enforces your privacy and per-service access controls, and routes the call over the tailnet by MagicDNS to a healthy online node, with Redis and Postgres driving async jobs, batches and a whole workflow DAG engine, GCS holding the images, video, voice and music that come back, a Nuxt playground and dashboard to poke at all of it, the home fleet itself migrating from Docker to k3s, and the entire thing (chat, images, LTX-2 video, Dia voice cloning, speech, the works) sitting behind one base URL you can curl, so go ahead and build something, and, as the prompt says: Make no mistakes.”
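Why inference.club
Built for the way open models actually run.
OpenAI-compatible
A drop-in replacement. Swap in the base URL and key, and your existing SDKs and prompts just work.
Real GPUs, real models
Members serve open-weight models on their own hardware: Qwen, Llama, DeepSeek, Mistral, Gemma.
Private by default
Requests reach providers over Tailscale, encrypted end to end. There is no public endpoint to scrape.
A club, not a vendor
Share compute with people you trust. Plug in a node when you have capacity to spare; draw on the whole network when you need it.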
From the blog
All posts
Featured
From docker sprawl to k3s: rebuilding my home inference fleet
A 'healthy' mesh-generation service sat wedged for three days while my agent.yaml described services that didn't exist. So I moved four GPU boxes — three RTX 4090s and a DGX Spark — onto k3s and taught the inference-club-agent to discover services from the Kubernetes API instead of a config file. Health checks lie; queues don't. Config is fiction; clusters are testimony.