briancaffey

@briancaffey

joined April 2026

club-host

online 3D cluster

a1

a1 · 192.168.5.253

GPU: NVIDIA
Other · dia

http://dia.inference-club.svc.cluster.local:8491

Dia 1.6B (voice cloning) playground
Text · Audio · voice-cloning · dialogue
command
ghcr.io/inference-club/dia:latest

a2

a2 · 192.168.5.96

GPU: NVIDIA
Other · flux2-klein

http://flux2-klein.inference-club.svc.cluster.local:8000/v1

FLUX.2 Klein 4B playground
Text · Image
command
ghcr.io/inference-club/flux2-klein:v0.1

a3

a3 · 192.168.5.173

GPU: NVIDIA
Other · ltx2

http://ltx2.inference-club.svc.cluster.local:8023

Text · Image
command
ghcr.io/inference-club/ltx2-server:v0.1 uvicorn ltx_server.app:app --host 0.0.0.0 --port 8023

external-lmstudio

192.168.6.19

external endpoint (outside the cluster)

LM Studio · lmstudio

http://lmstudio.inference-club.svc.cluster.local:1234/v1

spark

spark-d2ce · 192.168.6.19

GPU: NVIDIA
vLLM · nemotron-omni

http://nemotron-omni.inference-club.svc.cluster.local:8000/v1

Nemotron 3 Nano Omni 30B playground
10K ctx · Text · Image · Audio · Video · Reasoning · Tools
command
vllm/vllm-openai:nightly bash -c pip install -q nvidia-cuda-runtime-cu12 'vllm[audio]' && \
export LD_LIBRARY_PATH=/usr/local/lib/python3.12/dist-packages/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH && \
vllm serve nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4 \
  --served-model-name nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4 nemotron-3-nano-omni \
  --tensor-parallel-size 1 \
  --port 8000 --host 0.0.0.0 \
  --max-model-len 10000 \
  --max-num-seqs 128 \
  --max-num-batched-tokens 32768 \
  --gpu-memory-utilization 0.75 \
  --quantization fp4 \
  --moe-backend marlin \
  --kv-cache-dtype fp8 \
  --mamba-ssm-cache-dtype float32 \
  --enable-prefix-caching \
  --reasoning-parser nemotron_v3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --video-pruning-rate 0.5 \
  --limit-mm-per-prompt '{"video":1,"image":1,"audio":1}' \
  --media-io-kwargs '{"video":{"fps":2,"num_frames":256}}' \
  --allowed-local-media-path / \
  --trust-remote-code
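
The flags above suggest how a client would talk to this server: `--served-model-name` exposes the alias `nemotron-3-nano-omni`, and `--limit-mm-per-prompt` caps each prompt at one image, one video, and one audio clip. The following is a minimal sketch, not the club's own client, that builds a chat payload in the OpenAI content-part format and enforces the image limit on the client side. The function name `build_image_chat_payload`, the `max_tokens` default, and the example image URL are illustrative assumptions. The service URL is copied from the listing and is presumably reachable only from inside the cluster.

```python
import json

# Values copied from the listing and the vllm serve command above.
BASE_URL = "http://nemotron-omni.inference-club.svc.cluster.local:8000/v1"
CHAT_URL = BASE_URL + "/chat/completions"
MODEL = "nemotron-3-nano-omni"  # second alias given to --served-model-name
MM_LIMITS = {"video": 1, "image": 1, "audio": 1}  # mirrors --limit-mm-per-prompt


def build_image_chat_payload(prompt, image_urls, max_tokens=512):
    """Build a chat payload with text plus images (hypothetical helper).

    Raises ValueError if the prompt carries more images than the server
    is configured to accept per prompt.
    """
    if len(image_urls) > MM_LIMITS["image"]:
        raise ValueError(
            f"server allows {MM_LIMITS['image']} image(s) per prompt, "
            f"got {len(image_urls)}"
        )
    # OpenAI-style content parts: one text part, then one part per image.
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,  # assumed default, not taken from the page
    }


# Example: serialize a payload ready to POST to CHAT_URL.
example_body = json.dumps(
    build_image_chat_payload(
        "Describe this image.", ["https://example.com/cat.png"]
    )
)
```

Checking the limit before sending avoids a round trip that the server would reject anyway. Prompt and completion together also have to fit within the 10,000-token window set by `--max-model-len`.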

Inference requests

@briancaffey hasn't made any inference requests yet.

Compute provided

209 requests served · 121,420 tokens

Inference used

209 requests · 121,420 tokens (73,676 in / 47,744 out)

Models @briancaffey is serving

Run them for free in the playground, or call them from your own code via the OpenAI-compatible API.

curl https://api.inference.club/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_CLUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dia-1.6b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
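
For callers working "from your own code", the same request can be sketched in Python using only the standard library. The endpoint, model id, and `INFERENCE_CLUB_API_KEY` variable name come from the curl example above. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response is assumed to be a JSON body, as is usual for OpenAI-compatible endpoints.

```python
import json
import urllib.request

API_URL = "https://api.inference.club/v1/chat/completions"  # from the curl example


def build_chat_request(model, prompt, api_key):
    """Build (but do not send) a POST request equivalent to the curl call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response (assumed shape)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

To reproduce the curl call, read the key with `os.environ["INFERENCE_CLUB_API_KEY"]`, pass it to `build_chat_request("dia-1.6b", "Hello!", key)`, and hand the result to `send_chat_request`. Keeping construction separate from sending makes the payload easy to inspect without touching the network.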