a1 · 192.168.5.253
GPU: NVIDIA
Other: dia
http://dia.inference-club.svc.cluster.local:8491
Dia 1.6B (voice cloning) playground
Text → Audio · voice-cloning · dialogue
command
ghcr.io/inference-club/dia:latest
a2 · 192.168.5.96
http://flux2-klein.inference-club.svc.cluster.local:8000/v1
ghcr.io/inference-club/flux2-klein:v0.1
a3 · 192.168.5.173
http://ltx2.inference-club.svc.cluster.local:8023
ghcr.io/inference-club/ltx2-server:v0.1
uvicorn ltx_server.app:app --host 0.0.0.0 --port 8023
192.168.6.19
external endpoint (outside the cluster)
http://lmstudio.inference-club.svc.cluster.local:1234/v1
spark-d2ce · 192.168.6.19
http://nemotron-omni.inference-club.svc.cluster.local:8000/v1
vllm/vllm-openai:nightly
bash -c pip install -q nvidia-cuda-runtime-cu12 'vllm[audio]' && \
export LD_LIBRARY_PATH=/usr/local/lib/python3.12/dist-packages/nvidia/cuda_runtime/lib:$LD_LIBRARY_PATH && \
vllm serve nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4 \
--served-model-name nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4 nemotron-3-nano-omni \
--tensor-parallel-size 1 \
--port 8000 --host 0.0.0.0 \
--max-model-len 10000 \
--max-num-seqs 128 \
--max-num-batched-tokens 32768 \
--gpu-memory-utilization 0.75 \
--quantization fp4 \
--moe-backend marlin \
--kv-cache-dtype fp8 \
--mamba-ssm-cache-dtype float32 \
--enable-prefix-caching \
--reasoning-parser nemotron_v3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--video-pruning-rate 0.5 \
--limit-mm-per-prompt '{"video":1,"image":1,"audio":1}' \
--media-io-kwargs '{"video":{"fps":2,"num_frames":256}}' \
--allowed-local-media-path / \
--trust-remote-code
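The flags above cap each prompt at one image, one video, and one audio clip, and `--served-model-name` registers `nemotron-3-nano-omni` as a short alias. A minimal sketch of a request that stays within those limits, assuming the server exposes vLLM's usual OpenAI-compatible `/v1/chat/completions` route and accepts `image_url` content parts. The helper name, the example image URL, and the `max_tokens` value are illustrative and not taken from the deployment:

```python
import json
import urllib.request

# Endpoint and model alias are copied from the deployment above.
BASE_URL = "http://nemotron-omni.inference-club.svc.cluster.local:8000/v1"
MODEL_ALIAS = "nemotron-3-nano-omni"


def build_image_request(prompt: str, image_url: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) a chat request with one text part and one image part."""
    payload = {
        "model": MODEL_ALIAS,
        "max_tokens": max_tokens,  # illustrative value, well under --max-model-len 10000
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    # A single image part: the server is started with "image":1 per prompt.
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical image URL, used only to show the shape of the request.
example = build_image_request("Describe this image.", "https://example.com/cat.png")
print(example.full_url)
```

Passing the returned object to `urllib.request.urlopen` from a pod inside the cluster would send it; the hostname is a cluster-internal service name and is not expected to resolve elsewhere.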
209 requests · 121,420 tokens (73,676 in / 47,744 out)
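As a quick consistency check on the usage figures, the input and output counts should sum to the reported total. The sketch below uses only the numbers shown above; the per-request average and output share are derived values, not figures reported by the page:

```python
# Figures copied from the usage line above.
requests_served = 209
tokens_in = 73_676
tokens_out = 47_744

total_tokens = tokens_in + tokens_out
avg_per_request = total_tokens / requests_served
output_share_pct = 100 * tokens_out / total_tokens

print(total_tokens)                # 121420, matching the reported total
print(round(avg_per_request))      # 581 tokens per request on average
print(round(output_share_pct, 1))  # 39.3 percent of tokens were generated output
```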
Run these models free in the playground, or call them from your own code via the OpenAI-compatible API.
curl https://api.inference.club/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_CLUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dia-1.6b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
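The same request can be made from Python. This is a minimal sketch using only the standard library; the URL, headers, model name, and message mirror the curl call, while the function names `build_request` and `send` are hypothetical helpers introduced for illustration. The shape of the response is not documented here, so the sketch returns the parsed JSON as-is rather than assuming particular fields:

```python
import json
import os
import urllib.request

API_URL = "https://api.inference.club/v1/chat/completions"


def build_request(api_key: str, model: str = "dia-1.6b", content: str = "Hello!") -> urllib.request.Request:
    """Build the POST request that the curl command sends."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With the key exported as in the curl example, `send(build_request(os.environ["INFERENCE_CLUB_API_KEY"]))` would issue the call.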