
What am I looking at?

This tool shows what the GPU infrastructure is doing while an LLM request is served — not just that "the AI answered."

Time to First Token (TTFT) — how long until the first token arrives. It grows with prompt length, because the entire prompt must be processed (prefill) before any output token is generated.
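The metrics above can be computed from any token stream. A minimal sketch — the `stream_tokens` generator is a hypothetical stand-in for the streamed API response, with sleeps simulating prefill and per-token decode time:

```python
import time

def stream_tokens():
    """Stand-in for a streaming LLM response.

    A real client would iterate over server-sent events from the API;
    here, sleeps simulate prefill and per-token decode latency.
    """
    time.sleep(0.05)           # "prefill" before the first token
    for tok in "Hello from the model".split():
        yield tok
        time.sleep(0.01)       # per-token decode time

def measure(stream):
    """Return (TTFT, total latency, throughput) for a token stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            ttft = time.perf_counter() - start   # time to first token
        count += 1
    total = time.perf_counter() - start
    return ttft, total, count / total            # seconds, seconds, tok/s

ttft, total, tps = measure(stream_tokens())
print(f"TTFT {ttft*1000:.0f} ms, total {total*1000:.0f} ms, {tps:.1f} tok/s")
```

Note that TTFT is measured inside the iteration loop: the first item only arrives once prefill is done, so the gap between `start` and the first yield is exactly the prefill cost.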

Load test — fire multiple requests concurrently. Watch p95/p99 latency climb as the request queue fills. That's the GPU scheduler under pressure.
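A load test of this shape can be sketched with a thread pool; `fake_request` below is a hypothetical stand-in for one API call, so the percentile math is the point, not the latencies themselves:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(_):
    """Stand-in for one LLM call; real code would POST to the endpoint."""
    latency = random.uniform(0.01, 0.03)
    time.sleep(latency)
    return latency

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    k = min(len(ordered) - 1, int(round(p / 100 * (len(ordered) - 1))))
    return ordered[k]

# Fire 64 requests with at most 16 in flight at once.
with ThreadPoolExecutor(max_workers=16) as pool:
    latencies = list(pool.map(fake_request, range(64)))

print(f"p50 {percentile(latencies, 50)*1000:.1f} ms  "
      f"p95 {percentile(latencies, 95)*1000:.1f} ms  "
      f"p99 {percentile(latencies, 99)*1000:.1f} ms")
```

Against a real endpoint, raising the request count past what the GPU can batch is what makes p95/p99 pull away from the median: the extra requests sit in the scheduler's queue, and queueing time lands entirely in the tail.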

Backed by NVIDIA NIM on DGX Cloud. For full GPU metrics (VRAM, KV cache, temperature) see the self-hosted container path.

Last Request — Infrastructure Metrics: Time to First Token (ms) · Total Latency (ms) · Throughput (tok/s) · Tokens Generated
Response
Fire a request to see the response here. The metrics above update in real time as tokens arrive.