Model
Fire a Request
Load Test
What am I looking at?
This tool shows what the GPU infrastructure does when an LLM request comes in, not just the model's final answer.
Time to First Token (TTFT) — how long until the first token arrives. It grows with prompt length, because the entire prompt must be processed (prefill) before decoding can begin.
Load test — fire multiple requests concurrently and watch p95/p99 tail latency climb as the request queue fills. That's the GPU scheduler under pressure.
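The TTFT and throughput cards below can be derived from any streaming response: time the first token against the request start, then divide total tokens by total wall-clock time. A minimal sketch in Python, where fake_stream is a hypothetical stand-in for a real streaming LLM response (not this page's actual backend):

```python
import time

def measure_ttft_and_throughput(token_stream):
    """Consume a token iterator, timing the first token and the overall rate.

    Returns (ttft_seconds, total_seconds, token_count, tokens_per_second).
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_stream:
        if ttft is None:
            # First token observed: this gap is the TTFT.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    rate = count / total if total > 0 else 0.0
    return ttft, total, count, rate

def fake_stream(n=5, delay=0.01):
    # Hypothetical stand-in: yields n tokens with a fixed per-token delay.
    for _ in range(n):
        time.sleep(delay)
        yield "tok"
```

For example, `measure_ttft_and_throughput(fake_stream())` reports a TTFT of roughly one token delay and a throughput near 1/delay tokens per second.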
Backed by NVIDIA NIM on DGX Cloud. For full GPU metrics (VRAM, KV cache, temperature) see the self-hosted container path.
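The load test described above can be approximated client-side: run requests through a bounded worker pool, record each request's wall-clock latency, and report nearest-rank percentiles. A sketch under the assumption that request_fn is any callable issuing one request (the simulated sleep in the usage note is a stand-in, not the real API call):

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

def percentile(samples, p):
    # Nearest-rank percentile: smallest sample with at least p% of
    # observations at or below it.
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

def load_test(request_fn, n_requests=20, concurrency=5):
    """Fire n_requests through `concurrency` workers; return per-request
    latencies in milliseconds. Queueing delay shows up in the tail."""
    def timed(_):
        t0 = time.perf_counter()
        request_fn()
        return (time.perf_counter() - t0) * 1000
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, range(n_requests)))
```

Usage: `lat = load_test(lambda: time.sleep(0.01))`, then `percentile(lat, 95)` and `percentile(lat, 99)` give the p95/p99 figures the load-test view tracks.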
Last Request — Infrastructure Metrics
Time to First Token
—ms
Total Latency
—ms
Throughput
—tok/s
Tokens Generated
—
Response
Fire a request to see the response here.
The metrics above update in real time as tokens arrive.