deepseek-ai/DeepSeek-R1-Distill-Llama-70B load test
Updated: March 19, 2025, 11:15
H800, single instance: stable at 15 concurrent requests, about 450 tokens/s
k6 load test results
256 concurrent virtual users (VUs), 300-second timeout
checks.........................: 52.48% 2430 out of 4630
data_received..................: 2.9 MB 8.8 kB/s
data_sent......................: 1.1 MB 3.3 kB/s
http_req_blocked...............: avg=111.44ms min=0s med=0s max=2.04s p(90)=488.56ms p(95)=1.07s
http_req_connecting............: avg=15.36ms min=0s med=0s max=1.07s p(90)=35.04ms p(95)=71.12ms
http_req_duration..............: avg=33.6s min=30.02s med=30.03s max=2m32s p(90)=30.32s p(95)=30.41s
{ expected_response:true }...: avg=1m40s min=49.64s med=1m42s max=2m32s p(90)=2m12s p(95)=2m13s
http_req_failed................: 95.03% 2200 out of 2315
http_req_receiving.............: avg=5.17ms min=0s med=913.6µs max=184.22ms p(90)=17.94ms p(95)=30.88ms
http_req_sending...............: avg=82.44µs min=0s med=0s max=1.39ms p(90)=171.73µs p(95)=557.45µs
http_req_tls_handshaking.......: avg=95.11ms min=0s med=0s max=1.99s p(90)=384.59ms p(95)=962.78ms
http_req_waiting...............: avg=33.6s min=30.02s med=30.03s max=2m32s p(90)=30.3s p(95)=30.41s
http_reqs......................: 2315 7.015113/s
iteration_duration.............: avg=33.71s min=30.02s med=30.03s max=2m32s p(90)=31.41s p(95)=32.22s
iterations.....................: 2315 7.015113/s
vus............................: 9 min=9 max=256
vus_max........................: 256 min=256 max=256
running (5m30.0s), 000/256 VUs, 2315 complete and 9 interrupted iterations
default ✓ [======================================] 256 VUs 5m0s
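The headline figures in the k6 summary above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the constants are copied from the summary, and the variable names are invented for this example.

```python
# Figures copied from the k6 summary above.
total_requests = 2315        # http_reqs
failed_requests = 2200       # numerator of http_req_failed
checks_passed = 2430         # numerator of checks
checks_total = 4630          # denominator of checks
wall_clock_s = 5 * 60 + 30   # "running (5m30.0s)"

# Derived values.
succeeded = total_requests - failed_requests
failure_rate = failed_requests / total_requests
check_pass_rate = checks_passed / checks_total
request_rate = total_requests / wall_clock_s
success_rate = succeeded / wall_clock_s
checks_per_request = checks_total / total_requests

print(f"succeeded requests   : {succeeded}")
print(f"failure rate         : {failure_rate:.2%}")
print(f"check pass rate      : {check_pass_rate:.2%}")
print(f"requests per second  : {request_rate:.3f}")
print(f"successes per second : {success_rate:.3f}")
print(f"checks per request   : {checks_per_request:.1f}")
```

Only 115 of the 2315 requests succeeded. The check total is exactly twice the request count, so the script presumably ran two checks per request, and 2430 = 2315 + 115 is consistent with one check passing on every request and the other passing only on the successful ones. The k6 script itself is not shown here, so that reading is a guess.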
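Concurrency performance
In the 256-VU test, only 15 requests made it into gpustack.
Stable output: 15 concurrent requests, about 450 tokens/s
This is far below what vllm achieves when deployed on its own. The likely cause is that the gpustack framework itself limits concurrency.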
INFO 03-19 01:33:02 metrics.py:455] Avg prompt throughput: 9.2 tokens/s, Avg generation throughput: 446.8 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 57.9%, CPU KV cache usage: 0.0%.
INFO 03-19 01:33:07 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 461.5 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 69.9%, CPU KV cache usage: 0.0%.
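To turn the two log lines above into per-request numbers, a small parser helps. This is a minimal sketch: the regular expression is written against the log format shown above, not against any documented vLLM interface, and `parse_metrics` is a name invented for this example.

```python
import re

# Pattern written against the log lines shown above (an assumption about the
# format, not a documented interface).
METRICS_RE = re.compile(
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, "
    r"Swapped: (?P<swapped>\d+) reqs, "
    r"Pending: (?P<pending>\d+) reqs, "
    r"GPU KV cache usage: (?P<kv>[\d.]+)%"
)


def parse_metrics(line):
    """Return a dict of metrics from one log line, or None if it does not match."""
    m = METRICS_RE.search(line)
    if m is None:
        return None
    return {
        "gen_tokens_per_s": float(m["gen"]),
        "running": int(m["running"]),
        "pending": int(m["pending"]),
        "gpu_kv_cache_pct": float(m["kv"]),
    }


LOG_LINES = [
    "INFO 03-19 01:33:02 metrics.py:455] Avg prompt throughput: 9.2 tokens/s, "
    "Avg generation throughput: 446.8 tokens/s, Running: 15 reqs, Swapped: 0 reqs, "
    "Pending: 0 reqs, GPU KV cache usage: 57.9%, CPU KV cache usage: 0.0%.",
    "INFO 03-19 01:33:07 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, "
    "Avg generation throughput: 461.5 tokens/s, Running: 15 reqs, Swapped: 0 reqs, "
    "Pending: 0 reqs, GPU KV cache usage: 69.9%, CPU KV cache usage: 0.0%.",
]

results = []
for line in LOG_LINES:
    stats = parse_metrics(line)
    if stats is None or stats["running"] == 0:
        continue  # skip lines that do not match or have nothing running
    # Average generation speed seen by each in-flight request.
    stats["per_request_tokens_per_s"] = stats["gen_tokens_per_s"] / stats["running"]
    results.append(stats)
    print(stats)
```

Each of the 15 running requests was decoding at roughly 30 tokens/s. In both samples GPU KV cache usage stayed below 70% with no pending requests, which is consistent with the guess above that the limit sits in front of the engine and not inside it.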