deepseek-ai/DeepSeek-R1-Distill-Qwen-32B load test data
Updated: March 21, 2025, 10:05
H800, single instance, stable concurrency: 15 requests, 510 tokens/s
k6 load test data
256 concurrent VUs, 300-second timeout
execution: local
script: run.js
output: -
scenarios: (100.00%) 1 scenario, 256 max VUs, 5m30s max duration (incl. graceful stop):
* default: 256 looping VUs for 5m0s (gracefulStop: 30s)
✗ is status 200
↳ 4% — ✓ 114 / ✗ 2220
✓ verify msg
checks.........................: 52.44% 2448 out of 4668
data_received..................: 3.0 MB 9.2 kB/s
data_sent......................: 1.1 MB 3.3 kB/s
http_req_blocked...............: avg=109.96ms min=0s med=0s max=2.18s p(90)=498.35ms p(95)=1.08s
http_req_connecting............: avg=14.07ms min=0s med=0s max=1.05s p(90)=34.3ms p(95)=69.46ms
http_req_duration..............: avg=33.28s min=30.02s med=30.03s max=2m34s p(90)=30.53s p(95)=30.69s
{ expected_response:true }...: avg=1m35s min=44.4s med=1m33s max=2m34s p(90)=2m7s p(95)=2m10s
http_req_failed................: 95.11% 2220 out of 2334
http_req_receiving.............: avg=4.03ms min=0s med=894.45µs max=475.27ms p(90)=10.31ms p(95)=18.13ms
http_req_sending...............: avg=72.77µs min=0s med=0s max=6.32ms p(90)=126.77µs p(95)=549.03µs
http_req_tls_handshaking.......: avg=93.32ms min=0s med=0s max=2.08s p(90)=250.8ms p(95)=949.94ms
http_req_waiting...............: avg=33.28s min=30.02s med=30.03s max=2m34s p(90)=30.52s p(95)=30.68s
http_reqs......................: 2334 7.072676/s
iteration_duration.............: avg=33.39s min=30.02s med=30.03s max=2m34s p(90)=31.24s p(95)=31.87s
iterations.....................: 2334 7.072676/s
vus............................: 5 min=5 max=256
vus_max........................: 256 min=256 max=256
running (5m30.0s), 000/256 VUs, 2334 complete and 5 interrupted iterations
default ✓ [======================================] 256 VUs 5m0s
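The counts in the k6 summary above can be cross-checked with a few lines of arithmetic. This is a minimal sketch: the numbers are copied from the summary, while the variable names and the `pct` helper are hypothetical and only for illustration. It assumes each iteration runs exactly two checks ("is status 200" and "verify msg"), which is consistent with 4668 = 2 × 2334.

```python
# Counts copied from the k6 summary (names are illustrative, not from run.js).
total_requests = 2334      # http_reqs
failed_requests = 2220     # http_req_failed
status_ok = 114            # passes of the "is status 200" check
checks_passed = 2448       # checks: passes
checks_total = 4668        # checks: total
reqs_per_second = 7.072676 # http_reqs rate


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


failure_rate = pct(failed_requests, total_requests)
check_pass_rate = pct(checks_passed, checks_total)

# Assumption: two checks per iteration, so the remaining passes
# belong to the "verify msg" check.
verify_msg_passed = checks_passed - status_ok

# Wall-clock time implied by the request count and the reported rate.
implied_duration_s = total_requests / reqs_per_second

print(f"failure rate:       {failure_rate:.2f}%")
print(f"check pass rate:    {check_pass_rate:.2f}%")
print(f"verify msg passes:  {verify_msg_passed}")
print(f"implied duration:   {implied_duration_s:.1f} s")
```

Under that assumption, the "verify msg" check passed on every one of the 2334 iterations, so the low overall check rate comes entirely from non-200 responses. The implied duration of about 330 s matches the 5m30s shown in the `running` line.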
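Concurrency performance
In the 256-concurrency load test, only 15 requests were admitted into gpustack
Steady-state output: 15 concurrent requests, 510 tokens/s
This is well below what a standalone vllm deployment achieves; the gpustack framework itself is probably what limits concurrency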
INFO 03-19 02:54:02 metrics.py:455] Avg prompt throughput: 9.2 tokens/s, Avg generation throughput: 503.0 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 32.7%, CPU KV cache usage: 0.0%.
INFO 03-19 02:54:07 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 521.7 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 40.2%, CPU KV cache usage: 0.0%.
INFO 03-19 02:54:12 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 517.8 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 47.7%, CPU KV cache usage: 0.0%.
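The roughly 510 tokens/s figure can be reproduced from the three log lines above. The sketch below parses the generation throughput and the running-request count out of each line and averages them. The regular expression and the function names are assumptions written for this log format as pasted here, not part of vLLM or gpustack.

```python
import re
from statistics import mean

# The three vLLM metrics lines, copied verbatim from the log above.
LOG_LINES = [
    "INFO 03-19 02:54:02 metrics.py:455] Avg prompt throughput: 9.2 tokens/s, Avg generation throughput: 503.0 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 32.7%, CPU KV cache usage: 0.0%.",
    "INFO 03-19 02:54:07 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 521.7 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 40.2%, CPU KV cache usage: 0.0%.",
    "INFO 03-19 02:54:12 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 517.8 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 47.7%, CPU KV cache usage: 0.0%.",
]

# Hypothetical pattern matching only the two fields needed here.
PATTERN = re.compile(
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs"
)


def parse_metrics(lines):
    """Return a list of (generation tokens/s, running requests) tuples."""
    samples = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            samples.append((float(match.group("gen")), int(match.group("running"))))
    return samples


samples = parse_metrics(LOG_LINES)
avg_generation = mean(gen for gen, _ in samples)
avg_running = mean(running for _, running in samples)
per_request = avg_generation / avg_running

print(f"average generation throughput: {avg_generation:.1f} tokens/s")
print(f"average running requests:      {avg_running}")
print(f"throughput per request:        {per_request:.1f} tokens/s")
```

Three samples over about ten seconds is a small window, so the average is only a rough confirmation of the steady-state figure. The per-request value is an aggregate divided by the running count, not a measured per-stream rate.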