deepseek-ai/DeepSeek-R1-Distill-Qwen-32B load-test data

Updated: March 21, 2025, 10:05

Single instance on an H800: stable concurrency of 15 requests, 510 tokens/s

k6 load-test results

256 concurrent users, 300 s timeout


     execution: local
        script: run.js
        output: -

     scenarios: (100.00%) 1 scenario, 256 max VUs, 5m30s max duration (incl. graceful stop):
              * default: 256 looping VUs for 5m0s (gracefulStop: 30s)


     ✗ is status 200
      ↳  4% — ✓ 114 / ✗ 2220
     ✓ verify msg

     checks.........................: 52.44% 2448 out of 4668
     data_received..................: 3.0 MB 9.2 kB/s
     data_sent......................: 1.1 MB 3.3 kB/s
     http_req_blocked...............: avg=109.96ms min=0s     med=0s       max=2.18s    p(90)=498.35ms p(95)=1.08s
     http_req_connecting............: avg=14.07ms  min=0s     med=0s       max=1.05s    p(90)=34.3ms   p(95)=69.46ms
     http_req_duration..............: avg=33.28s   min=30.02s med=30.03s   max=2m34s    p(90)=30.53s   p(95)=30.69s
       { expected_response:true }...: avg=1m35s    min=44.4s  med=1m33s    max=2m34s    p(90)=2m7s     p(95)=2m10s
     http_req_failed................: 95.11% 2220 out of 2334
     http_req_receiving.............: avg=4.03ms   min=0s     med=894.45µs max=475.27ms p(90)=10.31ms  p(95)=18.13ms
     http_req_sending...............: avg=72.77µs  min=0s     med=0s       max=6.32ms   p(90)=126.77µs p(95)=549.03µs
     http_req_tls_handshaking.......: avg=93.32ms  min=0s     med=0s       max=2.08s    p(90)=250.8ms  p(95)=949.94ms
     http_req_waiting...............: avg=33.28s   min=30.02s med=30.03s   max=2m34s    p(90)=30.52s   p(95)=30.68s
     http_reqs......................: 2334   7.072676/s
     iteration_duration.............: avg=33.39s   min=30.02s med=30.03s   max=2m34s    p(90)=31.24s   p(95)=31.87s
     iterations.....................: 2334   7.072676/s
     vus............................: 5      min=5            max=256
     vus_max........................: 256    min=256          max=256


running (5m30.0s), 000/256 VUs, 2334 complete and 5 interrupted iterations
default ✓ [======================================] 256 VUs  5m0s
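The headline rates in the k6 summary can be recomputed from the raw counts, which also gives a number k6 does not print directly: successful requests per second as seen by the client. A small Python sketch, assuming the 5m30.0s wall-clock run time as the denominator (k6's own 7.072676/s implies a slightly longer elapsed time, so the trailing digits differ a little).

```python
def summarize(passed: int, failed: int, elapsed_s: float) -> dict[str, float]:
    """Derive client-side rates from k6 pass/fail counts and run time."""
    total = passed + failed
    return {
        "failure_rate": failed / total,        # share of non-200 responses
        "requests_per_s": total / elapsed_s,   # all completed requests
        "successes_per_s": passed / elapsed_s, # requests that returned 200
    }


# Counts from the "is status 200" check (114 passed, 2220 failed)
# and the "running (5m30.0s)" line above.
stats = summarize(passed=114, failed=2220, elapsed_s=330.0)
print(f"failed: {stats['failure_rate']:.2%}")
print(f"all requests: {stats['requests_per_s']:.2f}/s")
print(f"successful requests: {stats['successes_per_s']:.3f}/s")
```

Roughly a third of a successful request per second is what 256 clients actually got out of the deployment during this run.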

Concurrency performance

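In the 256-concurrency test, only 15 requests were actually being processed by gpustack at any one time (the vLLM log shows Running: 15 reqs).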

Stable output: 15 concurrent requests, about 510 tokens/s.
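Dividing the aggregate generation rate by the number of in-flight requests gives the decode speed each client sees. A minimal Python sketch; the 2,000-token answer length is a hypothetical value chosen for illustration, not something measured in this test.

```python
def per_request_rate(aggregate_tps: float, in_flight: int) -> float:
    """Average decode speed each in-flight request sees, in tokens/s."""
    return aggregate_tps / in_flight


def seconds_to_generate(tokens: int, tps: float) -> float:
    """Time to stream `tokens` output tokens at a steady `tps`."""
    return tokens / tps


rate = per_request_rate(510.0, 15)          # figures from the test above
answer_tokens = 2000                        # hypothetical response length
print(f"per request: {rate:.1f} tokens/s")  # 34.0
print(f"{answer_tokens} tokens: ~{seconds_to_generate(answer_tokens, rate):.0f} s")
```

This ignores prefill and any queueing before a request is admitted, so real end-to-end latency is longer than the figure it prints.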

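This is a large gap compared with deploying vLLM on its own, so the gpustack framework itself is probably what limits concurrency. The data fits that reading: vLLM reports Pending: 0 reqs, so no requests were queuing inside the engine, and the failed requests ended after about 30 s (http_req_duration min=30.02s, med=30.03s), while the successful ones took between 44.4 s and 2m34s.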
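One way to picture the suspected bottleneck is a toy queueing model: a hypothetical gateway admits at most `limit` requests at once, a request that cannot get a slot within `queue_timeout_s` fails, and admitted requests run to completion however long they take. This is an assumption for illustration, not gpustack's actual implementation; `simulate` and its parameters are made up here, and times are scaled down to fractions of a second so it runs quickly.

```python
import asyncio
import time
from typing import Optional


async def simulate(clients: int, limit: int, service_s: float,
                   queue_timeout_s: float) -> tuple[int, int, float]:
    """Toy model of a hypothetical concurrency cap in front of the engine.

    At most `limit` requests run at once. A request that cannot get a slot
    within `queue_timeout_s` fails; an admitted request runs for `service_s`
    (standing in for generation time) and then succeeds.
    Returns (succeeded, failed, slowest successful latency in seconds).
    """
    gate = asyncio.Semaphore(limit)

    async def one_request() -> Optional[float]:
        start = time.perf_counter()
        try:
            # Wait in the queue for a free slot, but only up to the timeout.
            await asyncio.wait_for(gate.acquire(), queue_timeout_s)
        except asyncio.TimeoutError:
            return None  # rejected after waiting about queue_timeout_s
        try:
            await asyncio.sleep(service_s)  # "generation" runs to completion
        finally:
            gate.release()
        return time.perf_counter() - start

    latencies = await asyncio.gather(*(one_request() for _ in range(clients)))
    ok = [t for t in latencies if t is not None]
    return len(ok), clients - len(ok), max(ok, default=0.0)


if __name__ == "__main__":
    # Scaled-down stand-in for 256 clients, 15 slots and a ~30 s cutoff.
    ok, failed, slowest = asyncio.run(simulate(40, 4, 0.2, 0.3))
    print(f"succeeded={ok} failed={failed} slowest_success={slowest:.2f}s")
```

With 40 clients, 4 slots, 0.2 s of work and a 0.3 s queue timeout, only the first two waves of 4 get through; everyone else fails at about 0.3 s, while the successful requests can take longer than the timeout itself. That is the same shape as the k6 numbers, though the model does not prove that gpustack works this way.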


INFO 03-19 02:54:02 metrics.py:455] Avg prompt throughput: 9.2 tokens/s, Avg generation throughput: 503.0 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 32.7%, CPU KV cache usage: 0.0%.
INFO 03-19 02:54:07 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 521.7 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 40.2%, CPU KV cache usage: 0.0%.
INFO 03-19 02:54:12 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 517.8 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 47.7%, CPU KV cache usage: 0.0%.
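To check the "about 510 tokens/s" and "15 running" figures against the engine's own numbers, the periodic vLLM metrics lines can be parsed and averaged. A minimal Python sketch; the regex is written against the three lines quoted above, and it is an assumption that other vLLM versions log in exactly the same format.

```python
import re
from statistics import mean

# The three vLLM metrics lines quoted above, copied verbatim.
LOG = """\
INFO 03-19 02:54:02 metrics.py:455] Avg prompt throughput: 9.2 tokens/s, Avg generation throughput: 503.0 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 32.7%, CPU KV cache usage: 0.0%.
INFO 03-19 02:54:07 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 521.7 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 40.2%, CPU KV cache usage: 0.0%.
INFO 03-19 02:54:12 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 517.8 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 47.7%, CPU KV cache usage: 0.0%.
"""

METRICS_RE = re.compile(
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, .*?"
    r"Pending: (?P<pending>\d+) reqs, "
    r"GPU KV cache usage: (?P<kv>[\d.]+)%"
)


def parse_metrics(text: str) -> list[dict[str, float]]:
    """Extract generation throughput, running/pending counts and KV usage."""
    return [
        {
            "gen_tps": float(m["gen"]),
            "running": int(m["running"]),
            "pending": int(m["pending"]),
            "gpu_kv_pct": float(m["kv"]),
        }
        for m in METRICS_RE.finditer(text)
    ]


rows = parse_metrics(LOG)
print(f"samples: {len(rows)}")
print(f"mean generation throughput: {mean(r['gen_tps'] for r in rows):.1f} tokens/s")
print(f"running: {sorted({r['running'] for r in rows})}, "
      f"pending: {sorted({r['pending'] for r in rows})}")
```

For these three samples the mean generation throughput is about 514 tokens/s with 15 requests running and none pending, consistent with the figures quoted above.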