
Omni-Infer v0.5.0 has been released. Omni-Infer is an inference acceleration technology for ultra-large-scale MoE models.
v0.5.0
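Core features
- Support for VeRL
Other optimizations
- On 5P8-1D32@A3 with 2K+2K, DeepSeek-R1 reaches 500 QPM, TTFT
- On 1P16-1D32@A2 with 2K+2K, DeepSeek-R1 reaches a peak single-card decode throughput of 400 TPS, TPOT
Supported models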
| Model | Hardware | Precision | Deployment mode |
|---|---|---|---|
| DeepSeek-R1 | A3 | INT8 | PD disaggregated |
| DeepSeek-R1 | A3 | W4A8C16 | PD disaggregated |
| DeepSeek-R1 | A3 | BF16 | PD disaggregated |
| DeepSeek-R1 | A2 | INT8 | PD disaggregated |
| Qwen2.5-7B | A3 | INT8 | Colocated (TP>=1, DP=1) |
| Qwen2.5-7B | A2 | INT8 | Colocated (TP>=1, DP=1) |
| QwQ | A3 | BF16 | PD disaggregated |
| Qwen3-32B | A3 | BF16 | PD disaggregated |
| Qwen3-235B | A3 | INT8 | PD disaggregated |
| Kimi-K2 | A3 | W4A8C16 | PD disaggregated |
Installation packages
| Hardware | Architecture | Image | Tar package |
|---|---|---|---|
| A3 | arm | docker pull swr.cn-east-4.myhuaweicloud.com/omni/omni_infer-a3-arm:release_v0.5.0 | omni_infer-a3-arm:v0.5.0 |
| A3 | x86 | docker pull swr.cn-east-4.myhuaweicloud.com/omni/omni_infer-a3-x86:release_v0.5.0 | omni_infer-a3-x86:v0.5.0 |
| A2 | arm | docker pull swr.cn-east-4.myhuaweicloud.com/omni/omni_infer-a2-arm:release_v0.5.0 | omni_infer-a2-arm:v0.5.0 |
| A2 | x86 | docker pull swr.cn-east-4.myhuaweicloud.com/omni/omni_infer-a2-x86:release_v0.5.0 | omni_infer-a2-x86:v0.5.0 |
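
The four image references in the table follow one naming pattern, so the pull command can be assembled from the hardware and architecture. The snippet below is a minimal sketch under that assumption; the variable names (`HARDWARE`, `ARCH`, `VERSION`, `IMAGE`) are illustrative and not part of Omni-Infer, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Illustrative variables (not defined by Omni-Infer); override via the environment.
HARDWARE="${HARDWARE:-a3}"   # a3 or a2, as listed in the table
ARCH="${ARCH:-arm}"          # arm or x86, as listed in the table
VERSION="${VERSION:-v0.5.0}"

# Pattern inferred from the four table rows:
#   omni_infer-<hardware>-<arch>:release_<version>
IMAGE="swr.cn-east-4.myhuaweicloud.com/omni/omni_infer-${HARDWARE}-${ARCH}:release_${VERSION}"

# Print the command so it can be reviewed before it is executed.
echo "docker pull ${IMAGE}"
```

For example, running the script with `HARDWARE=a2 ARCH=x86` set in the environment prints the pull command for the A2 x86 row.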
For details, see: https://gitee.com/omniai/omniinfer/releases/v0.5.0
Source code: click to download










