vllm-ascend

Name: vllm-ascend
Author: ascend-ai-coding

✓

ascend-ai-coding/awesome-ascend-skills

vLLM Ascend plugin for LLM inference serving on Huawei Ascend NPU. Use for offline batch inference, API server deployment, quantization inference (with msmodelslim quantized models), tensor/pipeline parallelism for distributed serving, and OpenAI-compatible API endpoints. Supports Qwen, DeepSeek, GLM, LLaMA models with Ascend-optimized kernels.

ascend-ai-coding·vllm·ascend

15Installs·1Trend·@ascend-ai-coding