What is vllm-deployment?
Deploy vLLM for high-performance LLM inference. Covers Docker CPU/GPU deployments and cloud VM provisioning with OpenAI-compatible API endpoints. Source: stakpak/community-paks.
Quickly install the vllm-deployment AI skill in your development environment via the command line (see the installation steps below).
| Hardware | Minimum memory | Recommended memory |
|---|---|---|
| CPU | 2x model size | 4x model size |
| GPU | Model size + 2 GB VRAM | Model size + 4 GB VRAM |
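A quick back-of-the-envelope check of these rules, as a sketch; the 14 GB figure is an assumption (roughly a 7B-parameter model in FP16, ~2 bytes per parameter):

```bash
# Illustrative sizing helper based on the table above; MODEL_SIZE_GB is
# an assumed example value, not something this skill defines.
MODEL_SIZE_GB=14
echo "CPU minimum:     $((MODEL_SIZE_GB * 2)) GB RAM"   # 28 GB
echo "CPU recommended: $((MODEL_SIZE_GB * 4)) GB RAM"   # 56 GB
echo "GPU minimum:     $((MODEL_SIZE_GB + 2)) GB VRAM"  # 16 GB
echo "GPU recommended: $((MODEL_SIZE_GB + 4)) GB VRAM"  # 18 GB
```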
| Variable | Purpose | Example |
|---|---|---|
| VLLM_CPU_KVCACHE_SPACE | KV cache size in GB (CPU) | 4 |
| VLLM_CPU_OMP_THREADS_BIND | CPU core binding (CPU) | 0-7 |
| CUDA_VISIBLE_DEVICES | GPU device selection | 0,1 |
| HF_TOKEN | HuggingFace authentication | hf_xxx |
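A minimal sketch of a CPU deployment wiring these variables together. The image name (vllm-cpu-env, as you would tag an image built from vLLM's Dockerfile.cpu) and the model are assumptions, not part of this skill:

```bash
# CPU deployment sketch. Assumes an image built from vLLM's Dockerfile.cpu
# and tagged vllm-cpu-env; the model name is also an assumption.
docker run -d --rm \
  -p 8000:8000 \
  --shm-size=4g \
  --cap-add SYS_NICE \
  --security-opt seccomp=unconfined \
  -e VLLM_CPU_KVCACHE_SPACE=4 \
  -e VLLM_CPU_OMP_THREADS_BIND=0-7 \
  -e HF_TOKEN=hf_xxx \
  vllm-cpu-env \
  --model Qwen/Qwen2.5-1.5B-Instruct
```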
| Docker flag | Purpose |
|---|---|
| --shm-size=4g | Shared memory for IPC |
| --cap-add SYS_NICE | NUMA optimization (CPU) |
| --security-opt seccomp=unconfined | Memory policy syscalls (CPU) |
| --gpus all | GPU access |
| -p 8000:8000 | Port mapping |
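And a GPU counterpart combining the flags above with the official vllm/vllm-openai image; the model and the tensor-parallel size are assumptions, chosen to match the two GPUs exposed via CUDA_VISIBLE_DEVICES:

```bash
# GPU deployment sketch using the official vllm/vllm-openai image.
# Model name and --tensor-parallel-size are assumptions (2 matches the
# two GPUs made visible by CUDA_VISIBLE_DEVICES=0,1).
docker run -d --rm \
  --gpus all \
  -p 8000:8000 \
  --shm-size=4g \
  -e CUDA_VISIBLE_DEVICES=0,1 \
  -e HF_TOKEN=hf_xxx \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 2
```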
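Either way, the served endpoint is OpenAI-compatible, so a deployment can be smoke-tested with plain curl; the "model" field must match whatever was passed to --model:

```bash
# Smoke test against the OpenAI-compatible endpoint on port 8000.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```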
npx skills add https://github.com/stakpak/community-paks --skill vllm-deployment
1. Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.).
2. Copy and run: npx skills add https://github.com/stakpak/community-paks --skill vllm-deployment
3. Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw.
https://github.com/stakpak/community-paks