high-performance-inference
High-performance LLM inference with vLLM, quantization (AWQ, GPTQ, FP8), speculative decoding, and edge deployment. Use when optimizing inference latency, throughput, or memory.
Installation
SKILL.md
Optimize LLM inference for production with vLLM 0.14.x, quantization, and speculative decoding.
vLLM 0.14.0 (Jan 2026): PyTorch 2.9.0, CUDA 12.9, AttentionConfig API, Python 3.12+ recommended.
| Feature | Benefit |
| --- | --- |
| PagedAttention | Up to 24x throughput via efficient KV cache |
| Continuous Batching | Dynamic request batching for max utilization |
| CUDA Graphs | Fast model execution with graph capture |
| Tensor Parallelism | Scale across multiple GPUs |
| Prefix Caching | Reuse KV cache for shared prefixes |
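The features in the table map to engine options in vLLM's offline `LLM` API. A minimal sketch, assuming an AWQ-quantized checkpoint and two GPUs; the model name and parameter values are illustrative, not prescriptive:

```python
from vllm import LLM, SamplingParams

# Sketch of offline inference with vLLM; the checkpoint, quantization method,
# and parallelism settings are assumptions -- adjust to your hardware.
llm = LLM(
    model="TheBloke/Llama-2-13B-AWQ",   # hypothetical AWQ-quantized checkpoint
    quantization="awq",                 # match the checkpoint's quantization format
    tensor_parallel_size=2,             # shard weights across 2 GPUs
    gpu_memory_utilization=0.90,        # fraction of VRAM reserved for weights + KV cache
    enable_prefix_caching=True,         # reuse KV cache blocks for shared prompt prefixes
)

sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

prompts = [
    "Summarize the benefits of PagedAttention in one paragraph.",
    "Explain continuous batching for LLM serving.",
]

# Requests are continuously batched inside the engine; PagedAttention manages the KV cache.
outputs = llm.generate(prompts, sampling)
for out in outputs:
    print(out.outputs[0].text)
```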
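Speculative decoding, also named in the skill description, can be sketched with vLLM's n-gram (prompt-lookup) drafting. The `speculative_config` keys below follow recent vLLM documentation; treat the exact key names and token counts as assumptions to verify against the version you deploy:

```python
from vllm import LLM, SamplingParams

# Hedged sketch: n-gram (prompt-lookup) speculative decoding. Draft tokens are
# proposed by matching n-grams already present in the prompt, then verified by
# the target model in a single forward pass.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical target model
    speculative_config={
        "method": "ngram",            # prompt-lookup drafting, no separate draft model
        "num_speculative_tokens": 5,  # draft tokens proposed per verification step
        "prompt_lookup_max": 4,       # longest n-gram used for lookup
    },
)

outputs = llm.generate(
    ["List three ways to reduce LLM inference latency."],
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```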
Facts (cite-ready)
Stable fields and commands for AI/search citations.
- Install command
npx skills add https://github.com/yonatangross/orchestkit --skill high-performance-inference
- Source
- yonatangross/orchestkit
- Category
- Dev Tools
- Verified
- ✓
- First Seen
- 2026-02-01
- Updated
- 2026-02-18
Quick answers
What is high-performance-inference?
High-performance LLM inference with vLLM, quantization (AWQ, GPTQ, FP8), speculative decoding, and edge deployment. Use when optimizing inference latency, throughput, or memory. Source: yonatangross/orchestkit.
How do I install high-performance-inference?
Open your terminal or command-line tool (Terminal, iTerm, Windows Terminal, etc.), then copy and run: npx skills add https://github.com/yonatangross/orchestkit --skill high-performance-inference. Once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code or Cursor.
Where is the source repository?
https://github.com/yonatangross/orchestkit
Details
- Category
- Dev Tools
- Source
- skills.sh
- First Seen
- 2026-02-01