·awq-quantization
</>

awq-quantization

orchestra-research/ai-research-skills

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

15Installs·1Trend·@orchestra-research

Installation

$npx skills add https://github.com/orchestra-research/ai-research-skills --skill awq-quantization

SKILL.md

4-bit quantization that preserves salient weights based on activation patterns, achieving 3x speedup with minimal accuracy loss.

Timing: 10-15 min for 7B, 1 hour for 70B models.

| Speedup (4-bit) | 2.5-3x | 2x | 1.5x | | Accuracy loss | <5% | 5-10% | 5-15% | | Calibration | Minimal (128-1K tokens) | More extensive | None | | Overfitting risk | Low | Higher | N/A | | Best for | Production inference | GPU inference | Easy integration | | vLLM support | Native | Yes | Limited |

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner. Source: orchestra-research/ai-research-skills.

View raw

Facts (cite-ready)

Stable fields and commands for AI/search citations.

Install command
npx skills add https://github.com/orchestra-research/ai-research-skills --skill awq-quantization
Category
</>Dev Tools
Verified
First Seen
2026-02-11
Updated
2026-02-18

Quick answers

What is awq-quantization?

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner. Source: orchestra-research/ai-research-skills.

How do I install awq-quantization?

Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.) Copy and run this command: npx skills add https://github.com/orchestra-research/ai-research-skills --skill awq-quantization Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code or Cursor

Where is the source repository?

https://github.com/orchestra-research/ai-research-skills