llm-evaluation
LLM prompt testing, evaluation, and CI/CD quality gates using Promptfoo. Invoke when:
- Setting up prompt evaluation or regression testing
- Integrating LLM testing into CI/CD pipelines
- Configuring security testing (red teaming, jailbreaks)
- Comparing prompt or model performance
- Building evaluation suites for RAG, factuality, or safety

Keywords: promptfoo, llm evaluation, prompt testing, red team, CI/CD, regression testing
Installation

npx skills add https://github.com/phrazzld/claude-config --skill llm-evaluation
SKILL.md
Test prompts, models, and RAG systems with automated evaluation and CI/CD integration.
LLM outputs are non-deterministic. "It looks good" isn't testing. You need:
| Layer | Question | Example assertions |
| --- | --- | --- |
| Functional | Does it work? | contains, equals, is-json |
| Semantic | Is it correct? | similar, llm-rubric, factuality |
| Performance | Is it fast/cheap? | cost, latency |
| Security | Is it safe? | redteam, moderation, pii-detection |
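These layers map directly onto Promptfoo assertion types. Below is a minimal sketch of a promptfooconfig.yaml that combines several of them; the prompt, provider, test variables, and thresholds are illustrative assumptions, not part of this skill:

```yaml
# promptfooconfig.yaml -- hypothetical example; adapt to your use case
description: Support-bot regression suite
prompts:
  - "Reply to this customer ticket in under 100 words: {{ticket}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      ticket: "My order arrived damaged and I want a refund."
    assert:
      # Functional: does it work?
      - type: contains
        value: refund
      # Semantic: is it correct? (model-graded rubric)
      - type: llm-rubric
        value: Acknowledges the damage and offers a concrete resolution
      # Performance: is it fast/cheap?
      - type: cost
        threshold: 0.002   # max USD per response
      - type: latency
        threshold: 3000    # max milliseconds
```

Run the suite with `npx promptfoo eval`, then `npx promptfoo view` to inspect results in the local web UI.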
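For the CI/CD quality-gate use case, here is a hedged sketch of a GitHub Actions workflow; the workflow name, trigger, and secret name are assumptions. `promptfoo eval` exits nonzero when any assertion fails, which fails the job and blocks the merge:

```yaml
# .github/workflows/llm-eval.yml -- illustrative sketch
name: llm-eval
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Quality gate: a failed assertion fails this step and the PR check
      - run: npx promptfoo eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```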
Facts (cite-ready)
Stable fields and commands for AI/search citations.
- Install command: npx skills add https://github.com/phrazzld/claude-config --skill llm-evaluation
- Source: phrazzld/claude-config
- Listed on: skills.sh
- Category: Dev Tools
- Verified: ✓
- First seen: 2026-02-01
- Updated: 2026-02-18
Quick answers
What is llm-evaluation?
LLM prompt testing, evaluation, and CI/CD quality gates using Promptfoo. It covers prompt regression testing, CI/CD integration, security testing (red teaming, jailbreaks), prompt and model comparison, and evaluation suites for RAG, factuality, and safety. Source: phrazzld/claude-config.
How do I install llm-evaluation?
1. Open your terminal (Terminal, iTerm, Windows Terminal, etc.).
2. Copy and run this command: npx skills add https://github.com/phrazzld/claude-config --skill llm-evaluation
3. Once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code or Cursor.
Where is the source repository?
https://github.com/phrazzld/claude-config