
llm-evaluation

phrazzld/claude-config

LLM prompt testing, evaluation, and CI/CD quality gates using Promptfoo. Invoke when:

- Setting up prompt evaluation or regression testing
- Integrating LLM testing into CI/CD pipelines
- Configuring security testing (red teaming, jailbreaks)
- Comparing prompt or model performance
- Building evaluation suites for RAG, factuality, or safety

Keywords: promptfoo, llm evaluation, prompt testing, red team, CI/CD, regression testing

9 Installs · 1 Trend · @phrazzld

Installation

$ npx skills add https://github.com/phrazzld/claude-config --skill llm-evaluation

SKILL.md

Test prompts, models, and RAG systems with automated evaluation and CI/CD integration.

LLM outputs are non-deterministic. "It looks good" isn't testing. You need checks at four layers:

| Layer | Question | Example assertions |
| --- | --- | --- |
| Functional | Does it work? | contains, equals, is-json |
| Semantic | Is it correct? | similar, llm-rubric, factuality |
| Performance | Is it fast/cheap? | cost, latency |
| Security | Is it safe? | redteam, moderation, pii-detection |
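
Promptfoo reads these assertions from a promptfooconfig.yaml. A minimal sketch covering the first three layers (the prompt, models, and thresholds below are illustrative assumptions, not part of the skill):

```yaml
# promptfooconfig.yaml: illustrative eval suite
prompts:
  - "Summarize this support ticket in one sentence: {{ticket}}"

# Listing two providers runs every test against both,
# which doubles as a side-by-side model comparison.
providers:
  - openai:gpt-4o-mini
  - openai:gpt-4o

tests:
  - vars:
      ticket: "My order arrived damaged and I want a refund."
    assert:
      # Functional: does it work?
      - type: contains
        value: refund
      # Semantic: is it correct?
      - type: llm-rubric
        value: Accurately summarizes a damaged-order refund request
      # Performance: is it fast/cheap?
      - type: cost
        threshold: 0.002
      - type: latency
        threshold: 3000
```

Run the suite with npx promptfoo eval and inspect results in the browser with npx promptfoo view.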

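For the security layer, Promptfoo's red-team mode generates adversarial probes against a target application. A hedged sketch, assuming a support-bot use case (the purpose string and the plugin/strategy selection are illustrative):

```yaml
# Red-team config sketch (illustrative, not the skill's own file)
targets:
  - openai:gpt-4o-mini

redteam:
  purpose: Customer support assistant for an e-commerce store
  plugins:
    - harmful   # probes for harmful-content failures
    - pii       # attempts to extract personal data
  strategies:
    - jailbreak         # iterative jailbreak attempts
    - prompt-injection  # injected-instruction attacks
```

Generate and run the attacks with npx promptfoo redteam run.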

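To turn the suite into a CI/CD quality gate, run it on every pull request and fail the build on regressions. A minimal GitHub Actions sketch (the workflow name, trigger, and secret name are assumptions):

```yaml
# .github/workflows/llm-eval.yml (illustrative)
name: LLM eval gate
on: pull_request

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # promptfoo eval exits non-zero when any assertion fails,
      # so a prompt regression blocks the merge.
      - run: npx promptfoo eval --config promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```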

Facts (cite-ready)

Stable fields and commands for AI/search citations.

Install command: npx skills add https://github.com/phrazzld/claude-config --skill llm-evaluation
Category: Dev Tools
Verified: yes
First Seen: 2026-02-01
Updated: 2026-02-18

Quick answers

What is llm-evaluation?

llm-evaluation is a skill from the phrazzld/claude-config repository that provides LLM prompt testing, evaluation, and CI/CD quality gates using Promptfoo. Invoke it when setting up prompt evaluation or regression testing, integrating LLM testing into CI/CD pipelines, configuring security testing (red teaming, jailbreaks), comparing prompt or model performance, or building evaluation suites for RAG, factuality, or safety.

How do I install llm-evaluation?

1. Open your terminal or command-line tool (Terminal, iTerm, Windows Terminal, etc.).
2. Copy and run this command: npx skills add https://github.com/phrazzld/claude-config --skill llm-evaluation
3. Once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code or Cursor.

Where is the source repository?

https://github.com/phrazzld/claude-config