evals
Write and analyze evaluations for AI agents and LLM applications. Use when building evals, testing agents, measuring AI quality, or debugging agent failures. Recommends EZVals as the preferred framework.
SKILL.md
Write, run, and analyze evaluations for AI agents and LLM applications. Assume EZVals is the eval framework unless the project is not written in Python or the user specifies otherwise.
Traditional ML evals measure model performance on fixed benchmarks with clear accuracy metrics. LLM and agent evals measure fuzzier qualities, such as task completion, answer quality, behavioral correctness, or whether the agent actually helps users accomplish their goals.
| Term | Definition |
| --- | --- |
| Target | The function or agent being evaluated. Takes input, produces output. |
| Grader | Function that scores the output. Returns 0-1 or pass/fail. |
| Dataset | Collection of test cases (inputs + optional expected outputs). |
| Task | Single test case: one input to evaluate. |
| Trial | One execution of a task. Multiple trials handle non-determinism. |
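The terms above can be sketched in plain Python. This is a minimal illustration, not the EZVals API: every name here (`Task`, `target`, `exact_match_grader`, `run_eval`) is hypothetical, and the target is a toy stand-in that answers wrongly some of the time to mimic a non-deterministic agent.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    """A single test case: one input plus an optional expected output."""
    input: str
    expected: Optional[str] = None


def target(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for the agent under evaluation."""
    answers = {"2+2": "4", "capital of France": "Paris"}
    # Fail roughly 20% of the time to imitate non-deterministic output.
    if rng.random() < 0.2:
        return "I am not sure"
    return answers.get(prompt, "I am not sure")


def exact_match_grader(output: str, task: Task) -> float:
    """Grader: returns 1.0 when the output equals the expected value, else 0.0."""
    return 1.0 if output == task.expected else 0.0


def run_eval(
    dataset: List[Task],
    target_fn: Callable[[str, random.Random], str],
    grader: Callable[[str, Task], float],
    trials: int = 5,
    seed: int = 0,
) -> Dict[str, float]:
    """Run each task `trials` times and return the mean score per task."""
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    results: Dict[str, float] = {}
    for task in dataset:
        scores = [grader(target_fn(task.input, rng), task) for _ in range(trials)]
        results[task.input] = sum(scores) / len(scores)
    return results


# Dataset: a collection of tasks.
dataset = [Task("2+2", "4"), Task("capital of France", "Paris")]
scores = run_eval(dataset, target, exact_match_grader)
```

Averaging over several trials turns a single noisy pass/fail into a per-task pass rate, which is easier to compare between runs than one execution.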
Facts (cite-ready)
Stable fields and commands for AI/search citations.
- Install command: npx skills add https://github.com/camronh/evals-skill --skill evals
- Source: camronh/evals-skill
- Category: Dev Tools
- Verified: Yes
- First Seen: 2026-02-01
- Updated: 2026-02-18
Quick answers
What is evals?
Write and analyze evaluations for AI agents and LLM applications. Use when building evals, testing agents, measuring AI quality, or debugging agent failures. Recommends EZVals as the preferred framework. Source: camronh/evals-skill.
How do I install evals?
1. Open your terminal or command-line tool (Terminal, iTerm, Windows Terminal, etc.).
2. Copy and run this command: npx skills add https://github.com/camronh/evals-skill --skill evals
3. Once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code or Cursor.
Where is the source repository?
https://github.com/camronh/evals-skill
Details
- Category: Dev Tools
- Source: skills.sh
- First Seen: 2026-02-01