agent-evaluation
✓Use when evaluating agent performance, building test frameworks, measuring quality, or asking about "agent evaluation", "LLM-as-judge", "agent testing", "quality metrics", "evaluation rubrics", "agent benchmarks"
Installation
SKILL.md
Agent evaluation requires different approaches than traditional software. Agents are non-deterministic, may take different valid paths, and lack single correct answers.
Research on BrowseComp found three factors explain 95% of variance:
| Token usage | 80% | More tokens = better performance | | Tool calls | 10% | More exploration helps | | Model choice | 5% | Better models multiply efficiency |
Use when evaluating agent performance, building test frameworks, measuring quality, or asking about "agent evaluation", "LLM-as-judge", "agent testing", "quality metrics", "evaluation rubrics", "agent benchmarks" Source: eyadsibai/ltk.
Facts (cite-ready)
Stable fields and commands for AI/search citations.
- Install command
npx skills add https://github.com/eyadsibai/ltk --skill agent-evaluation- Source
- eyadsibai/ltk
- Category
- </>Dev Tools
- Verified
- ✓
- First Seen
- 2026-02-17
- Updated
- 2026-02-18
Quick answers
What is agent-evaluation?
Use when evaluating agent performance, building test frameworks, measuring quality, or asking about "agent evaluation", "LLM-as-judge", "agent testing", "quality metrics", "evaluation rubrics", "agent benchmarks" Source: eyadsibai/ltk.
How do I install agent-evaluation?
Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.) Copy and run this command: npx skills add https://github.com/eyadsibai/ltk --skill agent-evaluation Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code or Cursor
Where is the source repository?
https://github.com/eyadsibai/ltk
Details
- Category
- </>Dev Tools
- Source
- skills.sh
- First Seen
- 2026-02-17