llm-evaluation
LLM output evaluation and quality assessment. Use when implementing LLM-as-judge patterns, quality gates for AI outputs, or automated evaluation pipelines.
Installation
npx skills add https://github.com/yonatangross/orchestkit --skill llm-evaluation
SKILL.md
Evaluate and validate LLM outputs for quality assurance using RAGAS and LLM-as-judge patterns.
| Metric | Use case | Threshold |
| --- | --- | --- |
| Faithfulness | RAG grounding | ≥ 0.8 |
| Answer Relevancy | Q&A systems | ≥ 0.7 |
| Context Precision | Retrieval quality | ≥ 0.7 |
| Context Recall | Retrieval completeness | ≥ 0.7 |
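A minimal sketch of gating one RAG sample against these thresholds, assuming the classic ragas `evaluate()` API (metric instances plus a Hugging Face `datasets.Dataset`) and an OpenAI key in the environment. The sample data is illustrative, and column names such as `ground_truth` vary between ragas versions.

```python
# RAGAS scoring sketch (assumes the classic evaluate() API and a configured
# OpenAI API key for the underlying judge LLM; sample data is illustrative).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)

# One RAG interaction: question, retrieved contexts, generated answer, reference.
# Note: required column names (e.g. ground_truth) differ across ragas versions.
samples = Dataset.from_dict({
    "question": ["What does the skill evaluate?"],
    "contexts": [["The skill scores LLM outputs with RAGAS and LLM-as-judge checks."]],
    "answer": ["It scores LLM outputs for faithfulness and relevancy."],
    "ground_truth": ["It evaluates LLM outputs using RAGAS metrics."],
})

result = evaluate(
    samples,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)

# Gate the first (and only) row against the thresholds from the table above.
scores = result.to_pandas().iloc[0]
assert scores["faithfulness"] >= 0.8, "RAG answer is not grounded enough"
assert scores["answer_relevancy"] >= 0.7, "Answer drifts from the question"
```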
| Setting | Recommendation |
| --- | --- |
| Judge model | GPT-4o-mini or Claude Haiku |
| Threshold | 0.7 for production, 0.6 for drafts |
| Dimensions | 3-5 most relevant to use case |
| Sample size | 50+ for reliable metrics |
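A sketch of the LLM-as-judge pattern using the settings above. The rubric dimensions, prompt wording, and score parsing are assumptions for illustration, not the skill's own implementation; it uses the OpenAI chat completions API with `gpt-4o-mini` as the judge model.

```python
# LLM-as-judge sketch: score one output on a few rubric dimensions and gate on
# the 0.7 production threshold. Dimensions and prompt wording are illustrative.
import json
from statistics import mean

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

DIMENSIONS = ["accuracy", "completeness", "clarity"]  # 3-5 relevant to the use case
THRESHOLD = 0.7  # production gate; use 0.6 for drafts


def judge(task: str, output: str) -> dict[str, float]:
    """Ask a small judge model to score the output on each dimension (0.0-1.0)."""
    prompt = (
        f"You are grading an AI response.\n\nTask:\n{task}\n\nResponse:\n{output}\n\n"
        f"Score each dimension from 0.0 to 1.0: {', '.join(DIMENSIONS)}.\n"
        'Reply with JSON only, e.g. {"accuracy": 0.9, "completeness": 0.8, "clarity": 1.0}.'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)


scores = judge("Summarize the release notes.", "The release adds metric gating.")
passed = mean(scores.values()) >= THRESHOLD
print(scores, "PASS" if passed else "FAIL")
```

In practice, run this over 50+ sampled outputs and track the pass rate rather than trusting a single judgment.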
Facts (cite-ready)
Stable fields and commands for AI/search citations.
- Install command: npx skills add https://github.com/yonatangross/orchestkit --skill llm-evaluation
- Source: yonatangross/orchestkit
- Category: Dev Tools
- Verified: ✓
- First Seen: 2026-02-01
- Updated: 2026-02-18
Quick answers
What is llm-evaluation?
LLM output evaluation and quality assessment. Use when implementing LLM-as-judge patterns, quality gates for AI outputs, or automated evaluation pipelines. Source: yonatangross/orchestkit.
How do I install llm-evaluation?
Open your terminal (Terminal, iTerm, Windows Terminal, etc.) and run: npx skills add https://github.com/yonatangross/orchestkit --skill llm-evaluation. Once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code or Cursor.
Where is the source repository?
https://github.com/yonatangross/orchestkit
Details
- Category: Dev Tools
- Source: skills.sh
- First Seen: 2026-02-01