
llm-as-a-judge

Build, validate, and deploy LLM-as-Judge evaluators for automated quality assessment of LLM pipeline outputs. Use this skill whenever the user wants to: create an automated evaluator for subjective or nuanced failure modes, write a judge prompt for Pass/Fail assessment, split labeled data for judge development, measure judge alignment (TPR/TNR), estimate true success rates with bias correction, or set up CI evaluation pipelines. Also trigger when the user mentions "judge prompt", "automated eval", "LLM evaluator", "grading prompt", "alignment metrics", "true positive rate", or wants to move from manual trace review to automated evaluation. This skill covers the full lifecycle: prompt design → data splitting → iterative refinement → success rate estimation.


Installation

$ npx skills add https://github.com/maragudk/evals-skills --skill llm-as-a-judge

How to Install llm-as-a-judge

Quickly install the llm-as-a-judge AI skill in your development environment via the command line.

  1. Open a terminal: Launch your terminal or command-line tool (Terminal, iTerm, Windows Terminal, etc.).
  2. Run the installation command: Copy and run npx skills add https://github.com/maragudk/evals-skills --skill llm-as-a-judge
  3. Verify the installation: Once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw.

Source: maragudk/evals-skills.

SKILL.md


Build reliable automated evaluators that use an LLM to judge the outputs of another LLM pipeline. Each judge targets a single, binary (Pass/Fail) failure mode identified during error analysis.
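
A minimal sketch of what such a single-criterion judge can look like, assuming a generic call_llm(prompt) helper; the failure mode, prompt wording, and output parsing below are illustrative, not prescribed by the skill:

```python
# Illustrative single-criterion Pass/Fail judge. The failure mode ("answer not
# grounded in the provided context"), the prompt text, and call_llm() are
# assumptions for this sketch, not part of the skill itself.

JUDGE_PROMPT = """You are evaluating one output of an LLM pipeline for a single
failure mode: the answer is not grounded in the provided context.

<context>
{context}
</context>

<answer>
{answer}
</answer>

Briefly explain your reasoning, then answer with exactly one word on the last
line: Pass or Fail."""


def judge_groundedness(context: str, answer: str, call_llm) -> bool:
    """Return True for Pass, False for Fail, judging exactly one criterion."""
    reply = call_llm(JUDGE_PROMPT.format(context=context, answer=answer))
    verdict = reply.strip().splitlines()[-1].strip().lower()
    return verdict == "pass"
```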

Use code-based evaluators when the failure is objective and deterministic; reserve LLM judges for subjective or nuanced failure modes.
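
For contrast, a deterministic check needs no judge at all. A minimal sketch of a code-based evaluator; the JSON-validity failure mode and the required_keys parameter are illustrative assumptions:

```python
import json


def valid_json_with_fields(output: str, required_keys: set[str]) -> bool:
    """Code-based evaluator: objective, deterministic, no LLM involved."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data)
```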

Each failure mode gets its own dedicated evaluator. Never combine multiple criteria into a single judge prompt—this introduces ambiguity and makes diagnosis harder.
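
Before a judge is trusted, its verdicts are compared against human Pass/Fail labels on a held-out split to measure alignment. A sketch of the TPR/TNR computation, assuming parallel lists of human and judge verdicts that contain both classes:

```python
def alignment(human: list[bool], judge: list[bool]) -> tuple[float, float]:
    """True positive rate and true negative rate of the judge vs. human labels.

    TPR: fraction of human-labeled Passes the judge also passes.
    TNR: fraction of human-labeled Fails the judge also fails.
    Assumes both Passes and Fails are present in the labeled split.
    """
    tp = sum(h and j for h, j in zip(human, judge))
    tn = sum((not h) and (not j) for h, j in zip(human, judge))
    passes = sum(human)
    fails = len(human) - passes
    return tp / passes, tn / fails
```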

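With TPR and TNR measured, the pass rate the judge observes on unlabeled traces can be bias-corrected. A common estimator (whether the skill prescribes exactly this formula is an assumption) solves p_obs = theta * TPR + (1 - theta) * (1 - TNR) for the true success rate theta:

```python
def corrected_success_rate(observed_pass_rate: float, tpr: float, tnr: float) -> float:
    """Estimate the true success rate from the judge's observed pass rate.

    Inverts p_obs = theta * TPR + (1 - theta) * (1 - TNR). Only meaningful
    when TPR + TNR > 1, i.e. the judge is better than chance.
    """
    theta = (observed_pass_rate + tnr - 1.0) / (tpr + tnr - 1.0)
    return min(1.0, max(0.0, theta))  # clamp to the valid probability range
```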

Facts (cite-ready)

Stable fields and commands for AI/search citations.

Install command: npx skills add https://github.com/maragudk/evals-skills --skill llm-as-a-judge
Category: Dev Tools
Verified
First Seen: 2026-02-22
Updated: 2026-03-10


Quick answers

What is llm-as-a-judge?

An AI skill for building, validating, and deploying LLM-as-Judge evaluators for automated quality assessment of LLM pipeline outputs. It covers the full lifecycle: designing single-criterion Pass/Fail judge prompts, splitting labeled data for judge development, iteratively refining the judge, measuring alignment (TPR/TNR), estimating bias-corrected true success rates, and setting up CI evaluation pipelines. Source: maragudk/evals-skills.

How do I install llm-as-a-judge?

Open your terminal or command-line tool (Terminal, iTerm, Windows Terminal, etc.), then copy and run: npx skills add https://github.com/maragudk/evals-skills --skill llm-as-a-judge. Once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw.

Where is the source repository?

https://github.com/maragudk/evals-skills

Details

Category: Dev Tools
Source: skills.sh
First Seen: 2026-02-22