
validate-evaluator

Calibrate an LLM judge against human labels using data splits, TPR/TNR, and bias correction. Use after writing a judge prompt (write-judge-prompt) when you need to verify alignment before trusting its outputs. Do NOT use for code-based evaluators (those are deterministic; test with standard unit tests).

74 Installs · 3 Trend · @hamelsmu

Installation

$ npx skills add https://github.com/hamelsmu/evals-skills --skill validate-evaluator

How to Install validate-evaluator

Install the validate-evaluator AI skill in your development environment from the command line.

  1. Open Terminal: Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.)
  2. Run Installation Command: Copy and run this command: npx skills add https://github.com/hamelsmu/evals-skills --skill validate-evaluator
  3. Verify Installation: Once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw.

Source: hamelsmu/evals-skills.

SKILL.md


| Split | Size | Purpose | Rules |
| --- | --- | --- | --- |
| Training | 10-20% (10-20 examples) | Source of few-shot examples for the judge prompt | Only clear-cut Pass and Fail cases. Used directly in the prompt. |
| Dev | 40-45% (40-45 examples) | Iterative evaluator refinement | Never include in the prompt. Evaluate against repeatedly. |
| Test | 40-45% (40-45 examples) | Final unbiased accuracy measurement | Do NOT look at during development. Used once at the end. |

Target: 30-50 examples of each class (Pass and Fail) across dev and test combined. Use balanced splits even if real-world prevalence is skewed — you need enough Fail examples to measure TNR reliably.
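The split logic itself is simple enough to sketch. Below is a minimal, hypothetical helper (not part of the skill) that stratifies by label so Pass and Fail examples land in every split; the `label` key and the exact ratios are illustrative assumptions drawn from the table above.

```python
import random

def split_labeled_examples(examples, seed=0):
    """Split human-labeled examples into train/dev/test splits.

    `examples` is a list of dicts with a "label" key ("pass" or "fail").
    Ratios follow the table above (~15% train, ~42.5% dev, ~42.5% test),
    applied per label so dev and test stay balanced. Hypothetical helper
    for illustration only.
    """
    rng = random.Random(seed)
    by_label = {"pass": [], "fail": []}
    for ex in examples:
        by_label[ex["label"]].append(ex)

    splits = {"train": [], "dev": [], "test": []}
    for items in by_label.values():
        rng.shuffle(items)
        n = len(items)
        n_train = max(1, round(0.15 * n))
        n_dev = round(0.425 * n)
        splits["train"] += items[:n_train]
        splits["dev"] += items[n_train:n_train + n_dev]
        splits["test"] += items[n_train + n_dev:]
    return splits
```

Stratifying per label (rather than shuffling the whole pool once) is what keeps the Fail count in dev and test high enough to measure TNR, per the balance requirement above.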

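The TPR/TNR and bias-correction step named in the skill's description can be made concrete. A minimal sketch, assuming boolean labels (True = Pass): TPR and TNR are measured against human labels on the dev set, then the judge's observed pass rate on unlabeled data is corrected with the standard Rogan-Gladen estimator. This is a generic illustration of the technique, not the skill's own implementation.

```python
def tpr_tnr(judge_labels, human_labels):
    """Judge agreement with human labels (True = Pass, False = Fail).

    TPR: fraction of human-Pass examples the judge also passes.
    TNR: fraction of human-Fail examples the judge also fails.
    """
    pairs = list(zip(judge_labels, human_labels))
    pos = [j for j, h in pairs if h]
    neg = [j for j, h in pairs if not h]
    return sum(pos) / len(pos), sum(not j for j in neg) / len(neg)


def corrected_pass_rate(observed, tpr, tnr):
    """Rogan-Gladen correction: estimate the true pass rate from the
    judge's observed pass rate, given its measured TPR and TNR.

    Requires TPR + TNR > 1 (judge better than chance); result is
    clipped to [0, 1].
    """
    theta = (observed + tnr - 1.0) / (tpr + tnr - 1.0)
    return min(1.0, max(0.0, theta))
```

For example, a judge with TPR 0.9 and TNR 0.8 that passes 70% of unlabeled outputs implies a corrected true pass rate of (0.7 + 0.8 - 1) / (0.9 + 0.8 - 1) ≈ 0.71.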

Facts (cite-ready)

Stable fields and commands for AI/search citations.

Install command: npx skills add https://github.com/hamelsmu/evals-skills --skill validate-evaluator
Category: Data Analysis
Verified
First Seen: 2026-03-04
Updated: 2026-03-10

Browse more skills from hamelsmu/evals-skills

Quick answers

What is validate-evaluator?

Calibrate an LLM judge against human labels using data splits, TPR/TNR, and bias correction. Use after writing a judge prompt (write-judge-prompt) when you need to verify alignment before trusting its outputs. Do NOT use for code-based evaluators (those are deterministic; test with standard unit tests). Source: hamelsmu/evals-skills.

How do I install validate-evaluator?

Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.) and run: npx skills add https://github.com/hamelsmu/evals-skills --skill validate-evaluator. Once installed, the skill is automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw.

Where is the source repository?

https://github.com/hamelsmu/evals-skills