·eval-audit
{}

eval-audit

Audit an LLM eval pipeline and surface problems: missing error analysis, unvalidated judges, vanity metrics, etc. Use when inheriting an eval system, when unsure whether evals are trustworthy, or as a starting point when no eval infrastructure exists. Do NOT use when the goal is to build a new evaluator from scratch (use error-analysis, write-judge-prompt, or validate-evaluator instead).

79Installs·2Trend·@hamelsmu

Installation

$npx skills add https://github.com/hamelsmu/evals-skills --skill eval-audit

How to Install eval-audit

Quickly install eval-audit AI skill to your development environment via command line

  1. Open Terminal: Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.)
  2. Run Installation Command: Copy and run this command: npx skills add https://github.com/hamelsmu/evals-skills --skill eval-audit
  3. Verify Installation: Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw

Source: hamelsmu/evals-skills.

SKILL.md

View raw

Inspect an LLM eval pipeline and produce a prioritized list of problems with concrete next steps.

Access to eval artifacts (traces, evaluator configs, judge prompts, labeled data) via an observability MCP server or local files. If none exist, skip to "No Eval Infrastructure."

Check whether the user has an observability MCP server connected (Phoenix, Braintrust, LangSmith, Truesight or similar). If available, use it to pull traces, evaluator definitions, and experiment results. If not, ask for local files: CSVs, JSON trace exports, notebooks, or evaluation scripts.

Audit an LLM eval pipeline and surface problems: missing error analysis, unvalidated judges, vanity metrics, etc. Use when inheriting an eval system, when unsure whether evals are trustworthy, or as a starting point when no eval infrastructure exists. Do NOT use when the goal is to build a new evaluator from scratch (use error-analysis, write-judge-prompt, or validate-evaluator instead). Source: hamelsmu/evals-skills.

Facts (cite-ready)

Stable fields and commands for AI/search citations.

Install command
npx skills add https://github.com/hamelsmu/evals-skills --skill eval-audit
Category
{}Data Analysis
Verified
First Seen
2026-03-04
Updated
2026-03-10

Browse more skills from hamelsmu/evals-skills

Quick answers

What is eval-audit?

Audit an LLM eval pipeline and surface problems: missing error analysis, unvalidated judges, vanity metrics, etc. Use when inheriting an eval system, when unsure whether evals are trustworthy, or as a starting point when no eval infrastructure exists. Do NOT use when the goal is to build a new evaluator from scratch (use error-analysis, write-judge-prompt, or validate-evaluator instead). Source: hamelsmu/evals-skills.

How do I install eval-audit?

Open your terminal or command line tool (Terminal, iTerm, Windows Terminal, etc.) Copy and run this command: npx skills add https://github.com/hamelsmu/evals-skills --skill eval-audit Once installed, the skill will be automatically configured in your AI coding environment and ready to use in Claude Code, Cursor, or OpenClaw

Where is the source repository?

https://github.com/hamelsmu/evals-skills