Evaluate agent task outputs against a three-dimension rubric and produce structured verdict records. The judge operates as a quality gate at the task completion boundary, scoring outputs on Semantic accuracy, Pragmatic usefulness, and Syntactic consistency.
The rubric reuses three dimensions from the KLS (Krogstie-Lindland-Sindre) quality framework defined in disciplined-quality-evaluation:
| Dimension | Key question | Evaluation criteria |
| --- | --- | --- |
| Semantic | Does it accurately represent the domain? | Factual correctness, domain terminology, no contradictions |
| Pragmatic | Does it enable the intended decisions/actions? | Actionable, useful, addresses the task goal |
| Syntactic | Is it internally consistent and well-structured? | Format compliance, structural completeness, no broken references |
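A verdict record covering the three dimensions above can be sketched as a small dataclass. This is a minimal illustration: the field names, 1-5 scale, and pass threshold are assumptions, not the actual schema in `automation/judge/`.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Verdict:
    """One judge verdict. Field names and scale are illustrative,
    not the canonical automation/judge/ schema."""
    task_id: str
    semantic: int   # 1-5: factual correctness, domain terminology
    pragmatic: int  # 1-5: actionable, addresses the task goal
    syntactic: int  # 1-5: format compliance, structural completeness
    rationale: str

    def passes(self, threshold: int = 3) -> bool:
        # Gate on the weakest dimension: every score must meet the threshold.
        return min(self.semantic, self.pragmatic, self.syntactic) >= threshold

v = Verdict("task-42", semantic=4, pragmatic=5, syntactic=3,
            rationale="Minor format drift")
print(json.dumps(asdict(v)))  # one JSONL record per line
print(v.passes())
```

Gating on the minimum score (rather than an average) reflects the quality-gate framing: a single failing dimension blocks acceptance.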
Evaluate agent task outputs using a three-dimension rubric (Semantic, Pragmatic, Syntactic) derived from the KLS quality framework. Use when: (1) a task has been completed and needs quality assessment before acceptance, (2) automated post-task quality checks are required, (3) multi-model consensus verdicts are needed for agent outputs, (4) documentation, code, or specification quality must be scored with structured JSON verdicts, or (5) a human fallback decision is needed after model disagreement. Produces JSONL verdict records compatible with the verdict schema in automation/judge/. Source: terraphim/terraphim-skills.
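Points (3) and (5) above can be combined in a small consensus helper: take a majority vote over per-model verdicts and return no verdict when the models disagree, signalling the human fallback. The function name, labels, and quorum parameter are assumptions for illustration only.

```python
from collections import Counter
from typing import Optional

def consensus(verdicts: list[str], quorum: float = 0.5) -> Optional[str]:
    """Majority vote over per-model verdicts (e.g. 'accept'/'reject').

    Returns the winning label when it holds a strict majority of the
    votes, else None, meaning a human fallback decision is needed.
    """
    if not verdicts:
        return None
    label, count = Counter(verdicts).most_common(1)[0]
    return label if count / len(verdicts) > quorum else None

print(consensus(["accept", "accept", "reject"]))  # strict majority
print(consensus(["accept", "reject"]))            # tie -> None (human fallback)
```

Using a strict majority (rather than a plurality) makes ties and even splits route to the human reviewer instead of silently picking a winner.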