eval-output-grading
Domain: eval · Model class: cheap
Description
Use this skill when the user wants to grade AI outputs using rubrics, schema validation, pairwise comparison, and judge models. Triggers include “grade these outputs”, “score AI responses”, and “rubric-based grading”. Do NOT use when the goal is to design the grading criteria first (use core-eval-design).
Purpose
Grade AI outputs using rubrics, schema validation, pairwise comparison, and judge models. This skill provides structured guidance, references, and worked examples to help produce high-quality, actionable outputs.
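The rubric-based approach above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the criteria, weights, and the `grade` helper are hypothetical examples of how per-criterion checks might be aggregated into a single score.

```python
# Hypothetical rubric-based grader: score an AI output against weighted
# criteria and return a normalized 0-1 score plus per-criterion detail.
# The criteria below are placeholders; real rubrics would encode the
# evaluation criteria produced by core-eval-design.
RUBRIC = {
    # criterion name -> (weight, check function returning a 0-1 score)
    "non_empty": (1.0, lambda out: 1.0 if out.strip() else 0.0),
    "length_ok": (0.5, lambda out: 1.0 if len(out) <= 500 else 0.0),
    "cites_source": (2.0, lambda out: 1.0 if "[source]" in out else 0.0),
}

def grade(output: str) -> dict:
    """Apply every rubric criterion and return a weighted average score."""
    detail = {name: fn(output) for name, (w, fn) in RUBRIC.items()}
    total_weight = sum(w for w, _ in RUBRIC.values())
    score = sum(RUBRIC[name][0] * s for name, s in detail.items()) / total_weight
    return {"score": round(score, 3), "detail": detail}
```

Deterministic checks like these cover schema- and format-level criteria; subjective criteria (helpfulness, tone) would instead call a judge model inside the check function.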
Trigger Phrases
- “grade these outputs”
- “score AI responses”
- “rubric-based grading”
- “validate output schema”
- “judge model outputs”
- “pairwise comparison”
Anti-Triggers
- design the grading criteria first (use core-eval-design)
- measure variance across runs (use core-variance-analysis)
Intake Questions
- What is the user’s goal and current state?
- What constraints (time, team, compliance) apply?
- Are there existing artifacts (specs, code, benchmarks) to reference?
Output Contract
- evaluation criteria
- scoring or benchmark framing
- comparison-ready output
- decision guidance
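For the comparison-ready output item above, pairwise judgments can be reduced to per-model win rates. The sketch below is an illustrative assumption about the judgment format (a list of `(model_a, model_b, winner)` tuples, with `"tie"` splitting credit), not a fixed contract.

```python
from collections import Counter

def win_rates(judgments):
    """Compute per-model win rates from pairwise judgments.

    judgments: iterable of (model_a, model_b, winner) tuples, where
    winner is one of the two model names or "tie" (half credit each).
    """
    wins, games = Counter(), Counter()
    for a, b, winner in judgments:
        games[a] += 1
        games[b] += 1
        if winner in (a, b):
            wins[winner] += 1
        else:  # treat anything else (e.g. "tie") as half a win for each side
            wins[a] += 0.5
            wins[b] += 0.5
    return {model: wins[model] / games[model] for model in games}
```

Win rates like these feed directly into decision guidance: a model that wins a clear majority of pairwise comparisons against the incumbent is a candidate for promotion, while near-50% rates suggest measuring variance across runs instead (core-variance-analysis).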