eval-prompt
Domain: eval · Model class: cheap
Description
Use this skill when the user wants to score and grade prompts against benchmark datasets and golden test sets. Triggers include “evaluate this prompt”, “score my prompt against test cases”, and “benchmark my prompt”. Do NOT use it to refine the prompt after evaluation (use core-prompt-refinement).
Purpose
Scoring and grading prompts against benchmark datasets and golden test sets. This skill provides structured guidance, references, and worked examples to help produce high-quality, actionable outputs.
Trigger Phrases
- “evaluate this prompt”
- “score my prompt against test cases”
- “benchmark my prompt”
- “how good is this prompt”
- “run an eval on my prompt”
Anti-Triggers
- refine the prompt after evaluation (use core-prompt-refinement)
- design the eval dataset (use core-eval-design)
Intake Questions
- What is the user’s goal and current state?
- What constraints (time, team, compliance) apply?
- Are there existing artifacts (specs, code, benchmarks) to reference?
Output Contract
- evaluation criteria
- scoring or benchmark framing
- comparison-ready output
- decision guidance
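The four contract items above can be sketched as a minimal scoring harness. This is an illustrative assumption, not part of the skill itself: `run_prompt` is a hypothetical stand-in for a real model call, and exact-match grading is just one possible criterion.

```python
from dataclasses import dataclass

@dataclass
class Case:
    input: str
    expected: str

def run_prompt(prompt: str, case_input: str) -> str:
    # Hypothetical stand-in for a real model call; swap in your own client.
    return case_input.upper()

def score(prompt: str, golden: list[Case]) -> dict:
    """Score a prompt against a golden set with exact-match grading."""
    passed = [c for c in golden if run_prompt(prompt, c.input) == c.expected]
    accuracy = len(passed) / len(golden)
    return {
        "criteria": "exact match",        # evaluation criteria
        "accuracy": accuracy,             # scoring / benchmark framing
        "passed": len(passed),
        "total": len(golden),             # comparison-ready output
        "verdict": "ship" if accuracy >= 0.9 else "refine",  # decision guidance
    }

golden = [Case("abc", "ABC"), Case("ok", "OK")]
print(score("Uppercase the input.", golden))
```

Two prompts scored with the same `score` call produce directly comparable dictionaries, which is what makes the output comparison-ready.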