eval-prompt

Domain: eval · Model class: cheap

Use this skill when the user wants to score or grade prompts against benchmark datasets and golden test sets. Triggers include “evaluate this prompt”, “score my prompt against test cases”, “benchmark my prompt”. Do NOT use it to refine the prompt after evaluation (use core-prompt-refinement).

This skill covers scoring and grading prompts against benchmark datasets and golden test sets, providing structured guidance, references, and worked examples to help produce high-quality, actionable outputs.

Triggers:

  • “evaluate this prompt”
  • “score my prompt against test cases”
  • “benchmark my prompt”
  • “how good is this prompt”
  • “run an eval on my prompt”

Do not use when the user wants to:

  • refine the prompt after evaluation (use core-prompt-refinement)
  • design the eval dataset (use core-eval-design)
Intake questions:

  1. What is the user’s goal and current state?
  2. What constraints (time, team, compliance) apply?
  3. Are there existing artifacts (specs, code, benchmarks) to reference?
Outputs:

  • evaluation criteria
  • scoring or benchmark framing
  • comparison-ready output
  • decision guidance
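The scoring and benchmark framing above can be sketched as a small harness. This is a minimal sketch, not part of the skill itself: the golden-set shape, the `run_prompt` stub, and the exact-match grading rule are all assumptions for illustration — a real harness would call the model under test and may need fuzzier graders.

```python
# Minimal sketch of scoring a prompt against a golden test set.
# Assumptions: each case is {"input": ..., "expected": ...} and grading
# is exact string match. `run_prompt` is a hypothetical stand-in for a
# real model call with the prompt under test.

golden_set = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3 * 3", "expected": "9"},
]

def run_prompt(prompt: str, case_input: str) -> str:
    # Stand-in for a real model call; returns canned outputs here so the
    # sketch is self-contained and deterministic.
    canned = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "nine"}
    return canned.get(case_input, "")

def score(prompt: str, cases: list[dict]) -> dict:
    """Exact-match grading: pass rate plus the inputs that failed."""
    results = [
        {
            "input": c["input"],
            "passed": run_prompt(prompt, c["input"]).strip() == c["expected"],
        }
        for c in cases
    ]
    passed = sum(r["passed"] for r in results)
    return {
        "pass_rate": passed / len(cases),
        "failures": [r["input"] for r in results if not r["passed"]],
    }

report = score("You are a concise assistant.", golden_set)
print(report["pass_rate"], report["failures"])
```

The comparison-ready part is the returned dict: run `score` once per prompt variant and compare `pass_rate` and `failures` side by side to support a decision.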

Related skills: prompt-refinement · eval-design · eval-output-grading