bench-analyzer
Domain: bench · Model class: cheap
Description
Use this skill when the user wants to analyze benchmark results to identify quality trends, regressions, and performance signals. Triggers include “analyze benchmark results”, “interpret my eval results”, and “quality trends from benchmarks”. Do NOT use it to design the benchmark (use core-eval-design).
Purpose
Analyze benchmark results to identify quality trends, regressions, and performance signals. This skill provides structured guidance, references, and worked examples to help produce actionable outputs.
Trigger Phrases
- “analyze benchmark results”
- “interpret my eval results”
- “quality trends from benchmarks”
- “regression analysis from evals”
Anti-Triggers
- design the benchmark (use core-eval-design)
- grade individual outputs (use core-output-grading)
Intake Questions
- What is the user’s goal and current state?
- What constraints (time, team, compliance) apply?
- Are there existing artifacts (specs, code, benchmarks) to reference?
Output Contract
- benchmark analysis summary
- trend or regression findings
- comparison-ready evidence
- follow-up actions
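The four outputs listed above could be assembled roughly as follows. This is a minimal sketch, not the skill's actual implementation: the names `Finding`, `analyze`, and `report`, the input shape (a dict mapping metric name to a list of scores), the 0.02 threshold, and the assumption that higher scores are better are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Finding:
    metric: str
    baseline: float
    current: float
    delta: float
    status: str  # "regression", "improvement", or "stable"


def analyze(baseline, current, threshold=0.02):
    """Compare mean scores per shared metric (assumes higher is better)."""
    findings = []
    for metric in sorted(baseline.keys() & current.keys()):
        base_mean = mean(baseline[metric])
        cur_mean = mean(current[metric])
        delta = cur_mean - base_mean
        if delta < -threshold:
            status = "regression"
        elif delta > threshold:
            status = "improvement"
        else:
            status = "stable"
        findings.append(Finding(metric, base_mean, cur_mean, delta, status))
    return findings


def report(findings):
    """Map findings onto the four output-contract items (hypothetical keys)."""
    regressions = [f for f in findings if f.status == "regression"]
    return {
        "summary": f"{len(regressions)} of {len(findings)} metrics regressed",
        "findings": findings,
        "evidence": [(f.metric, f.baseline, f.current, f.delta) for f in findings],
        "follow_up": [f"Investigate {f.metric}" for f in regressions],
    }


# Illustrative scores only; not real benchmark data.
baseline_run = {"accuracy": [0.80, 0.82], "helpfulness": [0.70, 0.70]}
current_run = {"accuracy": [0.70, 0.72], "helpfulness": [0.70, 0.71]}
result = report(analyze(baseline_run, current_run))
print(result["summary"])  # prints "1 of 2 metrics regressed"
```

A fixed threshold on mean differences is the simplest possible regression signal. With more than a handful of samples per metric, a significance test or confidence interval would usually be a better basis for the "regression" label.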