arch-scalability

Domain: arch · Model class: strong

Use this skill when the user wants to design AI systems for inference-heavy scalability, latency budgets, and cost efficiency. Triggers include “how do I scale my AI system”, “inference scalability”, and “latency budget design”. Do NOT use it to design the initial system (use core-system-design).

This skill provides structured guidance, references, and worked examples for designing AI systems for inference-heavy scalability, latency budgets, and cost efficiency.

Triggers:

  • “how do I scale my AI system”
  • “inference scalability”
  • “latency budget design”
  • “cost-aware agent architecture”

Do NOT use when the task is to:

  • design the initial system (use core-system-design)
  • analyze runtime performance (use core-performance-review)

Clarifying questions:

  1. What is the user’s goal and current state?
  2. What constraints (time, team, compliance) apply?
  3. Are there existing artifacts (specs, code, benchmarks) to reference?

Outputs:

  • architecture recommendation
  • tradeoff summary
  • system component framing
  • risk and next-step guidance

arch-system · arch-reliability · strat-tradeoff