fault-resilience
Mission
Section titled “Mission”Monitor → detect → isolate → repair → validate. Workflows that recover themselves.
When to Use
Section titled “When to Use”ONLY use when adding structural fault tolerance to an AI workflow, designing retry/fallback strategies, isolating failures from cascading, running N-version redundancy for reliability, or implementing self-healing prompts.
Do NOT use for debugging individual errors (use issue-debug), code quality issues (use code-review), or general fault-finding.
Triggers: “make this more reliable”, “add fault tolerance”, “self-healing”, “reduce hallucinations structurally”, “N-version redundancy”, “retry strategy”, “fallback design”
Skills Invoked
Section titled “Skills Invoked”resil-homeostatic— PID-controller drift correctionresil-membrane— fault isolation between skill boundariesresil-redundant-voter— majority vote across N model instancesresil-replay— transient error recovery via context-modified replayresil-clone-mutate— solution space exploration around local optima
Chain-To
Section titled “Chain-To”policy-govern— governance layer for resilience controlsquality-evaluate— measure reliability improvement
Example
Section titled “Example”{ "request": "The code generation step keeps producing inconsistent results. Add structural fault tolerance."}Output: Resilience architecture with membrane boundaries, voter configuration (3 instances, majority vote), replay policy (max 2 retries with context modification), and homeostatic drift threshold.