Skip to content

fault-resilience

Advancedworkflow

Monitor → detect → isolate → repair → validate. Workflows that recover themselves.

ONLY use when adding structural fault tolerance to an AI workflow, designing retry/fallback strategies, isolating failures from cascading, running N-version redundancy for reliability, or implementing self-healing prompts.

Do NOT use for debugging individual errors (use issue-debug), code quality issues (use code-review), or general fault-finding.

Triggers: “make this more reliable”, “add fault tolerance”, “self-healing”, “reduce hallucinations structurally”, “N-version redundancy”, “retry strategy”, “fallback design”

  • resil-homeostatic — PID-controller drift correction
  • resil-membrane — fault isolation between skill boundaries
  • resil-redundant-voter — majority vote across N model instances
  • resil-replay — transient error recovery via context-modified replay
  • resil-clone-mutate — solution space exploration around local optima
  • policy-govern — governance layer for resilience controls
  • quality-evaluate — measure reliability improvement
{
"request": "The code generation step keeps producing inconsistent results. Add structural fault tolerance."
}

Output: Resilience architecture with membrane boundaries, voter configuration (3 instances, majority vote), replay policy (max 2 retries with context modification), and homeostatic drift threshold.