🧠 Knowledge Base

Resilience Engineering: Anticipating & Adapting in Complex Systems

Resilience Engineering reframes safety and performance as the ability of systems to anticipate change, absorb shocks, and adapt under pressure — a shift from preventing failure to sustaining success in complex environments.
Explanation
What it is

Resilience Engineering is a systems-oriented framework that shifts focus from preventing failure to enabling sustained success under stress.

Pioneered by Erik Hollnagel and colleagues, it highlights four core “resilience potentials”: the ability to anticipate, monitor, respond, and learn.

Together, these describe how complex organisations adapt to change and continue functioning even when facing disruption.

When to use it
  • When operating in high-uncertainty environments where risks cannot be fully predicted
  • When managing critical systems where failure consequences are severe (e.g. healthcare, aviation, infrastructure, digital platforms)
  • When organisational performance depends on continuous adaptation, not static compliance
Why it matters

By focusing on adaptive capacity rather than error elimination, Resilience Engineering helps organisations sustain performance, build trust, and reduce systemic risk.

It ties safety and reliability directly to outcomes like faster recovery, improved decision-making, and greater alignment between human and technical systems.

Definitions

Resilience Engineering

A framework for analysing how systems sustain performance in the face of variability and stress by anticipating, monitoring, responding, and learning.

Four Potentials

The core capabilities of resilient systems: to anticipate future events, monitor ongoing operations, respond effectively to disruptions, and learn from past experiences.

Safety-II

A related concept from Hollnagel, shifting the focus from preventing what goes wrong to understanding and supporting what goes right in everyday operations.

Notes & Caveats
  • Scope limits
    Although the framework is rooted in safety science, it is now also applied in IT operations, healthcare, and organisational governance.
  • Typical misread
    Sometimes conflated with business continuity or disaster recovery — but its focus is ongoing adaptive capacity, not one-off recovery planning.
  • Controversy
    Critics argue that without concrete metrics, Resilience Engineering risks being too conceptual, making it harder for organisations to operationalise.
Objective

Embed the four resilience potentials into organisational practice so that systems can sustain performance under stress and adapt to change.

Steps
  1. Map current system vulnerabilities
    Use hazard mapping, audits, or scenario reviews to surface weak points and single points of failure.
  2. Assess resilience potentials
    Evaluate the organisation’s ability to anticipate, monitor, respond, and learn; timebox this to a structured workshop or audit cycle.
  3. Develop interventions
    Design actions that strengthen one or more resilience potentials, and record them in an artefact such as a resilience plan or risk register.
  4. Verify through rehearsal and feedback
    Run simulations, after-action reviews, or red-team exercises to test adaptations and confirm improvements.
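
Step 2 can be made concrete with a simple scoring sheet. The sketch below is a minimal illustration rather than a method prescribed by Resilience Engineering: the 1 to 5 scale, the `PotentialScores` type, and the `weakest_potentials` helper are all assumptions introduced here.

```python
from dataclasses import dataclass, asdict

# Hypothetical 1-5 self-assessment scale (an assumption for illustration).
MIN_SCORE, MAX_SCORE = 1, 5


@dataclass(frozen=True)
class PotentialScores:
    """Workshop scores for the four resilience potentials."""
    anticipate: int
    monitor: int
    respond: int
    learn: int

    def __post_init__(self):
        # Reject scores that fall outside the agreed scale.
        for name, score in asdict(self).items():
            if not MIN_SCORE <= score <= MAX_SCORE:
                raise ValueError(
                    f"{name} score {score} is outside {MIN_SCORE}-{MAX_SCORE}"
                )


def weakest_potentials(scores: PotentialScores) -> list[str]:
    """Return the lowest-scoring potential(s) as candidates for step 3."""
    values = asdict(scores)
    lowest = min(values.values())
    return [name for name, score in values.items() if score == lowest]
```

For example, `weakest_potentials(PotentialScores(anticipate=2, monitor=4, respond=3, learn=2))` returns `['anticipate', 'learn']`, suggesting where interventions in step 3 might start. Scores like these are a conversation aid and should be read alongside qualitative findings.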
Tips
  • Start small: focus on a critical service or process before scaling across the organisation.
  • Combine qualitative (stories, case reviews) and quantitative (KPIs, failure rates) data to capture both soft and hard signals.

Pitfalls

Treating it as compliance only

Avoid reducing resilience to box-ticking; the aim is adaptive capacity, not static assurance.

Over-focusing on past incidents

Resilience requires foresight, not just post-mortems; ensure anticipation and learning are balanced.

Acceptance criteria
  • Documented evidence of strengthened resilience potentials (e.g. new monitoring dashboards, updated contingency playbooks).
  • Artefacts updated with recorded risks, responses, and lessons learned.
  • Stakeholder alignment confirmed through successful drills, simulations, or peer reviews.
Scenario

A hospital emergency department faces an unexpected surge in patients after a regional accident.

The team must maintain safety and throughput while resources are stretched and conditions are rapidly changing.

Walkthrough

Decision Point

Leadership must decide whether to divert patients to other hospitals or reconfigure internal workflows to cope with demand.

Input/Output

Input:
Real-time patient flow data, staff availability, treatment capacity, and ambulance arrival forecasts.

Output:
A decision to either activate diversion protocols or adapt on-site processes (e.g. triage redesign, reallocating staff).
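
The input/output relationship above can be sketched as a small decision function. This is an illustrative sketch only: the `SurgeSnapshot` fields and the rule comparing projected demand with stretched capacity are assumptions, not a clinical protocol taken from the scenario.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ADAPT_ON_SITE = "adapt on-site processes"
    ACTIVATE_DIVERSION = "activate diversion protocols"


@dataclass(frozen=True)
class SurgeSnapshot:
    """Hypothetical summary of the inputs; every field is illustrative."""
    patients_in_department: int
    forecast_arrivals_next_hour: int  # from ambulance arrival forecasts
    treatment_capacity: int           # patients treatable with normal workflows
    extra_capacity_if_adapted: int    # expected gain from triage redesign, staff reallocation


def decide(snapshot: SurgeSnapshot) -> Decision:
    """Adapt on site if projected demand fits stretched capacity, else divert."""
    projected = snapshot.patients_in_department + snapshot.forecast_arrivals_next_hour
    stretched = snapshot.treatment_capacity + snapshot.extra_capacity_if_adapted
    if projected <= stretched:
        return Decision.ADAPT_ON_SITE
    return Decision.ACTIVATE_DIVERSION
```

With 40 patients present, 15 forecast arrivals, a normal capacity of 45, and 12 extra places from adaptation, `decide` returns `Decision.ADAPT_ON_SITE`. In practice leadership judgement would weigh factors a single comparison cannot capture, such as staff fatigue and case severity.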

Action

The hospital resilience team runs a rapid workshop using the four potentials:

  • Anticipate
    Estimate continued inflow from ambulance control data.
  • Monitor
    Track vital resource indicators (beds, ventilators, staff fatigue).
  • Respond
    Temporarily convert non-critical wards into triage spaces.
  • Learn
    Capture immediate lessons during debrief to refine future surge protocols.
  • Artefact captured
    An updated Surge Response Playbook in the hospital’s governance system.

Error handling

If resource strain exceeds safe limits despite adaptations, escalation triggers automatic patient diversion to partner facilities, with real-time communication back to emergency services.
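
The escalation rule could be expressed as a threshold check over the monitored indicators. The sketch below is hedged accordingly: the indicator names, the limit values, and the `notify` callback are hypothetical placeholders, and real safe limits would come from the hospital's own protocols.

```python
from typing import Callable, Mapping

# Hypothetical safe limits, expressed as maximum utilisation between 0.0 and 1.0.
SAFE_LIMITS: Mapping[str, float] = {
    "bed_occupancy": 0.95,
    "ventilator_use": 0.90,
    "staff_fatigue_index": 0.80,
}


def breached_limits(
    readings: Mapping[str, float],
    limits: Mapping[str, float] = SAFE_LIMITS,
) -> list[str]:
    """Return the names of indicators above their limit, sorted alphabetically.

    Indicators missing from `readings` are ignored in this sketch; a real
    system would probably treat a missing reading as a problem in itself.
    """
    return sorted(
        name for name, limit in limits.items()
        if name in readings and readings[name] > limit
    )


def escalate_if_unsafe(
    readings: Mapping[str, float],
    notify: Callable[[str], None],
    limits: Mapping[str, float] = SAFE_LIMITS,
) -> bool:
    """Trigger diversion and notify emergency services if any limit is breached."""
    breaches = breached_limits(readings, limits)
    if not breaches:
        return False
    notify(
        "Diversion to partner facilities activated; limits exceeded: "
        + ", ".join(breaches)
    )
    return True
```

Passing `print` as `notify` is enough to try the sketch; in a real deployment the callback would stand in for whatever channel carries updates back to emergency services.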

Closure

After the surge subsides, the team conducts an after-action review, documenting what worked, where bottlenecks arose, and updating the resilience artefact.

Next action
Schedule a simulation drill to rehearse the updated playbook.

Result
  • Before → After: Faster response under pressure, reduced patient risk, improved trust between staff and management.
  • Artefact snapshot: “Surge Response Playbook v2.0” — stored in the hospital’s emergency preparedness library.
Variations
  • If applied to IT operations, replace patient inflow with service demand spikes (e.g. during a cyberattack or major outage).
  • If team size is smaller, use lightweight checklists and rapid stand-ups instead of full workshops.