Dr Rajiv Chandegra
07.03.2026 · Systems, Healthcare

Part of series: Incerto · Part 1 of 6

[Image: A lone tree shaped by persistent wind on an exposed hillside]

Antifragile by Design

It was three in the morning on a Tuesday night shift in A&E when the electronic patient record system crashed. Not a slow degradation, not a partial outage. A complete blackout. Every screen in the department went dark at once.

For about thirty seconds, nobody moved. Junior doctors stared at their monitors. The receptionist refreshed her screen twice, then picked up the phone. A registrar looked at me and said, "Well, that's that then."

Then something remarkable happened. The senior nurse, Sarah, walked to the whiteboard at the nurses' station, uncapped a marker, and started writing. Patient names. Bed numbers. Presenting complaints. Observations due. She had done this before, years ago, when whiteboards were the system. Within five minutes, we had a functioning department again. Paper notes appeared from somewhere. Verbal handoffs replaced electronic task lists. Doctors walked to radiology instead of clicking referral buttons. The porters, freed from waiting for electronic bed management, simply asked the nurses where patients needed to go.

The pace didn't slow. It quickened. Decisions were faster because people were talking to each other instead of clicking through dropdown menus and mandatory fields. A chest pain patient who would normally wait twenty minutes for an electronic referral to be "accepted" by the medical team was seen in five, because the medical registrar was standing right there when I described the ECG. Sarah said something I haven't forgotten: "This is how it used to be."

By six in the morning, the system was back online. Management arrived at eight, and their sole concern was restoring full electronic documentation. An incident report was filed about the IT failure. Nobody asked the obvious question: why did the department run better without the system that was supposed to make it run better?

I didn't have the language for it then. I do now. What I witnessed that night was antifragility in action. The department didn't merely survive the disruption. It improved because of it.


The triad

Nassim Nicholas Taleb introduced a concept that fills a gap in how we think about systems. We had words for things that break under stress (fragile) and things that resist stress (robust). But we had no word for things that gain from stress. He coined one: antifragile.[1]

"Some things benefit from shocks; they thrive and grow when exposed to volatility, randomness, disorder, and stressors."

The triad looks simple, but its implications are profound.

In healthcare, examples of each are everywhere.

Fragile: A centralised IT system with no manual fallback. When it fails, the department stops. That night in A&E, the system was fragile. The department was not.

Robust: Backup generators in a hospital. When the power fails, the generator kicks in. The hospital continues as before. Nothing is gained, but nothing is lost.

Antifragile: A junior doctor who works through a brutal winter of on-calls. She emerges not just intact but transformed. She has seen things that no simulation could teach. Her clinical judgement is sharper, her confidence earned through exposure to stress. She is better because of the difficulty, not despite it.

The critical insight is that robustness is not the opposite of fragility. Antifragility is. And most of what we call "building resilience" is actually just building robustness: the ability to absorb a shock and return to baseline. That is necessary but insufficient. The question Taleb forces us to ask is: can your system actually benefit from the shock?


Why efficiency is fragile

Modern institutions worship efficiency. Lean staffing. Just-in-time supply chains. Elimination of "waste." Maximum utilisation of every bed, every clinician, every minute.

This works beautifully under normal conditions. And it shatters under stress.

The NHS operates at bed occupancy rates routinely above 90%, sometimes above 95%.[2] Hospital management literature suggests that anything above 85% creates dangerous bottlenecks.[3] But empty beds look like waste. And waste, in the language of efficiency, is the enemy.
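The non-linearity behind that 85% figure is a standard result from queueing theory: expected queue length grows roughly as ρ²/(1−ρ), where ρ is utilisation, so congestion explodes as occupancy approaches 100%. The sketch below uses the textbook single-server M/M/1 formula; a hospital is far messier than an M/M/1 queue, so this is only meant to show the shape of the curve, not to model a real ward.

```python
# Why occupancy above ~85% is dangerous: in the simplest queueing model
# (M/M/1), the expected number of patients waiting grows non-linearly
# as utilisation approaches 100%.

def mean_queue_length(utilisation: float) -> float:
    """Expected queue length Lq = rho^2 / (1 - rho) for an M/M/1 queue."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return utilisation ** 2 / (1.0 - utilisation)

for rho in (0.70, 0.85, 0.90, 0.95, 0.99):
    print(f"occupancy {rho:.0%}: mean queue length {mean_queue_length(rho):6.2f}")
```

In this toy model, moving from 85% to 95% occupancy nearly quadruples the expected queue, and 95% to 99% quintuples it again. The last few points of "efficiency" are bought at an enormous cost in fragility.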

COVID exposed this with brutal clarity. Hospitals that had optimised every square metre for normal operations had nowhere to put surge patients. Supply chains that had eliminated inventory buffers ran out of PPE within weeks. Staff rotas with zero redundancy collapsed when clinicians fell ill. The systems that looked most impressive on a spreadsheet were the first to break.

"The fragile wants tranquility, the antifragile grows from disorder."

This is not a failure of execution. It is a failure of philosophy. When you optimise solely for efficiency, you are implicitly betting that conditions will remain normal. You are betting against volatility. And in complex systems like healthcare, that is a bet you will eventually lose.

I have seen this pattern in building healthcare technology as well. The temptation is always to build for the expected case, to strip out the edge cases, to make the system "lean." But the expected case is a fiction. Reality is messy, variable, and occasionally catastrophic. The system that handles only the expected case is, by definition, fragile.


Hormesis: the biology of antifragility

Medicine already understands antifragility at the biological level. We just refuse to apply it to our institutions.

Hormesis is the dose-response relationship where small stresses make systems stronger. Exercise tears muscle fibres; they rebuild stronger. Bones subjected to impact increase their density. The immune system, exposed to small doses of a pathogen, develops the capacity to fight the real thing. This is the principle behind vaccination, one of medicine's greatest achievements.

The hormesis curve looks like this: at low doses, stress produces a positive adaptive response. At moderate doses, the benefits plateau. At high doses, the system breaks. The relationship is not linear. It is convex on the left (gains from stress) and concave on the right (damage from excess).
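That shape can be captured in a toy model. The quadratic below is purely illustrative (real dose-response curves are empirical, not parabolic, and the `tolerance` parameter is invented for the sketch), but it reproduces the qualitative pattern: no stress gives no stimulus, small doses produce a net positive response, and doses past a tolerance threshold produce net damage.

```python
# A toy hormetic dose-response curve: net benefit rises with small doses,
# peaks, then turns into net harm at high doses. Illustrative only.

def hormetic_response(dose: float, tolerance: float = 10.0) -> float:
    """Net adaptive response: positive below `tolerance`, negative above it."""
    return dose * (tolerance - dose)

for dose in (0, 2, 5, 8, 12):
    r = hormetic_response(dose)
    label = "adaptation" if r > 0 else "damage" if r < 0 else "no stimulus"
    print(f"dose {dose:2d}: net response {r:6.1f}  ({label})")
```

Note the asymmetry: the gain from a small dose is modest, but the damage past the threshold accelerates. That is the convex-then-concave structure the text describes.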

| Dose of stress | Response | Example |
| --- | --- | --- |
| None | Atrophy, weakness | Bed rest causing muscle wasting |
| Small, repeated | Adaptation, growth | Regular exercise building fitness |
| Moderate | Plateau, maintenance | Established training routine |
| Excessive, sudden | Breakdown, damage | Overtraining syndrome, burnout |

Every doctor learns this in physiology. We prescribe exercise knowing it creates micro-damage that triggers adaptation. We vaccinate knowing that controlled exposure builds immunity. We understand that bones need loading, that hearts need exertion, that immune systems need challenge.

Yet when it comes to designing the institutions in which we practise, we forget everything we know. We try to eliminate all stress from organisational life. We create protocols to prevent any deviation. We build systems that assume smooth, predictable operations. We treat institutional stress as pathology rather than signal.

What if we treated our organisations the way we treat bodies? What if small failures were understood not as problems to eliminate but as adaptive signals that build institutional fitness?


The price of antifragility

Antifragility is not free. It has a price, and that price is redundancy.

Spare ICU beds are "inefficient" until a pandemic arrives. Extra nurses on a ward look like overstaffing until three patients deteriorate simultaneously. A backup paper system for when the computers fail seems archaic until the computers fail.

The argument is not against efficiency. Efficiency matters. The argument is against efficiency as the sole optimisation target. When you optimise only for efficiency, you eliminate the slack that allows a system to absorb shocks. You remove the very features that would make it antifragile.

Taleb frames this as the difference between optionality and optimisation. Optimisation narrows. Optionality widens. An optimised system does one thing superbly. A system with optionality can do many things adequately and adapt to whatever comes.

| Principle | Fragile design | Antifragile design |
| --- | --- | --- |
| Capacity | Maximum utilisation | Built-in slack |
| Decision-making | Centralised | Distributed |
| Supply chains | Just-in-time, single source | Buffered, multiple suppliers |
| Staffing | Lean, specialised | Cross-trained, redundant |
| Protocols | Rigid, comprehensive | Principles-based, adaptive |
| Response to failure | Blame, prevent recurrence | Learn, adapt, strengthen |
| Information flow | Hierarchical | Networked |
| Planning horizon | Optimise for expected case | Prepare for unexpected case |

"Wind extinguishes a candle and energises fire. Likewise with randomness, uncertainty, chaos: you want to use them, not hide from them."


Designing for antifragility

If antifragility is the goal, what does it look like in practice? Here are five principles drawn from Taleb's work, applied to healthcare and system design.

Build in redundancy

This is the most counterintuitive principle for anyone trained in efficiency thinking. Redundancy is not waste. It is insurance that pays off in ways you cannot predict. Two independent communication systems. Cross-trained staff who can cover multiple roles. Physical backup processes for digital systems. The cost of redundancy is visible and constant. The cost of not having it is invisible until it is catastrophic.

Decentralise decision-making

Centralised systems are fragile because they create single points of failure. When the decision-maker is unavailable, overloaded, or wrong, the entire system stalls. Decentralised systems distribute both the risk and the intelligence. That night in A&E, decisions were faster because they were made at the point of care, not routed through an electronic system to a coordinator somewhere else.

Create optionality

Design systems with multiple pathways, not single optimal routes. When one pathway fails, others are available. This means accepting that some pathways will be "inefficient" under normal conditions. That is the price. The payoff is that the system survives conditions that would destroy a single-pathway design.

Welcome small failures

This is where hormesis meets institutional design. Small failures are information. They reveal weaknesses before those weaknesses become catastrophic. Organisations that punish small failures drive them underground, where they accumulate until they produce a large failure. Organisations that welcome small failures, that conduct genuine learning reviews rather than blame exercises, build the institutional equivalent of an immune response.

Learn from stress

Every disruption is data. The question after a stressful event should not be "how do we prevent this from happening again?" but "what did this teach us about our system?" Sometimes the answer is that a process needs fixing. Sometimes the answer is that the disruption revealed a strength nobody knew existed. Sometimes, as in my A&E that night, the answer is that the system works better under certain kinds of stress, and we should ask why.


The question we refuse to ask

That night in A&E stays with me not because the IT system failed. IT systems fail regularly. It stays with me because of what happened next: the department got better. People communicated more. Decisions were faster. The hierarchy flattened. The work became, briefly, more human.

And then the system came back online, and we returned to clicking through mandatory fields and waiting for electronic referrals. Nobody in management asked whether something valuable had been revealed. The incident was filed as a failure to be prevented, not a signal to be understood.

This is the deepest challenge of antifragility. It requires us to look at disruption and ask not just "how do we recover?" but "what did we gain?" It requires us to see redundancy not as waste but as investment. It requires us to welcome small failures as the price of avoiding large ones.

Most of all, it requires us to question whether the systems we have built to reduce variability have, in doing so, made us more fragile than we were before.

