LFI demands more effort than RCA, but it surfaces hidden mental-model gaps, turning incidents into lasting system knowledge that improves future decisions.
The piece argues that the learning-from-incidents (LFI) approach isn't a nice-to-have add-on; it's a response to the reality that no one in a complex organization has a complete mental model of the entire socio-technical system. While root-cause analysis (RCA) zeroes in on a single vulnerability, LFI treats every incident as a chance to surface the many unknowns that silently shape how the system works.
LFI rests on two hidden assumptions. First, that system understanding is fragmented: people know bits that others don't, and those blind spots are the real source of future failures. Second, that expanding that shared knowledge will make engineers better decision-makers, whether they are reacting to the next outage, designing a new service, or writing code today.
The payoff is fuzzy but powerful: when engineers internalize how observability tools were actually used in a past incident, they can apply that insight immediately. When a senior engineer learns that sharding by request type avoids a class of performance problems, that lesson informs future architecture choices. In each case the organization gains a richer, more accurate mental model that guides better choices.
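To make the sharding lesson concrete, here is a minimal sketch of what routing by request type can look like. All names are hypothetical (the article describes no specific system); the point is simply that giving each workload class its own shard pool keeps one class from degrading another:

```python
# Hypothetical sketch, not from the article: sharding on request type
# so a class of performance problems (e.g. slow analytics queries
# starving latency-sensitive reads) cannot occur.
import zlib
from enum import Enum


class RequestType(Enum):
    READ = "read"
    WRITE = "write"
    ANALYTICS = "analytics"


# Assumed shard layout: each request type gets its own pool, so heavy
# analytics work never shares capacity with reads or writes.
SHARDS: dict[RequestType, list[str]] = {
    RequestType.READ: ["read-1", "read-2"],
    RequestType.WRITE: ["write-1"],
    RequestType.ANALYTICS: ["analytics-1"],
}


def pick_shard(request_type: RequestType, key: str) -> str:
    """Route to a shard in the pool dedicated to this request type."""
    pool = SHARDS[request_type]
    # Stable hash so the same key always lands on the same shard.
    return pool[zlib.crc32(key.encode()) % len(pool)]


print(pick_shard(RequestType.ANALYTICS, "user-42"))  # -> "analytics-1"
```

The design choice being illustrated: partitioning by request type trades some capacity flexibility for isolation, which is exactly the kind of tacit knowledge LFI aims to spread beyond the one engineer who learned it during an incident.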
The article also points out that companies already value expertise: hiring senior engineers is a proxy for future decision quality. Yet that same logic rarely gets applied to post-incident work. Treating LFI as an investment in collective expertise reframes it from optional extra work to a strategic activity that can reduce future pain and improve overall team performance.