A collection of hard-earned ops maxims that expose hidden failure modes, cultural blind spots, and practical warning signs every technical leader should internalize to improve reliability and decision-making.
Operations work is riddled with hidden assumptions that only surface when things break. This article compiles a dozen hard-earned maxims that cut through the noise and show exactly why incidents happen, giving leaders a mental checklist for the moments when monitoring, alerts, and processes fail.
The maxims range from concrete technical observations-"Email is the worst monitoring and alerting mechanism except for all the others," "NTP being off may not be a root cause, but it sure didn't help," "Self-signed certificates beget long lived certs, which beget lack of certificate validity monitoring"-to cultural truths such as "If a post-mortem follow-up task is not picked up within a week, it's unlikely to be completed" and "If you break it, you own it - for now; if you fix it, you own it - forever." Each line illustrates a real-world failure mode that senior engineers and managers can anticipate before it escalates.
Beyond the technical details, the piece highlights how knowledge silos and undocumented work create chronic risk. Statements like "The only other person who knows how this works is also on vacation" and "A README.md in git is no substitute for a manual page" make clear that relying on tribal knowledge is a recipe for recurring outages. Leaders can use these insights to drive systematic documentation, cross-training, and ownership policies that reduce single points of failure.
The practical takeaway is to treat the list as a living checklist: embed the sayings in runbooks, discuss them in post-mortems, and use them to surface systemic problems early. By internalizing these lessons, technical leaders can improve incident response speed, reduce hidden debt, and build a culture that proactively addresses the very issues that cause the biggest outages.
Check out the full stdlib collection for more frameworks, templates, and guides to accelerate your technical leadership journey.