Dont ever page me. How we reduced alert fatigue during our busiest booking period

Alert volumes at loveholidays fell dramatically after the team rewrote its on-call priorities. In January 2025 the number of pages dropped 31% (from 547 to 377) and fell another 46% in February, a clear signal that the changes were working. The reduction wasn't a fluke; it was the result of a deliberate cultural shift that put root-cause fixes ahead of feature work.

The company runs a "you build it, you run it" model with 15-20 engineers on call at any time. Engineers were being paged for the same recurring alerts, creating noise that hurt both system understanding and personal wellbeing. The old informal rule of "never getting paged" broke down as the organization grew, and the team realized that without a structured approach the noise would only get louder.

Leadership introduced a new priority: "Ensuring we never get paged again for the same reason". This moved root-cause analysis, incident post-mortems, and run-book automation to the top of the on-call agenda, ahead of feature development and other tasks. By making this explicit and tying it to visible KPIs on the Tech Dashboard, teams began to own their alerts, reduce duplication, and share learnings across the organization.

The result is more reliable systems, better engineer morale, and a culture that treats reliability as a product feature. The lesson for other tech leaders is that fixing the underlying causes of noise requires clear priorities, visible metrics, and the willingness to say no to new work until the existing problems are resolved.

Dont ever page me. How we reduced alert fatigue during our busiest booking period

Problems this helps solve:

Explore more resources