I have nowhere near the experience managing such complex systems, but I can empa...

bri3d · 2025-11-19T04:07:59 1763525279

I have plenty of empathy, having been in plenty of similar situations. It's not a matter of "I can't BELIEVE it took that long" (although it is a bit surprising) so much as that I disagree with the key takeaways here in the HN comments section and in the blog itself, which focus strongly on fixing rare edge-case issues (the bad ClickHouse query and a bad config file causing a panic via unwrap) rather than on reducing MTTR for all issues by improving the debug and monitoring experience.

I'm also suspicious that

> Eliminating the ability for core dumps or other error reports to overwhelm system resources

from the blog had a lot more to do with the issue than perhaps the narrative is letting on.