this was in the middle of a scheduled maintenance, with all requests failing at a single point: a .unwrap().
there should be internal visibility into the fact that a large number of requests are all failing at the same line of code, and attention should be focused there instantly imo.
or at the very least, it shouldn't take 4 hours for anyone to even consider that it wasn't an attack.
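
to make the "same LOC" point concrete, here's a rough sketch of what that visibility could look like in rust: a panic hook that counts panics per source location, so a spike at one line stands out. this is only an illustration, not how cloudflare's code works. `PANIC_SITES`, `install_panic_counter`, `hottest_panic_site` and `handle_request` are all made-up names, and a real setup would export the counts to a metrics/alerting system rather than keep them in memory.

```rust
use std::collections::HashMap;
use std::panic;
use std::sync::{Mutex, OnceLock};

// Hypothetical in-process counter: panic site ("file:line") -> panics seen.
static PANIC_SITES: OnceLock<Mutex<HashMap<String, u64>>> = OnceLock::new();

fn panic_sites() -> &'static Mutex<HashMap<String, u64>> {
    PANIC_SITES.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Install a panic hook that records where each panic happened.
/// Note: this replaces the default hook, so panics are no longer printed to stderr.
fn install_panic_counter() {
    panic::set_hook(Box::new(|info| {
        let site = info
            .location()
            .map(|l| format!("{}:{}", l.file(), l.line()))
            .unwrap_or_else(|| "unknown".to_string());
        // Recover the map even if the mutex was poisoned by an earlier panic.
        let mut sites = panic_sites().lock().unwrap_or_else(|e| e.into_inner());
        *sites.entry(site).or_insert(0) += 1;
    }));
}

/// Return the panic site with the most hits, if any panic was recorded.
fn hottest_panic_site() -> Option<(String, u64)> {
    let sites = panic_sites().lock().unwrap_or_else(|e| e.into_inner());
    sites
        .iter()
        .max_by_key(|(_, n)| **n)
        .map(|(s, n)| (s.clone(), *n))
}

/// Stand-in for a request handler that unwraps a failed result.
fn handle_request(input: &str) -> u32 {
    // `unwrap` reports its caller's location, so every failure maps to this line.
    input.parse::<u32>().unwrap()
}

fn main() {
    install_panic_counter();
    // Simulate many requests that all fail at the same unwrap.
    for _ in 0..100 {
        let _ = panic::catch_unwind(|| handle_request("not a number"));
    }
    if let Some((site, count)) = hottest_panic_site() {
        println!("{count} panics at {site}");
    }
}
```

if nearly every failure lands in one bucket like this, that's a strong hint the problem is in your own code path and not an attack.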
in situations such as this, where your entire infra is fucked, you should have multiple crisis teams working in parallel, under different assumptions.
if even one additional team had been created to work under the assumption that it was an infra issue rather than an attack, this situation could have been resolved many hours earlier.
for a product as vital to the internet as cloudflare, it is unacceptable to not have this kind of crisis management.