There is absolutely no way in hell those aren’t autorenewed. I think the answer ...

sidibe · on Nov 18, 2022

I'd be very surprised if disk/machine rot is what it would take for such a large organization to start experiencing major problems. Strange user patterns from all this and recent features/refactors that are suddenly not monitored would hurt something in the stack within a couple weeks if it was really abandoned. Having a few dozen senior engineers/SREs familiar with big chunks of the infra could make it months though.

notacoward · on Nov 18, 2022

Exactly. Things that are abandoned will eventually fail.

Not everything is automated. There's a finite amount of time to create new automation, and that time goes first toward the things that happen hourly or daily. The things that happen monthly still get handled by humans. Not everything is in the runbook either, and what is often ends up scattered among many wiki pages and help notes in tools, so it's not easy for a newcomer to find. So those monthly things have to be done by people who remember.

They remember how to clean stuff up when a quota/capacity limit is being approached, and who to call when they need more, and they can expect a response. They remember how to recognize when their service is approaching overload, or about to enter an oscillating state, and they know how to nudge it back toward sanity before the errors start to pile up. None of that happens when whole groups are gone.

Sooner or later, a service will start to run off the rails in one way or another, and the person who inherited that service will either not recognize it or not find the solution in time. Then it's Fail Whale time.

notatoad · on Nov 18, 2022

maintaining any reasonably popular web property is a constant battle of striking down spammers, scammers, and hackers. their basic server code might keep running for months, but unless their engineering teams have been doing some truly magical work up until now the quality of the content is going to rapidly degrade if there isn't active intervention to keep the bad actors suppressed.

coffeeblack · on Nov 18, 2022

And yet, somehow new Twitter has FAR fewer bots and spam tweets than three weeks ago. Hmm…

Edit: downvoting doesn’t change reality

coffeeblack · on Nov 18, 2022

What is up with the rage downvotes? It’s pretty easy to look at the replies to popular Twitter users. There are waaay fewer crypto scam bots everywhere. Why deny reality?

mrguyorama · on Nov 18, 2022

Those crypto bots being less pronounced probably has more to do with the entire crypto ecosystem being a bit... dying right now.

brohee · on Nov 18, 2022

I had to eventually disable inbound PMs because the "offers to gain money working online" were just too many. So your mileage may vary.

toast0 · on Nov 18, 2022

I ran certificates for a well-known service with lots of users (and not many employees), and it was all manual until well after we were acquired. If you don't have very many different certificates, it's not that hard to do it manually, and it doesn't look like they have that many [1]. Digicert is pretty easy to work with on a manual basis.

Of course, even if it was automated, you still have to pay the bills, and if payroll quit as has been suggested in the thread, I'd be surprised if accounts payable is sticking around.

[1] https://crt.sh/?q=twitter.com

SoftTalker · on Nov 18, 2022

Everything I run at work is 5-10 years old or older. It all just hums along. Occasionally I replace a hard drive or a power supply.

kccoder · on Nov 18, 2022

Now multiply that frequency by the ratio of machines Twitter has vs. you.

sockgrant · on Nov 18, 2022

You get massive DoS attacks pretty regularly? State sponsored hackers?

XorNot · on Nov 18, 2022

On a modest Hadoop cluster of 300 machines with 20 disks a piece, we replaced 3 hard disks a week.

dilyevsky · on Nov 18, 2022

Spindles? That sounds like a ridiculous burn rate; it should be no more than 5-10% a YEAR. Backblaze reports only a 1.6% quarterly failure rate: https://www.backblaze.com/b2/hard-drive-test-data.html

dilyevsky · on Nov 18, 2022

Er, actually, I missed that it's 6k disks in total. Seems about right then.
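
For anyone checking the arithmetic in the exchange above, here is a minimal sketch. It assumes a 52-week year and a constant failure rate, and the function name is made up for illustration; the inputs are the figures quoted in the thread (300 machines, 20 disks each, 3 replacements a week).

```python
def annualized_failure_rate(machines, disks_per_machine, failures_per_week,
                            weeks_per_year=52):
    """Fraction of the disk fleet replaced per year, assuming a steady rate."""
    total_disks = machines * disks_per_machine           # 300 * 20 = 6000
    failures_per_year = failures_per_week * weeks_per_year  # 3 * 52 = 156
    return failures_per_year / total_disks


rate = annualized_failure_rate(machines=300, disks_per_machine=20,
                               failures_per_week=3)
print(f"{rate:.1%}")  # 2.6%
```

That is roughly 2.6% a year, well under the 5-10% ceiling mentioned above.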

matwood · on Nov 18, 2022

Is your stuff on the internet? I agree that systems that are 100% internal can get by for a while without much maintenance, but anything on the internet is a completely different beast. Add in that Twitter is a huge target for DDoS, zero-days, etc., and I don't see how anything just 'hums along' without near-constant monitoring.

tstrimple · on Nov 18, 2022

> There is absolutely no way in hell those aren’t autorenewed.

Pretty sure a number of AWS and Azure engineers believed the same thing.

matt_s · on Nov 18, 2022

Sure, auto-renewal, but is it built to automatically roll certs across every system without any human intervention?

dilyevsky · on Nov 18, 2022

I would expect yes, given they have tens (hundreds?) of thousands of machines alone.

avisser · on Nov 18, 2022

To be pedantic, that just means a script exists that can roll the new cert out everywhere it needs to be. There could still be a human who needs to initiate that script being run.

bruce343434 · on Nov 18, 2022

Putting it in the crontab is an absolute no-brainer if you already have the script.
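
As a rough illustration of what a cron-driven job might check before kicking off a rollout script, here is a minimal sketch. It is not anyone's actual tooling: the function names, the 30-day threshold, and the sample timestamps are all assumptions. The only library call is `ssl.cert_time_to_seconds` from the Python standard library, which parses the `notAfter` format that certificates use.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Days left before a cert's notAfter time, e.g. 'Dec 13 00:00:00 2022 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / SECONDS_PER_DAY


def needs_renewal(not_after, threshold_days=30, now=None):
    """True when the cert expires within threshold_days (hypothetical policy)."""
    return days_until_expiry(not_after, now) < threshold_days


# Sample timestamps for illustration only, not real certificate data.
today = ssl.cert_time_to_seconds("Nov 18 00:00:00 2022 GMT")
print(needs_renewal("Dec 13 00:00:00 2022 GMT", now=today))  # True
```

A scheduled job could run a check like this daily and only invoke the rollout when it returns true, which still leaves open the question raised above of who pays the bill for the new cert.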

colechristensen · on Nov 18, 2022

> There is absolutely no way in hell those aren’t autorenewed.

I’d take that bet.

BWStearns · on Nov 18, 2022

Commenting to save this for Dec 13th!

dilyevsky · on Nov 18, 2022

If you click on the post timestamp you can favorite the comment ;) I guess I hold Twitter infra in higher regard after all these years they’ve been in business, but who knows…

BWStearns · on Nov 18, 2022

I hold them in high regard, but honestly there are just so many moving parts that the odds of anything that complex being able to ghost ship indefinitely aren't fantastic. Apparently the NOC team is gone too. If it makes it 1 week without major outages/major malfunctioning, the departed devs/SREs deserve all the props.