Incident with Actions, API Requests, Codespaces and Pages (githubstatus.com)
152 points by fah93 on May 10, 2023 | 123 comments


While it's certainly not ideal to have incidents, to me "almost every day" would require a longer pattern. The incident before yesterday was 5 days prior. Even if you go back more than a month, there are 12 days with incidents, so ~1/3. Less than half certainly isn't almost every day. Not really seeing a solid pattern in the times either.

  Apr 17, 17:17 - 17:42, 19:28 - 19:53
  Apr 18, 09:28 - 09:51, 14:35 - 15:29
  Apr 19, 18:28 - 19:27
  Apr 24, 12:57 - 13:27
  Apr 24, 21:14 - Apr 25, 03:18
  Apr 25, 23:26 - Apr 26, 00:29
  Apr 27, 08:59 - 09:56
  Apr 28, 12:26 - 12:45
  May 2, 14:58 - 15:49
  May 4, 15:55 - 16:23
  May 9, 08:07 - 10:04, 11:32 - 21:14
  May 9, 22:39 - May 10, 00:06
  May 10, 13:00 - ?
Source: https://www.githubstatus.com/history

Edit: This comment was in response to the original title, which has since been changed.
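
For anyone who wants to check the "12 days" count, here is a rough Python sketch. It only records which calendar days each listed window touches (one entry per day or per overnight span). The 36-day lookback is my own assumption to match "more than a month"; the status page itself defines no such window.

```python
from datetime import date, timedelta

# (first day, last day) touched by the incident windows listed above (2023).
# Days with several windows appear once, since only the day matters here.
INCIDENT_SPANS = [
    (date(2023, 4, 17), date(2023, 4, 17)),
    (date(2023, 4, 18), date(2023, 4, 18)),
    (date(2023, 4, 19), date(2023, 4, 19)),
    (date(2023, 4, 24), date(2023, 4, 24)),
    (date(2023, 4, 24), date(2023, 4, 25)),  # overnight incident
    (date(2023, 4, 25), date(2023, 4, 26)),  # overnight incident
    (date(2023, 4, 27), date(2023, 4, 27)),
    (date(2023, 4, 28), date(2023, 4, 28)),
    (date(2023, 5, 2), date(2023, 5, 2)),
    (date(2023, 5, 4), date(2023, 5, 4)),
    (date(2023, 5, 9), date(2023, 5, 9)),
    (date(2023, 5, 9), date(2023, 5, 10)),   # overnight incident
    (date(2023, 5, 10), date(2023, 5, 10)),  # still open when this was posted
]


def days_with_incidents(spans):
    """Return the set of calendar days touched by at least one incident."""
    days = set()
    for first, last in spans:
        day = first
        while day <= last:
            days.add(day)
            day += timedelta(days=1)
    return days


LOOKBACK_DAYS = 36  # assumption: "more than a month", not a number from the status page

affected = days_with_incidents(INCIDENT_SPANS)
print(f"{len(affected)} days with incidents, "
      f"{len(affected) / LOOKBACK_DAYS:.0%} of a {LOOKBACK_DAYS}-day window")
```

A shorter lookback pushes the fraction up, so the "~1/3" depends entirely on how far back you count.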


Yeah, very odd claim by OP. Looking at the history, while not great, it certainly isn't every day, nor does it show any kind of pattern.


Every business day?


There were outages on 7 days in April.

And on 3 days in May so far.

GP's point is, that's anything but "daily".

GP also shows it's not at the same time either.


Sure, but that's certainly not five nines of reliability.

How many billions of dollars did this collectively cost? That's lots of wasted engineer time at best, and the inability to build a hot fix for your own outage at worst.


Github doesn't claim "five nines of reliability".

Their SLA is for 99.9% ("three nines").

Which equates to:

  Daily: 1m 26s
  Weekly: 10m 4.8s
  Monthly: 43m 28s
  Quarterly: 2h 10m 24s
  Yearly: 8h 41m 38s
https://github.com/customer-terms/github-online-services-sla
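
The arithmetic behind a table like this is just (1 - SLA) times the length of the period. Here is a minimal Python sketch; the function names and the period lengths (30-day month, 90-day quarter, 365-day year) are my own assumptions. With a 365-day year the yearly figure comes out at 8h 45m 36s rather than the 8h 41m 38s quoted above, so the quoted numbers presumably assume slightly different period lengths.

```python
def downtime_budget_seconds(sla_percent: float, period_days: float) -> float:
    """Allowed downtime in seconds for an availability target over a period."""
    return (1 - sla_percent / 100) * period_days * 24 * 60 * 60


def fmt_duration(seconds: float) -> str:
    """Format seconds as 'Xh Ym Zs', rounded to the nearest second."""
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


# Assumed period lengths; other calculators use other conventions.
PERIOD_DAYS = {"Daily": 1, "Weekly": 7, "Monthly": 30, "Quarterly": 90, "Yearly": 365}

for name, days in PERIOD_DAYS.items():
    print(f"{name}: {fmt_duration(downtime_budget_seconds(99.9, days))}")
```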


So they don't even manage to stay within the three nines.

Today they already spent their whole quarterly budget. With the incidents in April and May they have already exceeded their whole yearly budget.
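
A quick way to sanity-check that claim is to add up the closed incident windows from the status history quoted upthread and compare the total to a 99.9% yearly budget. The sketch below does that in Python. Big caveat, and it is an assumption of this sketch rather than anything in the SLA: it treats every minute of every status-page incident as full downtime, while an SLA is typically assessed per service and a degraded service is not necessarily "down", so this likely overstates the real figure. The still-open May 10 incident is left out.

```python
from datetime import datetime

# Closed incident windows as listed upthread (times as shown on the status page).
WINDOWS = [
    ("2023-04-17 17:17", "2023-04-17 17:42"),
    ("2023-04-17 19:28", "2023-04-17 19:53"),
    ("2023-04-18 09:28", "2023-04-18 09:51"),
    ("2023-04-18 14:35", "2023-04-18 15:29"),
    ("2023-04-19 18:28", "2023-04-19 19:27"),
    ("2023-04-24 12:57", "2023-04-24 13:27"),
    ("2023-04-24 21:14", "2023-04-25 03:18"),
    ("2023-04-25 23:26", "2023-04-26 00:29"),
    ("2023-04-27 08:59", "2023-04-27 09:56"),
    ("2023-04-28 12:26", "2023-04-28 12:45"),
    ("2023-05-02 14:58", "2023-05-02 15:49"),
    ("2023-05-04 15:55", "2023-05-04 16:23"),
    ("2023-05-09 08:07", "2023-05-09 10:04"),
    ("2023-05-09 11:32", "2023-05-09 21:14"),
    ("2023-05-09 22:39", "2023-05-10 00:06"),
]


def total_minutes(windows):
    """Sum the length of all (start, end) windows, in minutes."""
    fmt = "%Y-%m-%d %H:%M"
    total = 0.0
    for start, end in windows:
        delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
        total += delta.total_seconds() / 60
    return total


# 99.9% over an assumed 365-day year.
YEARLY_BUDGET_MIN = 0.001 * 365 * 24 * 60

incident_min = total_minutes(WINDOWS)
print(f"incident minutes: {incident_min:.0f}, yearly budget: {YEARLY_BUDGET_MIN:.1f}")
print("over yearly budget" if incident_min > YEARLY_BUDGET_MIN else "within yearly budget")
```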


They are aiming to maintain a single 9 reliability.


At least Github's status reporting is accurate and generally truthful. At my current job, engineers are totally allergic to reporting incidents, especially to the public status page, since doing so would put you in the spotlight of upper management and make you a prime candidate for a negative performance review in the next cycle. Happens time and time again. The result? Things break, no one takes responsibility, and customers complain while we lie and tell them everything's fine. Shrug.


At one place our CFO strong-armed the devs to use AWS status as our own. Logic was we use them, therefore we're justified. Every time a client reported an outage he'd gleefully share the status page and say "must be on your end". It was almost always on our end. Dude was a lawyer too.


Yikes. The AWS status page is a joke -- it's even a joke internally at AWS. There used to be (might still be?) an extension which just replaces all the icons on the AWS status page with one level worse than actually reported.


When was it, like 2013, when they had a bad outage and, because the status page was hosted on S3 and S3 was affected, they couldn't update the status page, so it showed all green even though half the internet was down?

Good times, Amazon. Good times.


Their old pre-acquisition status page had automated up to the minute stats. Their new status page often takes hours to reflect when something is broken.


> ... had automated up to the minute stats

This sounds like an SRE's absolute nightmare.


Unless you're actually a professional and don't like lying.


Not talking about lying, but these status pages are rarely fully automated, meaning that there will be at least one person hovering around and constantly asking "are we there yet?".


In my experience, that happens regardless of whether the process of saying "we're down" is automatic or manual.

I'm not sure I follow your logic here. People will ask SREs "are we up yet?" more because the status page shows downtime automatically rather than being updated manually, or what are you saying?


Why?


Wild guess: Are you working at Sendgrid?


AWS is also an acceptable guess


99.5% of other tech companies are also acceptable guesses. Some will only inform users of service disruption after the media picks it up.


If a publicly traded company in the U.S. did this, I'm curious if it constitutes securities fraud.

I.e., a material misrepresentation about the company's health.


Matt Levine is right again: it seems everything is securities fraud.


TL;DR of everything Matt Levine has said: Lying and misrepresenting anything at all is securities fraud. Makes it pretty simple to not commit it: DON'T LIE.


Unless not lying / oversharing is bad for your shareholders. Lawyers could argue that publicizing non-material bad news about your service health makes your stock price drop.


A prime example from the Systems Thinking post. Errors of commission are the only thing that matters for your company and it is allergic to change/disruption from top to bottom.


It's almost like every company is its own little authoritarian regime. Hmmmmm, maybe Karl Marx was onto something...


I don't think ownership (owning vs. working class) itself is the problem here.

You can see similar behavior (hiding issues, lying about them) in any scenario in which punishment is tied to the failure to meet the metrics. When the metric becomes the goal, people will game the metric in order to avoid punishment and/or reap the rewards.

The only kind of measurement that doesn't cause this behavior is neutral measurement: measurement which is guaranteed not to change decisions directly, but only serves as information which is later used in high-level optimization of systemic processes. In the context of software companies, it would mean that status reports should only be reviewed in combination with other factors, such as tools used, features implemented, deadlines tightened, etc., and only to further understand how to optimize the process to avoid human error.

It's risky to implement that, because you have to trust your employees enough to say "you won't be fired because of your metrics, ever", but I believe it's less bad than having people falsely report good metrics.


I think you also need to set up an "adversarial" organization. Have a team responsible for reporting metrics that is distinct from the team that is responsible for the software that drives the metrics. Classic Dev/QA partnership.


The US military essentially landed on a similar solution.

   - Regularly rotate personnel
   - Reward review by a random board
The rotations prevent cooperation of "adversarial" parties from corrupting the process.

A review based on a blend of quantitative metrics (basics/table stakes) and qualitative metrics (everything else) prevents quantitative-metrics-gaming from dominating.

The system has a ton of flaws, is inefficient, and produces suboptimal outcomes a fair amount of the time, but damned if it doesn't scale.


> When the metric becomes the goal, people will game the metric in order to avoid punishment and/or reap the rewards.

Goodhart's law!


Exactly. I always forget its name tho.

The book "The Tyranny of Metrics" by Jerry Muller talks about this phenomenon in depth, giving many historical examples of metrics making systems worse instead of better. Very good read, I'd recommend it to anyone.


> Exactly. I always forget its name tho.

Me too! Fortunately, Google was smart enough to turn it up from "law about metrics".

> The book "The Tyranny of Metrics" by Jerry Muller talks about this phenomenon in depth, giving many historical examples of metrics making systems worse instead of better. Very good read, I'd recommend it to anyone.

Thanks! I'll have a look.


In the real world, owners simply don't care about your enlightening idea. Why would they? And what are you gonna do about it when they don't?


I don't quite understand the context of this post. I was talking about metrics such as uptime percentage, not enlightening ideas. Can you explain what you mean?


That is fair, sorry. I'm just saying the boss is in charge at the end of the day. Regardless of any observations or knowledge contradictory to their practices, they ultimately retain all the power.

Productivity (of something...) is ostensibly the intent of a capitalist venture, but the owner can ultimately prioritize anything they prefer. This is a problem for any hope of rational practice.


> Productivity (of something...) is ostensibly the intent of a capitalist venture, but the owner can ultimately prioritize anything they prefer. This is a problem for any hope of rational practice.

While that is true, the whole argument against metrics as goals is that they decrease productivity, which goes against a common interest of both capitalist and communist societies.

Productivity as a goal simply makes sense in the real world, regardless of whether there's an owner who is chasing profit, or a worker collective who has to meet their planned quota for the planned economy. The state of facts is such that binding metrics to rewards is harmful to their overall reliability, regardless of which economic system it's happening in.


Make my own. Just a reminder for you lefties but people are capable of creating things, many times it really doesn't take that much to do so, nor does getting rich in the process.

Besides, why should anyone trust a leftist in power? We already know your lot will immediately outlaw and jail the opposition and pretend their uni-party system is "democratic". So I don't think you actually care about owners and power, you're just mad that you aren't in power perpetually.


He wasn't.


Socialist and communist regimes are legendary around the world for lying-to-supervisors syndrome. The politburo always gets the numbers it wants.


Yeah. Almost like the capitalist pyramid shaped corporations while actually being... The same?! What a shock!

I know you probably heard that one, but as long as there is a boss or a general, it's not communism.

Also, democracy is vulnerable to this as well. No politician wants to appear incompetent, leading to lying about results and taking unnecessary risks.


> as long as there is a boss or a general, it's not communism.

I believe you may be confusing communism with anarchy.


They're legendary to people mostly exposed to capitalist-owned media, that is.


Ha, I’ve never read Marx myself but that’s a great viewpoint. And it’s on my list.


Marx actually says the opposite. People form a dictatorship (dictatorship of the proletariat) to engage in a transition period away from Capitalism.

Saying "the people own" is just a way of saying the state owns something. The state owns all the things under Communism. Every business is a state business.


In Marx's usage (the common usage of the time), dictatorship just means "leadership", and doesn't have the same connotation of "ruling over others without their consent". Which is his point, really: socialism exists to take the reins of the dictatorship of the bourgeoisie (whose contradictions are propped up by a force-monopoly in the form of "a state") into a dictatorship of the workers so society can be transitioned over time into one where a state no longer needs to exist ("communism").

So, yes, in the meantime, the state takes control over some amount of enterprise and manages it for its own goals rather than to generate profits for a small number of owners. Is that a bad thing? In a world where for-profit healthcare in the US is famously more expensive and less effective than nationalized healthcare in any other OECD country, where the (unionized!) USPS is one of this country's most-popular and most-reliable services even despite years of meddling from capitalists in government pushing to gut it, in a world where famously most of our R&D is publicly funded, is it really so hard to imagine that it might be an effective way to manage production?


> socialism exists to take the reins of the dictatorship of the bourgeoisie

> doesn't have the same connotation of "ruling over others without their consent".

Usually taking things by force requires violating consent. That is a very basic tenet of Communism. The bourgeoisie are forcibly removed from power, with arms, usually killed, and their property taken.

> whose contradictions are propped up by a force-monopoly in the form of "a state"

And how is the dictatorship of the proletariat preserved against the minority who disagrees? Hint, it involves rifles and usually firing squads.

> society can be transitioned over time into one where a state no longer needs to exist

Yes, that's gone so well up to date. Dictators with extreme powers over entire states are well known for quietly surrendering their power and moving to a form of utopic anarchy.

Even Marx couldn't answer how the utopic Communist post-state society actually continued its existence without a state. "The vanguard" who at best are stateless secret police?

> in the meantime, the state takes control over some amount of enterprise and manages it for its own goals

Correct

> Is that a bad thing?

DMV, VA, Census Bureau, the IRS.

> USPS is one of this country's most-popular and most-reliable services

and is private but government directed

> In a world where for-profit healthcare in the US is famously more expensive

and has no waiting lists compared to 14.5 weeks in the UK at the moment and 20.9 weeks in Canada

> and less effective than nationalized healthcare

The US is in the top 10 for survival rates for most diseases/conditions (eg cancer) and we beat out the UK and Canada significantly. We have a lower life expectancy because we are too fat.


I almost wrote a response, but then I kept reading. Frankly, your comment needs A LOT of citations, and good luck finding them.


Pinching myself as I witness actual Marxist analysis on HN of all places.


The kolkhozes were not owned by the state in the Soviet Union. State-owned enterprises represent about 20% of total employment in China. Ownership is simply contingent and limited according to determinate social ends (as opposed to the total anarchy of the market in classical capitalism), but the same is true of every modern economy in the world.

As you say, the difference is whose interests dictate these ends: those of ordinary citizens or those of the monopolies and international finance.


92% of China's top 500 companies have major CCP board ownership.

https://www.institutmontaigne.org/en/expressions/influence-w...


What percent of boards in the US are staffed by Democrats and Republicans (our one party with two names)?


CCP party membership is 6.7% of the population. It is restricted and a point of honor and power to be a party member.

6.7% of the country runs 92% of the boardrooms.


So no answer to my question. In the US can anyone be a corporate board member? Or is it gated by myriad social factors such as having attended an elite B school and proving yourself as a loyal servant of Wall Street bankers?


> our one party with two names

... If you're not a woman, gay, trans, black, poor, ...


Do you not live in the US or what?


>"The state owns all the things under Communism."

The state owns nothing under Communism, as Communism assumes no state, social classes, private property or money. I do not think Communism can be achieved though, except maybe in a prehistoric society of docile gatherers, which we are not.


Perhaps eventually, after we learn to live better.

But that's no reason to give up and make greed mandatory.


People are greedy to varying degrees by nature. Same for violence and the desire to control other people. This is in our genes. I have no idea how you could suppress it without The State (comprised of control freaks willing to use violence) being on top.


Are you a geneticist? Got any sources to back up that claim, that we are all inherently selfish? Take a critical look at the society you were raised in.


I am not and I do not have to be. One can see it plain all around. I am very critical of the society but it does not change a fucking thing.


The group exercising their monopoly on force to implement the laws of communism is the state, unless you plan on making participation voluntary. Calling it “the people” makes absolutely no difference.


I am not calling it anything. This is how Communism is defined. It never happened and what we have instead is anything but.

Under original doctrine there is no State because "the people" will have evolved enough to do without it.


Capitalism is a stateless society where individuals are free to associate how they wish. Any problems with this definition?


> Capitalism is a stateless society where individuals are free to associate how they wish. Any problems with this definition?

Well, yes, since “capitalism” was literally coined to describe a particular then-existing system in which the instrumentality of the state not only exists but also, most particularly through a specific structure of property rights, directly serves the interests of the capital-owning class to the relative disadvantage of other segments of society, and has consistently since then referred to that system (or ones sharing some of its salient features) both for those who have advocated moving farther from it and for those advocating its adoption or a return to it in places which have moved on, in whole or in part, from it.


I'm glad we agree that defining a state to be stateless does not make it stateless. So back to what I was saying, when "the people" come together to enforce laws onto others, they form a state.


> Capitalism is a stateless society... . Any problems with this definition?

Other than it's gibberish? Capitalism, a system where money is used to purchase property by individuals, is a society ... without issued currency or individual property rights, you say?

Look around you, are capitalist countries stateless, and are stateless countries Capitalistic?


Communism and private property are not at odds. Communism and private capital are.


tech has been unraveling for years now, especially with constant layoffs despite reporting record profits. Marx was most certainly onto something.


The murder and starvation of millions of people?

Even the “communist” governments still around, like China, just have capitalist economies in sheep’s clothing.


The Stockholm syndrome in tech / capitalist America is astounding. While it's true that some communist regimes have been responsible for the suffering and death of millions, it's important to recognize that this is a distortion of Marx's ideas, rather than a direct result of them. Additionally, many capitalist societies have also had their share of social injustices and economic inequalities. It's essential to separate the philosophy from its historical implementations and consider the context in which Marx's ideas were formed.

In the case of tech companies reporting record profits amidst massive layoffs, it's worth exploring whether certain aspects of Marxist theory could shed light on the systemic issues within our current economic system, rather than dismissing them outright based on past examples of "communism".


[flagged]


These responses are the worst of bad-faith every time I see them. If you live in a place, with family and friends and social ties, but it has some stuff you want to change, why would the solution ever be "move away" rather than "stay and make it change", except that the former is more convenient for assholes who are fine with how things are?

As far as getting "shot" in Cuba, you're more likely to get shot by a cop in the US than in Cuba for any particular reason. Stop just soaking in wild propaganda uncritically.


That’s right, collectivists never want to reside in collectivist countries, they insist on converting capitalist countries so that they can leech off of wealth and businesses that wouldn’t otherwise exist under their own system. If Cuba is so great, put your money where your mouth is and live there. I have always and will continue to always have absolutely zero issue when communists steal from each other, all I ask is that others be left out of it.


Talking about bad faith... you know you're replying in a thread about GitHub incidents where someone is blaming capitalism for that right?


Yes, if there's one thing Marxists are known for it is definitely avoiding authoritarian regimes. Oh, apart from literally every country that tried Marxism.


One can point out someone's observations without subscribing to their ideology.

Not sure why some commenters here saw the name Marx and immediately jumped to the conclusion it was an endorsement of Marxism.


It would be interesting to hear what Marx would say with the hindsight of observing Soviet capitalism, and how it fundamentally failed to solve the class struggle.


He'd write a bunch of books making excuses for the failings of his approach.

Soviet Capitalism never existed. The Russians barely made any effort at enshrining a just rule of law, protecting individual rights, including protecting property rights.

One of the first things Putin did when he took power is to begin destroying individual rights (so as to bolster the power of the state), which market based economic systems rest upon.


Oh but "true Marxism has never been tried", they will retort.


Well, it hasn't. Because humans aren't capable of true Marxism.

(This is not a suggestion to try anyway, nor a suggestion to eliminate humans in order to further any kind of Marxist "ideal")


Maybe, but neither are they capable of living in a 100% transactional capitalist dystopia. As evidenced by everyone popping neurol (or another coping mechanism) with ever increasing frequency.


I completely agree, and I'm sorry if anything in my post suggested otherwise.

My preference is for a social democracy, of the variety sometimes found in Europe.


There is a huge difference still.

At 3/4 Capitalist, your nation can prosper dramatically. At 3/4 Communist, you will know nothing but suffering and deprivation.

Countries that have gotten closer to a free market orientation have prospered. Singapore, Hong Kong, the United States; all of the success stories of post USSR Europe are stories of market economics winning (including the dramatic example of the Baltics). All of Scandinavia operates via market based economic systems. Germany is a market based economy, not a Socialist economy, and they're the economic dynamo of Europe; the same goes for Britain, and even France (which certainly has a large welfare state, and is a mixed economy but not Socialist). There isn't a single prosperous Socialist nation in the world today. There are dozens of prosperous market based economic systems by contrast (the top several dozen most prosperous economies are dominated by market systems).

Going the other direction, the closer you get to Socialism, the worse things get. And there are obviously limits to the market direction, anarchy does not work (although I'd argue that's not the market direction; market based economies in particular require the rule of law and a government to enforce it).

The only country that can be argued to stand apart from this, even if temporarily, is China. However, they're still drifting on the liberalizations of Deng and the benefits those spurred, which are grinding down under Xi. Their situation has worsened as they have headed back toward Socialism (and its inevitable increased authoritarianism) and away from market liberalization.


People very often argue that "it's hard to run large distributed systems" and also that "It's better to use SaaS as getting five-nines is really hard."

I agree, but I'm going to repeat the same thing I said yesterday:

It really doesn't matter if your uptime is 80% as long as that 20% of downtime is happening when nobody is working.

An 80% uptime architecture is hopelessly simple to maintain (and restore and so on).

Complexity increases exponentially the more 9's you add, and those complexities need more and more people to paper over them. Eventually you end up with a false dichotomy. If you have 99.9% uptime and it's down when you need it, with no clear resolution possible, then you're not winning by having availability when you never needed it.

The people arguing against self-hosting are so committed; I can't help but feel people know that it's a false economy but they're financially incentivised or they like having pretty green dots on a special website or something.

Rachel by the Bay is US based (I am EU based) but even she has something to say on the subject: "My Nines are not your Nines": https://rachelbythebay.com/w/2019/07/15/giant/


I'm surprised by this perspective, I think in particular because it is hard to pick and choose availability.

First, like many people, I work in a company that spans multiple time zones. At least 16 hours of the day fall within the working hours of someone on the eng team. Plenty of places _do_ practically speaking need to use version control 24/7 when you factor in both global teams and out of hours incidents, which at a large enough scale become routine.

Second, when something fails, it doesn't _usually_ come back up of its own accord. It comes back up because someone fixes it. So you're either pulling people in out of hours to fix problems in these tools, or you're fixing them during work hours, which means that you have an outage when people are working! And if you work in an even slightly global team, oops, it might be both – it's in working hours for the people affected, but out of working hours for the people that maintain and can fix this system!

Third, because operating these systems tends not to be someone's full time job, even trivial issues like "the disk is full from log entries because it got mistakenly set to be in debug mode" can wind up taking a surprisingly long time to diagnose and fix.

What am I missing that makes it practical for a system to have 20% downtime, but only when people are not working?


> I think in particular because it is hard to pick and choose availability

I mean, human error is behind 95% of all outages: intentional changes to the running system and how it operates. You definitely can choose when you do those. Hardware or system failures outside of that are exceedingly rare, though I do submit they happen; simple systems are easier to stand up again if they fall over in the event that does happen.

I mean, stopping a database for a backup is a prime example of something that's significantly harder to do when you're running it 24/7/365.

Similarly, performing a host migration, or running a potentially dangerous major version upgrade.

What we do normally is limit the cost of rolling back, which is nice.

But, instead of "iterate quickly and if it causes an outage we roll back" it can easily be: one guy stays a little bit late and upgrades the server and if it doesn't work then they roll back. -- impacting a much more limited set of people.

Also: if you work globally, you can run upgrades on your edges.

Also also: 20% unavailability is just an extreme case. I have 99.997% uptime with a single host machine that lives on a shelf in my room. I'm not saying this is how you should run systems, but it's pretty normal for even single nodes to have insanely high availability out of the box. Bonus: if it does have an outage, because it's so simple a restore takes 23 minutes.

I know this because I run restores on another machine pretty often, and if I run a backup I usually run blue/green between these boxes.


In my experience “large distributed systems” don’t tend to be idling for 20% of the time? I think often such systems have some dependencies 24h a day. Even if it’s much lower in a 20% window you risk leaving clients out in the cold in that window, similarly to the post you linked, no?


Put this way instead then:

If your company is in Europe, and your gitlab instance goes down for 4hrs every day between 2.30am and 6.30am -- does anyone care? 0 people working those hours means 0 hours of lost dev time.

If you have 5 minutes of availability lost at 9.30 every morning it's much worse (at least in GameDev, when everyone is pulling the latest generated binaries). 900 people unable to work for 5 minutes is 75 hours of lost dev time; quite a lot more than 0 hours, even if the actual availability of the system was significantly higher.

If your audience is yourself, and people aren't supposed to be working, availability of those services is not necessary.

Hell, the Google SRE workbook goes into exactly this, so I know I'm not crazy.
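
To put numbers on the two scenarios above, here is a small Python sketch. The function names are mine, and it assumes everyone affected is fully blocked for the whole outage, which is the same simplification as in the comparison above.

```python
def lost_dev_hours(people_affected: int, outage_minutes: float) -> float:
    """Person-hours lost if everyone affected is blocked for the whole outage."""
    return people_affected * outage_minutes / 60


def daily_availability_percent(outage_minutes_per_day: float) -> float:
    """Availability over a 24-hour day, as a percentage."""
    return 100 * (1 - outage_minutes_per_day / (24 * 60))


# Scenario 1: down 02:30-06:30 every night, nobody working.
night = (lost_dev_hours(0, 4 * 60), daily_availability_percent(4 * 60))

# Scenario 2: down for 5 minutes at 09:30, 900 people pulling binaries.
morning = (lost_dev_hours(900, 5), daily_availability_percent(5))

print(f"night:   {night[0]:.0f} dev-hours lost at {night[1]:.2f}% availability")
print(f"morning: {morning[0]:.0f} dev-hours lost at {morning[1]:.2f}% availability")
```

The system with the far better availability number is the one that burns 75 hours a day, which is the whole point: the percentage alone says nothing about who was trying to use it at the time.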


With enough retries, 5 ones turn into 5 nines, latency be damned!
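
Taking the joke literally: if "5 ones" means a request succeeds 11.111% of the time, and each retry is assumed to be independent (a big assumption, since real outages are usually correlated), then the chance that all n attempts fail is (1 - p)^n. A minimal Python sketch with made-up function names:

```python
import math


def attempts_needed(per_try_success: float, target: float) -> int:
    """Smallest n such that 1 - (1 - p)^n >= target, assuming independent tries."""
    per_try_failure = 1 - per_try_success
    allowed_failure = 1 - target
    return math.ceil(math.log(allowed_failure) / math.log(per_try_failure))


def availability_after(per_try_success: float, attempts: int) -> float:
    """Probability that at least one of the attempts succeeds."""
    return 1 - (1 - per_try_success) ** attempts


n = attempts_needed(0.11111, 0.99999)
print(f"{n} attempts give {availability_after(0.11111, n):.6%}")
```

Under those assumptions it takes just under a hundred attempts per request, so latency is indeed damned.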


With the Microsoft acquisition, GitHub went from maybe overly slowly evolving and totally reliable to constantly changing and constantly broken. I swear at least a couple times a week their webhooks fall so far behind these days it breaks our internal workflows.

I’d like to get off Mr. Microsoft’s Wild Ride and go back to the much simpler and more reliable GitHub of yesteryear.


Google and Microsoft, the beginning of the end of so many products and services. Both just implement whatever will currently make them the most profit; they suck at it and don't care because they're still raking in money. It's sad. At some point both were actually decent companies (setting aside the "evil" they did) but nowadays both are just shells of that, still capitalizing on their names.


"Constantly" is doing some work here, it's still up and doing the same git hosting it's done for 15 years over 99% of the time


I don't think it is 99% of the given workday.

By their published stats, surely. In actuality the number of times I go to do a git operation and it fails because GitHub is being flaky is pretty high, and most of the time when I check the status page, it's clean.


Another "one of github's myriad of services went down today where only a fraction of users will be affected" post, another swarm of "you should completely change your company's workflows by adding in the overhead of managing your own git hosting and collaboration tool" replies.

edit:// self-awarewolf: Another reply from me pointing this out. haha


Honestly, in-housing your knowledge repositories is the least you can do to keep those all-powerful companies from taking everything you know and using it as they like.

If your knowledge is minimally valuable, you should self-host your git and collaboration tools. If it's not, then whatever.


The failure mode seems different to yesterday - then I was seeing failed pushes, today push works but Actions fail with "GitHub Actions has encountered an internal error when running your job." - but it is still irritating having your workflow interrupted on what seems to be an increasingly frequent basis.


Somewhat tangential, but has anyone else noticed that simply going back and forth between pages on the Github site is incredibly slow? There are times it will take 10+ seconds to go back to a page I was just on.


The JavaScript annoys the hell out of me. It tries to 'mimic' a browser but it comes up awfully short. For example, if you go to the PR page and click issues, then quickly click the PR tab (say you missed the filters dropdown) it doesn't cancel the request. Works fine with JS disabled. Same if you click across all the tabs in quick succession: if any one of the requests completes before you click the next one, all clicks get disabled for a few hundred ms or so. Super annoying.

I basically have js disabled on GitHub except for certain pages.


It's all been going downhill since they had some big redesign a few years back and jazzed it up with a bunch of unnecessary JavaScript. It's changed even more since then. It's inching toward being an "app", which is going to put it in the same "I dread clicking a link to it" bucket as other "apps", pretty soon.

Current pet peeve: clicking the little search field doesn't put my cursor in that field, but instead pops a HUGE command-bar-type UI that covers everything. I hate, hate, hate "Surprise! Ta-da!" UI like that. And the more of that crap they add, the slower the site's gonna get, to load and to use.


Ah, so I'm not crazy! Yes, refreshing a page mid load is often faster than waiting for it to finish naturally.


On mobile (iOS Safari) the navigation is often broken and swiping back goes to the wrong page. This usually happens between the readme and any file in the files browser, especially if I go back before the current page has completely loaded. I suspect there is a race condition with JavaScript hijacking the history API and not updating it quickly enough so that if you navigate back before the history has been made consistent, you end up on the wrong page.


Yeah, it behaves like Jenkins Blue Ocean :)


Today's incident started on May 10th at ~1300. The previous incident was on May 9th at ~1130, and the one before that was on May 4th at ~1600. That doesn't look like almost every day, or the same time of day. However, their May has certainly been incident-heavy.


The cleaner was on holiday from the 5th to the 8th. ;)


While I migrated my personal repos over to Codeberg, I do still submit PRs to repos on GitHub. Today I just submitted a trivial PR, and I'm now watching the poor repo owner repeatedly re-running the failed CI job that keeps saying "GitHub Actions has encountered an internal error when running your job".


Less than 24 hours later [0], another GitHub incident with something going down. This time it is Actions going down once again.

Really, you would get more uptime if you self-hosted using GitLab, Gitea, etc. Even some open-source projects like WireGuard, RedoxOS, ReactOS, GNU Ring, and many others are doing this, with no issues.

Centralizing to GitHub [1] really isn't a good idea, and it has been showing itself to be beyond unreliable for years.

GitHub is going great, and it has never been better. /s

[0] https://news.ycombinator.com/item?id=35874041

[1] https://news.ycombinator.com/item?id=22867803


My experience is that it is usually OK to set up these kinds of services. It takes a few days of interesting and fun work to get new stuff going and tuned.

Problems arise with the operations. Nobody is confident doing the updates since they happen so rarely. If the system is critical, do you dare to do updates during business hours? Is there a proper test system to practice updates on? Is anybody prepared to handle recovery scenarios? Who is testing backups? How do you transfer the knowledge when the people who originally built the system leave?


I run a gitea instance on a moderate SBC backed by a NAS over iSCSI.

The SBC runs updates weekly, rebooting as needed as per zypper ps. It takes about 2 minutes for a reboot cycle, and reboots about 50% of the time. During this, the service is hard down.

The NAS self-updates roughly monthly, and it takes about an hour. The service is stopped during this time.

This gives me a "perfect hardware" uptime of about 99.8%.

I've had a single hardware failure in the last year, but it had the service down for 3 days waiting on shipping for a part, which gives an empirical hardware uptime of 99.17%.

I've not had any major maintenance on this setup in the last few years, though I expect to replace the NAS's drives in stages over the next few weeks. For a best case, however, I'll give it 100%.

My internet has not gone out in 2.5 years, so I am going to give it an optimistic 100%.

My power, however, has been out for a total of 30 hours, of which 28 hours outlasted my UPS (measured from the NUT shutdown call in syslog to the first timestamp after reboot). That gives me a Grid+UPS power availability of 99.68%.

If we compose all these, we have a total uptime of 98.65% across all services, which is far less reliable for basic operations (git cli interactions) than any of Git(Lab|Hub), Bitbucket, SourceHut.
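
For anyone who wants to check the arithmetic, here is a rough sketch of that composition in Python. It assumes the failure sources are independent, so their availabilities simply multiply; the labels and the `compose` helper are just names picked for illustration, and the inputs are the rounded figures quoted above.

```python
from math import prod

# Availability figures (in percent) as quoted in the comment above.
# The labels are illustrative, not measured categories.
components = {
    "scheduled updates (SBC + NAS)": 99.8,
    "hardware failure": 99.17,
    "major maintenance": 100.0,
    "internet": 100.0,
    "grid + UPS power": 99.68,
}


def compose(percentages):
    """Multiply availabilities given in percent, assuming independent failures."""
    return 100.0 * prod(p / 100.0 for p in percentages)


total = compose(components.values())
print(f"{total:.2f}%")  # prints 98.65%
```

Because the terms multiply, the two 100% entries do not move the result at all; the 99.17% hardware figure and the 99.68% power figure do most of the damage.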

This only covers all-services-down cases; my single-system setup doesn't often partially fail, and isn't engineered to hide maintenance (and doing so would be far out of scope and budget). Large services tend to fail partially, as they are composed of many individual systems with varied duties, and the system is often designed to fall back to basic function in the event of failure.

Not to mention the question of capacity: my 4c/8t of amd64 comes nowhere close to the CDN-ish power of the major SaaSes in this space.


I would like to see real stats on this. Most people don't properly quantify availability and assume just because a big system goes down a lot that one they run could be more reliable, but it's usually apples and oranges.


I've worked with self-hosted GitLab instances for more than 6 years, and there has been almost no downtime, mostly quick updates lasting a few minutes from time to time. Companies tend to fear self-hosted services because tech giants are supposed to be way more reliable and do everything for you, but owning your infrastructure and having people in place who actually know what's going on can be a life saver in many situations.


Unfortunately this experience doesn't pan out the same for a company:

- A large number of users will increase the likelihood that one of them will encounter a temporary issue; even if problems are intermittent, a single person using the system simply won't notice them, but lots of people will.

- More users tend to mean using more features, more system resources, etc, which increases the likelihood of an issue from either buggy/complex features, or resource exhaustion.

- Updates for a team typically involve being more careful and performing updates on a regular schedule, in order to minimize downtime.

- If a problem does occur and you're not around, the team is stuck until you are available. If you go on vacation or are hit by a bus, they're stuck longer.

- Having more users tends to require things like disaster recovery contingencies, security, etc. Somebody has to do this extra work, which is a cost in time and labor.

Where self hosted shines is in keeping complexity down and changes minimal. GitHub is a giant system, making it more likely for problems to interrupt service. They constantly ship changes, increasing the likelihood of interrupted service. A self hosted option can use the same simple system and version for much longer, only taking security patches until the version is EOL and needs to be upgraded.


From an SLA uptime perspective, it’s also worth considering that you don’t have a whole team working to fix your self-hosted GitLab server when it goes down. So one overnight outage of your self-hosted server could add up to more downtime than more frequent, shorter GitHub outages.


Your example also suggests another factor: downtime overnight might be less consequential than shorter outages that occur during working hours.


This is kind of a straw man, don’t you think? For really small startups, sure, but I’m sure many places that self-host have an ops team that can and does respond to outages on their systems.


Have you ever troubleshot an outage with a whole team? I strongly believe that having any more than two people working on an outage makes it last longer.


Properly run incident management can have three dozen people involved with no negative impact. You need the incident manager to coordinate, communicate, run interference. You need a clear set of rules for who will have what role and responsibility. Combination of voice and text, multiple chat channels.

But yeah, if you have no plan or organization, too many cooks is detrimental.


I think even if the percentage uptime is less on a self hosted one, at least you get more control over when to upgrade systems (perhaps in the evenings / weekends) with less impact to the devs.


I am considering beginning our migration to something self-hosted. I am doing it for my personal stuff, but this is getting out of hand.

edit: I know this is a bit circle jerky, but damn I am frustrated


The only impact I notice from these outages is repetitive HN threads complaining about them. They have never affected my ability to work. At worst they've delayed a release by 45 minutes. People need to calm down about this. Feel free to manage your own Git instance and Jenkins cluster if you think that's the best use of your time, but I'm going to continue outsourcing that role to the hundreds of engineers at GitLab and GitHub to whom I pay about $20 per month in total.


Is there anyone impacted by this beyond just an annoying delay in being able to deploy code? I can't imagine being in an emergency hotfix or deadline situation and not being able to deploy for an unknown amount of time.

Maybe it's possible to run GitHub Actions locally in an emergency?


Please turn on and off again - thank you.


they reboot the servers every evening.


or are the hamsters taking their union mandated break?



