The "nothing to see here" approach to access control has a lot of weird culture-consequences. I wish software would just address me like the peasant that I am, rather than trying to gaslight me into believing that my artificially limited world is the whole one.
I'll bite. So you want to host a git repo. You need to manage SSH keys then, and then you'll want bug tracking and PRs for code reviews. This means that you need to host something like Gitlab, which is not free if you want features like SSO.
The phrase "sensitive info" tells me disclosing it to a mass code hoster bare any liability deems irresponsible. Remember Eric Schmidt about privacy on the internet?
No true Scotsman! Throw your computer into a lake! Write your new darknet market ecommerce platform on sticky notes!
Why is “security” always a pissing contest with some people? I’d swear that I’d be condemned for locking my house at night instead of simply encasing myself in concrete for all eternity like Chernobyl. After all, it’s more secure!
I think if you are in a corporate account and have correct access permissions to the account (i.e. URL namespace) it should not show 404. It's just super confusing.
We were warned multi-deploys with big changes were incoming: "For lack of a better term, some big shit is coming at GitHub Universe." - Thomas Dohmke, CEO
One of the worst outages I witnessed was due to negative DNS caching on the outermost router that took the company a few (3 or 5) working days to fix *after* the issue was identified.
I cannot wait for Gitea, Forgejo, and GitLab to start federating with each other via ActivityPub. Then we can all take one more step away from a corporate-controlled internet.
> "one more step away from a corporate-controlled internet"
Downvote me to grey-world if you like, but I think everyone's crazy to put all their code infrastructure in the hands of fucking Microsoft. Especially literal free open-source software. Who do you think Microsoft is? What do you know of Microsoft's history and their core values (they're "embrace, extinguish & exsanguinate"). It's like giving fucking Sauron safekeeping of your power-rings in Mordor, oh we have great infrastructure for safe ring storage here, very secure, the orcs are really expert guards.
What exactly is the risk? That they'll stop providing the services they sell today? The design of git makes switching to another primary remote very easy (granted, most users probably don't have good habits around backing up data from Issues/Wiki/Releases and risk losing that data if it's taken away suddenly) -- but the repo itself is durable and portable on a whim.
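Roughly speaking, moving the repo itself is a couple of commands; the URLs here are just placeholders:

    # one-off, complete copy of every ref (branches and tags) -- placeholder URLs
    git clone --mirror https://github.com/example/project.git
    cd project.git
    # point at the new home and push everything
    git push --mirror https://git.example.org/example/project.git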
You have slightly illustrated it yourself in a roundabout way, but let's be clear with wording here:
> The design of git makes switching to another primary remote very easy
It is never as easy as just switching to another primary remote.
It's not just Issues/Wikis/Releases, but the build/CI process(es) that are rampant on GitHub now, the community you've built potentially coming up on GitHub and not really getting that there's anything else out there, etc.
This is all time consuming hard work. People will often just not do it, and this is why we have lock in everywhere. Stop calling it easy.
The centralization of the internet around a few specific services - like GitHub, Cloudflare, and so on - is an increasingly problematic thing to contend with.
Quick data point: while not exactly trivial, my team has migrated completely from github to our own hosted gitea. Including CI, releases, issues, read-only mirror back to GitHub, PRs etc. The only thing we don't have that would be nice is the ability to take a PR directly from a fork of a repo in someone else's Gitea deployment (or GitHub for that matter). To take an external PR we either need to recreate it manually in our Gitea under a team member's account, or give the external contributor an account on our Gitea.
> (granted, most users probably don't have good habits around backing up data from Issues/Wiki/Releases and risk losing that data if it's taken away suddenly
At one of my jobs, they used Asana when I started. It was too full of backlogged issues, so we moved over to Jira. Then Jira got too full. A month before I was laid off, one of my coworkers said, "Maybe we should try out Asana."
That would surely be nice and helpful, but why do we need to wait for it?
Even my open source projects in github are just mirrors from the "real place" of work: gitlab or my own gitea instance. If github is down, it is a minor inconvenience but I can still work.
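Keeping the GitHub side current is a one-off setup on the canonical repo, something like this (the URL is a placeholder):

    # add GitHub as a push mirror of the real repo -- placeholder URL
    git remote add --mirror=push github https://github.com/example/project.git
    # run from a cron job or a CI step; pushes all refs and removes deleted ones
    git push github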
GitLab, Gitea, and Forgejo are applications that can be easily self-hosted. One of those also has a corporation associated with it, but that has minimal effect in this case.
I know this is a greybeard's fantasy and that most people working today were trained not to bother, but: important things should not have GitHub as a failure point.
Hobby projects and today's work? Sure. Point straight at GitHub and hack away. And when it goes down, get yourself a coffee.
But everything that's anywhere near production should have already pointed those github requests to a mirror or other tool in your own controlled ecosystem. The status of GitHub should have nothing to do with whether your customers are getting what they're paying you for. Same goes for Docker containers and every other kind of remotely distributed dependency.
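For git itself, redirecting those requests can be a one-line config on your build machines; the mirror host below is a placeholder:

    # rewrite GitHub fetch URLs to an internal mirror at clone/fetch time -- placeholder host
    git config --global url."https://git-mirror.internal/github/".insteadOf "https://github.com/"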
It's not that black and white. Where do you draw the line on what can and can't be a failure point?
My cloud provider is probably an acceptable point. If every AWS region goes down I'm not going to have a spare cloud provider.
What about an auth provider? Do I need a backup there?
What about CI, do I need multiple CI systems?
3rd party search services, realtime messaging services, the list goes on.
For 1% of systems, you need backups for all of these (or to not use anything external). The other 99%, building backups for every one of these systems is a losing business strategy.
Some of them sure, but which those are will vary based on the context. It's not as simple as having a backup for "every other kind of remotely distributed dependency."
It depends on the Recovery Time Objective we are talking about. For example at our company there are daily backups of both our dependencies and those used in the build process. If github had a prolonged outage or accidentally deleted all their data we could set up a gitea with all our code within hours, and get a replacement for Github Actions working within days.
But that takes longer than the expected duration of this outage, and is a lot of work. It's not like we have a standby gitea we can just seamlessly switch to, so we are still hit by this outage. On the other hand for build dependencies we do have a standby mirror.
Availability is a measured outcome. How's it gonna help you quantify your various risks _before_ they become a problem so you can spend your mitigation time wisely?
I was too brief, I meant I would add redundancy reactively where there was none there before if a service I’m using can’t consistently keep 4 9’s. You are correct that being proactive is more complicated.
Interesting perspective, but I disagree on one central tenet. Even though GitHub holds production code, it is not production. Built artifacts and the machines running those artifacts are production. When GitHub goes down, which it rarely does, it just means developers can't sync for a couple of hours, no different than if someone works offline. The temptation to increase internal devops complexity should not be an automatic immune response when a service goes down; it comes with all sorts of hidden costs.
It's not entirely clear to me whether you're talking about using GitHub for your own production tooling, or as a source for some arbitrary third-party component. If it's the latter, then I completely agree with you. Use a read-through proxying package repository. I don't care if you run it yourself or if you pay a provider, but don't pull stuff from the origin every time you build.
In the general case, adopting an external system will bring with it greater reliability than trying to run stuff oneself. The differences are that you don't get to choose your maintenance windows, and you can't do anything to fix it yourself.
Take care about who you pick, and own the dependency, because you've put a part of your own reputation in the hands of your provider.
Now, if you pick GitHub as a part of your controlled ecosystem -- which is totally reasonable, if it fits your use-case -- then you still shouldn't be pulling arbitrary stuff from places outwith your control. GitHub has package repository tooling that you can use :). Although it's not entirely clear to me that it's as suitable for third-party dependencies as tools like Artifactory or Nexus.
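As a concrete example of the read-through idea, most package managers can be pointed at a proxy with a single setting; the registry URLs below are placeholders:

    # npm: route installs through an internal read-through proxy -- placeholder URL
    npm config set registry https://artifacts.internal/repository/npm-proxy/
    # pip: the same idea for Python dependencies -- placeholder URL
    pip config set global.index-url https://artifacts.internal/repository/pypi-proxy/simple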
Yeah, I was talking about access to third-party dependencies, which I had assumed was the crisis the OP had in mind.
I've since read many sibling replies fret over build automation or source code storage, but some downtime in build automation will rarely damage customers and (my goodness) I hope people aren't trusting the only copy of all their IP to a business partner and that they always have a very recent backup somewhere they own themselves.
Some things are disappointingly hard to back up properly. Source code isn't really in that category.
There's still a question of how strong an ownership we need -- for example, my mail server is owned by OVH, and its backups are sent to rsync.net. I consider that to be sufficient ownership of backups. I'm much more likely to lose my files than they are.
I'm sure that GitHub also have backups, but they're not for the benefit of their customers.
In theory, the code is there, but putting a project back together in a hurry after trouble at Github is non-trivial. Especially if the build process uses proprietary stuff such as "GitHub actions". The issues and discussions are all Github-only, too.
The common CI spec is sh & make, but everybody hates those I guess.
Git hook management's really awkward. Even with tools to synchronize them (... all of them? You may want some that are just for your own use) it's a pain. "I want these hooks to run, in order, but only when a merge commit happens on a machine with such-and-such designation, and I want it to run the task on a different machine, but we need to make sure that runner's a Windows box because..." that just sucks to self-manage, and yeah, there's no standard for expressing that, you're bound to incompatible solutions.
Secret management's a hellscape and everyone's always glad when someone else solves the problem for you. That alone is like 50% of the value of Github Actions.
> The common CI spec is sh & make, but everybody hates those I guess.
These aren't very good at some of the things you actually want to use a CI for, other than the literal "build" step (which probably is using them anyway, or the per-language equivalent).
Coordinating _and visualising_ multiple builds where some parts run in parallel (e.g. multiplatform), or parts that might be different or skipped depending on how the flow was triggered. Managing caches and artifacts between stages and between runs. Managing running in different contexts, including generic shell and across multiple machine types. A consistent way to manage and inject secrets. Reusing common parts across different CI workflows.
I suppose you could have a parent job taking up a full slot to run a Makefile that launches and manages the running of jobs on other nodes, but I imagine you'd have to step into some toolset that abstracts some of it, and hope that it's a shared development effort, or you end up with a convoluted in-house nightmare of shell scripts.
"Something DAG and yaml shaped" is about the closest convergence we have gotten, and the closest that it looks like we'll get.
Are we talking GNU Make, nmake, BSD Make? (Isn't this why autotools exist in the first place -- to make something allegedly cross-platform for different flavors of Make)?
I get bitten repeatedly by sh being a symlink to different shells, even though I've been using it for many years. The most recent piece of insanity being "cd foo bar": one version prints an error but still changes into the foo directory, while some other version simply results in an error.
Also, error reporting. It's way too broken to consider sh a reliable tool. I wish things could be done with very simple tools and all this complexity around them were unnecessary. Unfortunately, here, this complexity, while not unavoidable, is indeed warranted due to the abysmal quality of the simple tools.
I understand the sentiment, but I think it's phrased incorrectly.
What's needed is a formally defined CI spec(s). Common is bad for the same reason any monopoly is bad. Formally-defined solves some of the same problems common is solving where it's important to be protected from random failures of the sole provider, but it also makes it, at least theoretically, easier to have multiple providers.
This is similar to how C is different from Make. C is a standard that anyone can implement, while make is a weird language defined by its implementation that some tried to reimplement, but by doing so only increased the insanity of compatibility issues.
Of course there have been multiple attempts to make common / standard definitions for general-purpose automation tools. Make is one of those, Ant is another one, and there's plenty more. I'm not sure why none really sticks around to the point of becoming a universally accepted tool / standard. Some reasons I can think of: languages build special-purpose automation tools around their implementation, which are often a selling point of the language, an attempt to sell it to developers, so the authors are disincentivized from making them general-purpose. There isn't a consensus on what such tools should do and what kind of guarantees they need to offer. Some such guarantees come at a price, sometimes a very steep price, so they would be hard to sell. E.g. something like tup offers highly reliable, reproducible, and isolated builds, but at a cost of complexity and resources, whereas something like Make offers no guarantees but is easier to get started with and to be productive in.
Maybe it could be possible to extract just the component of CI that deals with the "skeleton" of automation, defining abstract tasks, dependencies between them etc... but then immediately there'd be plenty of systems that'd try to complement this automation with their own (proprietary) extensions which would circle us back to the initial problem...
My approach is to write bash scripts that do the heavy lifting, and use pipes and output redirection to coordinate the individual "steps". For example, one script would run the test suite, and another would process the code coverage reports.
In CI, it now just needs to run the scripts in the correct order.
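A minimal sketch of the shape of it; the script names and the make target are just examples:

    #!/usr/bin/env bash
    # ci/test.sh -- run the test suite and leave coverage output in a known place
    set -euo pipefail
    make test
    cp build/coverage.xml artifacts/

    #!/usr/bin/env bash
    # ci/run-all.sh -- the only thing the CI system has to invoke; it runs the steps in order
    set -euo pipefail
    ./ci/test.sh
    ./ci/coverage-report.sh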
The source code of the actions is in your repo. This is more the general problem of relying on proprietary software: if the company that makes it dies, you die too!
Host a backup of your own code? It’s easy and can be done on an rpi. I wrote a go program in 1000 lines that automatically does this for me. And then I actually started using that as the main source and pushing the backup to github.
It also pulls down anything I star into a different folder, which gets synced once a day. The rest get synced every hour.
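If you'd rather not write a program at all, the core of the idea fits in a short shell script; the token, the destination path, and the single-page pagination here are all simplifications:

    #!/usr/bin/env bash
    # Keep bare mirrors of every repo the authenticated user has starred.
    set -euo pipefail
    DEST=/srv/mirrors/starred        # placeholder path
    mkdir -p "$DEST"
    curl -s -H "Authorization: token $GITHUB_TOKEN" \
         "https://api.github.com/user/starred?per_page=100" |
      grep -o '"clone_url": *"[^"]*"' | cut -d'"' -f4 |
      while read -r url; do
        name=$(basename "$url" .git)
        if [ -d "$DEST/$name.git" ]; then
          git -C "$DEST/$name.git" remote update --prune   # refresh an existing mirror
        else
          git clone --mirror "$url" "$DEST/$name.git"      # first-time clone
        fi
      done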
Sadly, GitHub doesn't store its value-add assets within the repository itself; so all of the PR conversations, Gists, Issues, and so forth aren't within git itself.
It's the way the open source options work too (GitLab, Gitea, Forgejo), and they don't have any sinister motive for that—it's just easier to build that way.
This speaks to one of my secret desires, I've worked at a bunch of small companies now where the workflows were all ad-hoc and I feel like even at bigger companies there are persistent failures to "understand Git"... one of the most pernicious being that, most companies have a modernist "this is prod, it is one place running one codebase" approach, Git is postmodernist "here is this project, there are many branches showing different perspectives on what this code could be", most CI/CD systems in my opinion sit at the interface between these and choose the wrong side to be on, they choose the Git side -- "we ask every branch, 'do you want to deploy yourself to prod?' and if it says yes by virtue of having a CI.yaml file with a branch filter naming itself to be run, then we deploy it to prod."
So what I want is kind of to build a company that's just strongly opinionated about that... "everything is in one Git repository, the `main` branch is authoritative for ACLs and CI/CD config, the bugtracker writes issues directly into that repository and has an ACL role that allows it to do that, the RFC widget writes your code design docs directly in there as well, we do rebase but we merge without fast-forwarding, you have to use semver and it works like this..." and probably nobody will use our offering because "GitHub is more trustworthy" but "if anybody does they'll love us" haha
I realize you're talking about git, but I think it's also pretty important to have a place where users can submit issues and devs can say "it's fixed in version XYZ"; these are not features of git.
You can host your own on your own site. Most major open source projects have their own website anyhow. They really don't need to be on github or gitlab to begin with; just roll it yourself like the rest of your website. The only reason people do it is because it's a meme at this point to have a github (and even gitlab, if only as a foil to github).
Rather than making it a feature of git, it would probably be sufficient to have a tool that packed issues and PRs from the forge into the repo (in preparation to be migrated) and then unpacked them into the forge's API on the other side.
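Even a crude version of the "pack" half covers a lot of ground: dump the issues as JSON into the repo so they travel with every clone, then let a tool on the other side re-post them through the new forge's API. A sketch, where the repo name, the token, and the single-page fetch are all placeholders or simplifications:

    #!/usr/bin/env bash
    # Snapshot open and closed issues into the repo itself.
    set -euo pipefail
    REPO=example/project             # placeholder
    mkdir -p .meta
    curl -s -H "Authorization: token $GITHUB_TOKEN" \
         "https://api.github.com/repos/$REPO/issues?state=all&per_page=100" \
         > .meta/issues.json
    git add .meta/issues.json
    git commit -m "Snapshot issue tracker"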
Fossil lets you synchronize artifacts both ways, so you can clone a repo, participate in discussions when on a plane, write some replies, connect back to the internet and sync them back up to the server.
When your server goes down, you don't lose anything. There's no need to remember to back up your issues every day, every time you clone the repo, you also clone its tickets, forum, wiki etc.
Fossil doesn't have a good story around cross-repo authentication (without separate per-repo accounts), CI/CD, 2FA, SSO support and such. It's a great tool if you're writing a single personal project with little regard for security, but that's about it.
You can use ChiselApp.com (which I maintain) to host your Fossil as well, and since every clone has the same data (minus passwords) it's trivial to migrate.
Why do you need a centralized hub? Most repos on github are entirely unrelated to each other. Repos can be self-hosted. Issues can be tracked on your own website. You don't need a megacorp to run a website. If you are writing open source tooling you are more than qualified to roll your own here.
Cloudflare, AWS, Azure, Google Cloud and Whatever-is-used-by-China: people make fun of the IBM guy who said "the world has a market for maybe 5 computers", but he was right all along...
I need a macro template for those memes with Bart Simpson at the blackboard, so I can make it say "I will always have a backup plan for third-party services".
Seriously people: gitea exists. Gitlab self-hosted exists. Drone/Woodpecker CI exists. It's not that difficult to set up a project that does not depend on Github. I spent less time setting these up than the amount of down time that Github has had this year.
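For anyone who hasn't tried it: the quick-and-dirty evaluation version of Gitea really is one container; the port mappings and volume path below are placeholders:

    # throwaway Gitea instance for a first look -- placeholder ports and volume
    docker run -d --name gitea \
      -p 3000:3000 -p 2222:22 \
      -v /srv/gitea:/data \
      gitea/gitea:latest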
It's amazing how many of these issues can be obviated by taking a step back at a given SaaS and asking "can I self-host this?" Then, if you can and you need redundancy, just buy two desktops and stick one in your place and another in your friend's apartment in another town. With a lightweight static site, a modern desktop is probably more than powerful enough to deal with most any load you might realistically see for your given project. You are also extremely unlikely to have both of these desktops go down at once if they are on different local power grids and internet service providers, short of an invasion of the continental U.S. perhaps.
So how do you keep data synced between them in a manner that ensures the data is safe? Now you're playing security sysadmin, and you're playing backup administrator, and you're playing hardware admin (did you use 2 SSDs bought at the same time?).
You don't ever escape the payment cost of the issues of keeping these systems running. You're paying staff to do it, you're paying a cloud provider to do this, or you're paying out of your own time.
If these things were so insurmountable, the early internet would not have existed at all. Yet it did and still does in many corners, which stands to reason that you can do these things too.
It’s not insurmountable, just there’s no point in adding that risk to most projects. Would you rather tell your boss so and so failed because of a news-worthy GitHub incident or so and so failed because all 3 backups of your own DIY service were taken down by some extremely unlikely (but not impossible) chain of events?
This is where I quote a Narnia meme and say "Don't quote the old magic to me..."
You're engaging in survivorship bias. One of my first 'social' media/forum accounts was lost when the admin of the site dropped the user table from a database, and had to go back to everyone online and ask for people to mail in about their accounts. I never bothered to set it back up.
Piles of other smaller sites disappeared for similar reasons. If you completely ignore all that, then yea the old internet was fine. Oh, and that these days hackers are highly motivated to encrypt all your crap for bitcoin.
Typically big sites don't disappear, because they depend on separation of duties, with different teams carrying out their own responsibilities. For example, knowing how backups work and making sure they are working every day.
If you’re using 50 things that each require a separate backup plan, and you aren’t large enough that organizing backup plans for those 50 things is no problem, you’re doing something wrong, I’d say.
yesterday I couldn't set up a bunch of servers that I needed provisioned because cloudflare's API had an outage.
Today if I were using github, my day would be wasted again.
For all the talk about companies trying to cut out on meetings by putting a sticker price to it (this 30 minute meeting could've been an email and cost $2000), at what point do we start saying "this outage could've been avoided and cost us $5k"?
Hosting and running my gitea server costs me less money than a github subscription, and I've spent less time on it overall than GH has spent in outages this year.
And I can take part of the "savings" and contribute to gitea's further development.
⮕ git push
ERROR: Repository not found.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
PR really spun gold when they decided to label everything from every single one of their databases getting deleted and backups nuked to intermittent connectivity issues as "degraded". Who exactly are they making feel better by not calling a spade a spade?
I take issue with clear PR-speak trying to make the issue seem lesser than it actually is. When you're having an outage, call it an outage. Having a feature completely unusable and labeling it as "degraded performance" is clearly twisting your words to lessen the outward perception of the scale of the problem.
Because they want to use the same language every time, and say something quickly, and litigating whether something is an outage or degradation or latency or intermittent or whatever is a distraction.
People are going to jump on them no matter what. Having engineers talking to PR wouldn’t help anyone.
I would say it’s degraded. I still see I’m logged in, can still see the org screen and even the repo (on and off). Don’t think degraded implies anything good, just that some things are working and others aren’t.
Regardless of this: does anyone experience general slowness of GitHub? I view a file (on the web) and it takes time for the page to become fully interactive - no buttons work, and the rest of the file cannot be viewed - just the top part is shown (above the fold, maybe). Honestly, it's so nerve-wracking.
Recently GitHub has been pushing very aggressively for two-factor authentication.
So I installed the authenticator app.
But the authenticator does not work when the clock on my phone is not perfectly synchronized. And my phone's clock is intentionally set 15 minutes ahead.
TOTP usually only allows for about 30 seconds of time drift between the device and the server. If you really must set your phone to the wrong time you can use a cheap second device, or a hardware token.
Technically if an authenticator app has an option for a time offset it should work, but I've never encountered one.