Tell HN: GitHub will delete your private repo if you lose access to the original
530 points by baobabKoodaa on Jan 31, 2023 | 279 comments
I was surprised to see this in my email today:

> Your private repository baobabKoodaa/laaketutka-scripts (forked from futurice/how-to-get-healthy) has been deleted because you are no longer a collaborator on futurice/how-to-get-healthy.

That was an MIT-licensed open source project I worked on years ago. We published the source code for everyone to use, so I certainly did not expect to lose access to it just because someone at my previous company has been doing spring cleaning at GitHub! I had a 100% legal fork of the project, and now it's gone... why?

Turns out I don't even have a local copy of it anymore, so this actually caused me data loss. I'm fine with losing access to this particular codebase, I'm not using HN as customer support to regain access. I just wanted everyone to be aware that GitHub does this.



I think this is under the assumption of "employee has a private fork of the company repo, then leaves, employee should not keep the fork"

So when "removed as a collaborator" which apparently includes the original being deleted, you lose access to the main repo and all forks, even yours. As if leaving a company.


This doesn’t make any sense to me.

My private forks are *mine* and I most certainly do not want GitHub guessing at whether and when to permanently delete them without my consent.

Companies of course have the right to manage access to their proprietary source code, for example by only giving access to corporate accounts under their control and reclaiming those accounts when an employee leaves.


Any (quite relevant actually) discussion of SCM policy and management aside, you've got to remember: There is no 'cloud', it's just somebody else's computer.


> My private forks are mine

Not if you create them using the "Fork" button in the UI.

Since this behavior has yet again surprised many people, here is the documentation: https://docs.github.com/en/pull-requests/collaborating-with-...


Which also shows the right way to delete a private repo if you want people to be able to keep their forks:

If a private repository is made public, each of its private forks is turned into a standalone private repository and becomes the upstream of its own new repository network. Private forks are never automatically made public because they could contain sensitive commits that shouldn't be exposed publicly.

If a private repository is made public and then deleted, its private forks will continue to exist as standalone private repositories in separate networks.

But I agree this is confusing.


The right way is to make it public first? That's insanity. Making a repo public just to delete it would be a huge information leak even if it was short in duration.


Just force push something very empty to it before making it public. One more step, yay..


So they did think about this use case (deleting a private repo without deleting forks) but did not bother implementing a proper choice for the repo delete flow?


This seems really bizarre to me. They seem to want people to have the network of connected GH repositories, but this behavior promotes "forking" a project in a way that breaks that network, which is to `git clone` and then create a new repo from that clone.

To put it another way, if the user had "forked" the GH repo onto GitLab, there would be no data loss, but that behavior would promote using GH in a way that breaks the upstream/downstream relationship that you see on GH.

It's even worse that the deleted fork was private. What impact does GH expect deleting the hosted private repository to have on folks who really want to keep a private copy of the repo, such as offline or on another git hosting site? I'm really struggling to see any real-world upside to this mechanism. Seems like an ineffectual legal or compliance CYA.


> Companies of course have the right to manage access to their proprietary source code, for example by only giving access to corporate accounts under their control and reclaiming those accounts when an employee leaves.

This is how it should be done, but is too much overhead for many "IT as a cost centre" companies.


Also, it would ruin my GitHub contributions graph


A simple script and a cron job will fix that problem.


no need for a cron job

custom fake git history with this: https://github.com/artiebits/fake-git-history


Why would GitHub care to build functionality to behave how companies who aren’t paying them want it to behave?


So the main solution would be not forking, but cloning and then creating a separate project directly? Will it work?


Yes, I believe that if you push a repo to GitHub from a local copy instead of using their web-based "fork" feature it will not deduplicate the repositories. IDK if this affects your ability to submit pull requests though.
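
A minimal sketch of that clone-and-push approach, written in Python for illustration. The function name, the default directory name and the example URLs are placeholders, not anything GitHub provides. It only builds the git commands and does not run them, and it assumes an empty destination repository has already been created.

```python
from typing import List


def standalone_copy_commands(source_url: str, new_repo_url: str,
                             workdir: str = "repo-copy.git") -> List[List[str]]:
    """Return git commands that copy a repo without using the Fork button.

    Nothing is executed here. The caller can print the commands or pass
    each one to subprocess.run(cmd, check=True).
    """
    return [
        # Bare clone: fetches the branches and tags, with no working tree.
        ["git", "clone", "--bare", source_url, workdir],
        # Push every local ref to the new, empty repository.
        ["git", "-C", workdir, "push", "--mirror", new_repo_url],
    ]


if __name__ == "__main__":
    # Both URLs are hypothetical examples.
    for cmd in standalone_copy_commands(
        "https://github.com/example-org/original.git",
        "https://github.com/example-user/my-copy.git",
    ):
        print(" ".join(cmd))
```

Because the result is an ordinary repository with no fork relationship recorded on GitHub, it should not be tied to the fate of the original. Keeping the bare clone on disk also doubles as a local backup.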


Simply saving a copy of the fork locally would be sufficient to keep a copy of it.


My client recently moved from GitLab (self hosted - many teams had their own isolated server) to GitHub.com and managing access for the thousands of developers has been a small headache. We were encouraged to use our personal GitHub accounts instead of making new ones.

They are promoting "internal open source", yet due to a wild variety of permissions, colleagues can't fork to their own space or push a branch for a PR. Chasing the repo owners or at least someone with authority to grant permission is rarely worth the hassle.


> My private forks are mine

Your employment agreement disagrees. Blame the confusion on the blurry line GitHub draws between forking work repos into personal accounts.


> Your employment agreement disagrees.

Er, kinda presumptuous of you (and GitHub), no? None of my private repos, forks of other people's private repos, or other people's forks of my private repos are in any way governed by an employment agreement, and if they were, there's no way for GH to know what that agreement says.


Did you forget the part where this code is MIT-licensed? Yes, they don't own it, but the code is still 'theirs' to keep forever as they see fit.


The original code may be licensed MIT. The MIT license allows for the project to be relicensed, closed source and it is also possible for a proprietary contributions that aren't MIT license to be added to it that are protected as any other closed source code. The MIT license is not "viral" and doesn't require that everything following from it is.

The person may be able to find the original code that was MIT licensed but that doesn't mean that the work done in house is also MIT licensed and that they have any right to it.


IANAL but…

> The original code may be licensed MIT. The MIT license allows for the project to be relicensed, closed source

… this is less compelling to me than this:

> and it is also possible for a proprietary contributions that aren't MIT license to be added to it that are protected as any other closed source code. The MIT license is not "viral" and doesn't require that everything following from it is.

AFAIK, changing the licensing terms of an MIT project isn’t retroactive to prior licenses. A quick search seems to confirm that.

The possibility of more restrictive or revocable licensing of subcomponents is more compelling as a rebuttal to “mine” at a philosophical level, but it’s not compelling from the perspective of GitHub revoking access. They’re welcome to comply with relevant legal actions, but they’re not actually the police of your licensee status and don’t even attempt to be.

Ultimately it’s the person who maintains the private repo who is responsible for and to any license challenges. GitHub isn’t a party or privy to those agreements, and again doesn’t have any pretense of such except compelled by legal action. And I give them the benefit of the doubt that this isn’t their motive.

This behavior is part of their own permissions model, and their own model of the relationship between “forks” and “private”, as defined by their own use cases. It’s a surprising one, but it needn’t have anything at all to do with their view of any given user’s repo’s license compliance.


The first part sets up the second part that unlike the GPL, the MIT license doesn't require that future contributions to the project be any particular license.

Presumably the OP can find the open source project MIT licensed without the company's contributions to it.

The only thing that the MIT license requires is:

> The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

It doesn't even require attribution in the compiled application or any other notices.

And so, it is possible (and I would dare say likely) that the contributions that the OP made while working on the repo at the company unless specific permission was given otherwise would be considered as work for hire or as part of the work product as condition for employment and completely owned by the company (and not MIT licensed).

If the OP thinks that they should have access to the code because of the MIT license, that is something to take up with a lawyer. My IANAL senses suggest that that would be rather fruitless.

I don't find GitHub's model particularly surprising but rather the most reasonable one that opens GitHub up to the least liability for accidental disclosures of content. Err on the side of least privilege and if there's something to work out beyond that, that's something for the contributors to work out themselves - GitHub isn't the arbiter for that.


> And so, it is possible (and I would dare say likely) that the contributions that the OP made while working on the repo at the company unless specific permission was given otherwise would be considered as work for hire or as part of the work product as condition for employment and completely owned by the company (and not MIT licensed).

This is a radical interpretation of the text if I’ve ever seen one. To the extent any of their contributions were merged upstream, they’re inherently MIT licensed by virtue of being in the same codebase which offers that license. To the extent they have unmerged changes, they may well be works for hire but it isn’t GitHub’s role to decide that between a second and third party.

Again nor do they want to. GitHub is extremely hands off about forks and the licensing implications thereof.

This isn’t a GH posture towards licensing disputes, it’s their posture towards their own authorization model. And that’s fine, but we shouldn’t conflate the two when they’re quite distinct.


Who's upstream?

Open source project and publicly accessible upstream? Yep. They're likely MIT licensed in accordance with the CLA.

Internal company copy of an MIT project? Unlikely unless legal says they are.

If the organization that I worked for made an internal copy of Influxdb ( https://github.com/influxdata/influxdb ), my contributions to the private and internally hosted copy are not inherently MIT licensed too and furthermore don't need to be redistributed.

If those changes were submitted upstream to the influxdata/influxdb repo, and I signed the CLA with them - then yes they would be.

What's more, if I left the organization, then I don't necessarily have a right to the contributions that I made to the private and internally hosted copy's codebase.


It doesn’t sound like we have any substantive disagreement other than potentially what GitHub’s role in it is or should be.


OP wrote

> That was an MIT-licensed open source project I worked on years ago.

which to me implies that OP also received the code under the MIT license and not some other license.


However, the work done while at the employer may not have been done under the MIT license unless permission to license it and distribute it under the MIT license was given by legal.


Yeah I understand how this is unfortunate but for 99% of cases this is what you want to happen when you remove someone's access to a private codebase.

It seems like this is an unusual use case that OP had an MIT-licensed open source project whose source only exists in a private repo.


Is this scenario possible?

- you fork a public repo

- its visibility is changed to private but you have access through the org

- they delete it

- you lose your fork, which was only ever of the public repo


no, in this case it works as expected;

"GitHub will detach public forks of the public repository and put them into a new network. Public forks are not made private."

https://docs.github.com/en/repositories/creating-and-managin...

But better be safe than sorry...


What's the point? You can always clone locally and push to a new repo without forking on GitHub. Repos will be unlinked and history will be intact.


It's for the same reason that you don't leave ex-employees with access to other intellectual property. Just because they could make their own copies during their tenure doesn't mean you shouldn't try to reduce your exposure.


OP states that it was "an MIT-licensed open source project". It appears that Microsoft lacks an understanding of licensing, and not only when it comes to Copilot.

If they are punishing users merely on the assumption that the user is doing something wrong (using someone's codebase with a non-permissive licence, possibly unlawfully), it's just silly.


agree


It really sounds like the forking and access should be "license aware" which sounds like an absolute nightmare to manage but that would help in this specific scenario?

I can see it simply opening more cans of worms than it would help.


It really sounds like users of Github shouldn't trust Github to keep their forked repositories for them.


Microsoft does this a lot, where they assume that rules that apply to their own organisations apply to all organisations in precisely the same way.

If a Microsoft IC leaves, they should lose access to all Git repos, including forks.

If Joe Random open-source contributor is removed from an open source repo's access list, their fork shouldn't be wiped.

But Microsoft has One Rule To Rule Them All, so they won't make exceptions for unimportant people like their customers.

I see this a lot. A good example is Azure Active Directory, which is basically "Microsoft 365 Authentication" that they rebranded and sold to developers for their own use, i.e.: Azure AD Enterprise Apps, App Registrations, and B2C.

There are many aspects of the AAD design that make zero sense until you pause for a second and realise that it is not designed for you. It's designed for Microsoft 365!

For example, auditing. My customers are typically government agencies or banks, and they have strict auditing requirements, especially related to data access. All user authentication MUST be logged, including client IP address, and everything else. Most access is by their own staff, or by other orgs that have signed various contracts or agreements, so there is no expectation of privacy.

This is basically impossible with many configurations of AAD. It just refuses to collect meaningful audit logs. Why? Because GDPR applies to Microsoft 365 and they don't care about the data hosted on services such as SharePoint Online. That's not Microsoft's data, that's their customers' data, so it's up to the customers to enable logging "on their end", in their individual AAD tenants.

There is no way to centrally collect logs as a service provider using AAD in a multi-tenant scenario.

When I asked Microsoft about this, they waffled on about GDPR and privacy regulations -- which apply to them, but not us.

Another example is Microsoft Teams, which hides the name of the organisation people are coming from. In large multi-org meetings this is infuriating, because you have no idea where anyone is from. Microsoft does this because they use outsourcers like MindTree for support, and they don't want their customers to see this in Teams meetings for Azure support tickets. No-one is allowed to see where people are from so that Microsoft can bullshit their customers.


> There are many aspects of the AAD design that make zero sense until you pause for a second and realise that it is not designed for you. It's designed for Microsoft 365!

Business Basic accounts being limited to 7 days of login logs is a huge middle finger to the entire small business sector. Of course they think everyone should just buy Enterprise subscriptions. It's nothing more than a corporate version of "don't be poor".


Segmenting an enterprise version of a product is generally about finding features that are disproportionately valuable to enterprise (centralized control, policy enforcement, auditing, etc) separating them into a different offering. This lets you charge less to small businesses without having your small business product cannibalize your enterprise business.

This seems basically fine to me? If there are a lot of small businesses who are unsatisfied with Business Basic and can't afford Enterprise then there's an opening for a competitor.


The particular segmentation is a questionable choice.

Small enterprises are likely to have small IT/Security staff, and the most likely, therefore, to not notice something awry for a few days, at which point, vital log info has already rolled off the 7-day window.


This is exactly the issue. No one monitors the logs and, by the time they figure out something is wrong, there isn't enough info available to properly assess the scope of the damage.

Another problem is the Business Basic product is too complex for what small businesses need (reliable email) and buying something even more complex to get a couple of extra features like proper logging is counterproductive.

As is, if a small business ends up with a compromised admin account I don't think it's unreasonable to consider migrating them to a different service. It's nearly impossible to guarantee a bad actor hasn't hidden a back door somewhere in all that complexity if your only tools for assessment are the ones offered in the Business Basic subscriptions.


Maybe the small businesses you've encountered are different from the ones I have? My expectation is that most have no security staff and wouldn't use this feature even if it had indefinite retention.


This behavior was not introduced to GitHub by Microsoft.


Github knows the license of projects. It knows if you forked an open source project.


It knows the license applied to the repository. For a private repo it may not be "released" under that license but planned for release.

If AcmeCorp is planning to release - but hasn't - a project under MIT or whatever, they may have the license declared in the repo but that's not a guarantee it's ever going to be released.

If it's a private repo, and your access has been under your status as an employee, then I don't know that counts as distributed to you under that license. If AcmeCorp later decides to change licenses or not release the software as open source, then it makes sense for GitHub not to let someone continue access.

There are a LOT of holes in the system, but I'm not sure GitHub is in the wrong for deleting access to a private repo if you lose access to an organization or whatever.


Software can be open source and not released. Employees can legally release it themselves since it is open source.


No, the software becomes "Open Source" or "Free Software" the moment someone licenses it to somebody else under such a license. Simply copying a file named LICENSE into some private directory has no legal relevance. As an employee, you usually don't get a license to the work artifacts you are working on.


Really no. The employee doesn’t have that right, it’s not theirs to release unless the employer gives permission.


This unfortunately makes sense because it is a private repo. Even if a repo is labeled as being MIT and has an MIT license in it, it still may contain other code of a different license.

Github could do better by warning the repo owner when they delete a private repo. Github could ask the repo owner if they want to convert it to public first (a "set it free" option) or otherwise give the option to avoid deleting the forks of others.


Yes, and under most FLOSS licenses (including BSD/MIT/LGPL/GPL/AGPL), companies are perfectly free to maintain modified versions of the software internally with no obligation to publish the source externally, until/unless they distribute the modified software externally (or allow external use the software remotely in the case of AGPL). All of the modifications are copyright the company and it is their choice whether to release them publicly. Employees having access to the modifications doesn't mean that the modifications have been licensed to them under the original license, and thus doesn't give them permission to distribute the modified software to others under that license.

Essentially what I just said is the same as what you did. The private modifications cannot be assumed to be under the same license as the original software. GitHub has no way of knowing all these details, and has promised to keep private repos private, so its current policy is the correct one.


GitHub is not supposed to make such decisions for the user here. It is user's responsibility to make sure they delete their private forks if they shouldn't have access to the repo/fork anymore.

What's next? Should we all install spyware on our computers and let GitHub automatically delete local copies of forks as well?

GitHub and the company/person, who deleted the original private repo, should inform the owner of the fork that the main repo was deleted. If need be, company/person can request fork owner to delete their private fork and local clone as well.


I think this incident reinforces that private repos on github.com are a weird hybrid of the public github and on-premise github which creates various practical problems and misunderstandings when those two security models collide.

First off is the fact that forking a repo is often a necessary step in contributing to project if you don't have push permission, so these forks will be created during the normal development processes, not necessarily because the employee was intentionally trying to save off their own copy. So it is perfectly normal for the employer to consider those forks to be something it should own and manage, just like it would on an on-premise installation.

On the other hand, github still encourages people to use a single account for both personal use and work[1]. Naturally the employees reasonably consider all the forks that are in their personal account to be something that they should own and manage. So you end up with situations like this.

The lesson - mixing work and personal accounts/computers/devices is a horrible idea regardless of what Github says. Employers shouldn't allow it, and employees should avoid it even if allowed. Then both will have a clear idea of who owns and controls what.

[1] https://docs.github.com/en/get-started/learning-about-github...


Enterprise Managed Users solves the issue.


MIT licensed code doesn't mandate distribution. Companies and organizations are perfectly within their rights to own a private fork of an MIT-licensed codebase in perpetuity.

With that in mind, if you fork an organizationally-managed repository, there's a good chance the owner doesn't want you to continue to have access to that codebase if you're no longer a part of the organization. And the local copy? Well there's a good chance you were only allowed to clone the repo on an IT-managed device with specific 2FA policies and some kind of agent/config to prevent/reduce data exfiltration from that device.

Is it a perfect system? Hell no. Data leaks, that's part of life. And I'm with you that it certainly could be more user-configurable.

But it's also extremely well-documented behavior[1], and seems like a key design choice that GitHub made a long time ago to protect the owners of private repos. Ultimately, if you don't care about who has access to your code, you signal that by making the repo public. Or by telling your private collaborators to make sure they hold on to a local copy.

[1]: https://docs.github.com/en/pull-requests/collaborating-with-...


Seems like you don't like GitHub. Have you considered not using it?


"This unfortunately makes sense because it is a private repo."

I disagree. I expect a (i.e. my) fork to be independent of the original repository, no matter if it is private or not.

It's enough if a fork of a private repository is private then too.


This is why I don't use github's fork feature. There's more than just this restriction they impose upon you.

Instead I prefer to use a "git" fork. I just clone it and upload it to my own repo. Assuming the license permits of course.


This is the right answer: Break the fork link. I sometimes do this to make a private "fork" of a public repo so that I can add my own notes about how to use it, remind myself what happened when I tried it, add a config script for my own peculiar setup, etc.

It's unfortunate because not having a "real" fork makes it harder to send pull requests and track the upstream. But it's sometimes necessary to get around stupid github policies.


Don't you need to have a "GitHub-approved" fork (i.e. use the GitHub fork button) if you want to create pull requests on the upstream project in GitHub? Or is there a way to do that from the kind of repo you're describing?


If that came up you could create a GitHub-native fork and add that as a remote.
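
Roughly, and only as a sketch: once a GitHub-native fork exists, it can be added as a second remote to the standalone clone and the branch pushed there. The function name, the remote name `gh-fork`, the URL and the branch name below are all made up for illustration, and the commands are only built, not executed.

```python
from typing import List


def pr_via_native_fork_commands(fork_url: str, branch: str,
                                remote_name: str = "gh-fork") -> List[List[str]]:
    """Return git commands that publish a local branch to a GitHub-native fork."""
    return [
        # Register the fork (created with the Fork button) as an extra remote.
        ["git", "remote", "add", remote_name, fork_url],
        # Push the branch there; the pull request is then opened from the fork.
        ["git", "push", remote_name, branch],
    ]


if __name__ == "__main__":
    # Hypothetical fork URL and branch name.
    for cmd in pr_via_native_fork_commands(
        "https://github.com/example-user/original.git", "my-fix"
    ):
        print(" ".join(cmd))
```

This assumes the standalone repo still shares history with the upstream project, so the pushed branch can be compared against it in a pull request.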


Don't use the "Fork" button in the GitHub UI, then. It is intended for collaboration and establishes and maintains the parent-child relationship of "your fork" and if the parent repo is deleted, so are all forks. If the parent repo is private and goes public, so do all forks. If the parent repo is public and switches to private, so do all forks. This behavior is laid out in docs.github.com and is not secret.

This has been the case on github.com for over a decade, and I am slightly shocked that people don't know this. I guess the root of that is that I am surprised that this has not bitten more people than it has.

People assuming things are a certain way and never checking to verify that are by far the greatest source of "I shot myself in the foot" statements that will ever be known.


"If the parent repo is private and goes public, so do all forks."

nope, that's not true:

"GitHub will detach private forks and turn them into a standalone private repository. For more information, see "What happens to forks when a repository is deleted or changes visibility?""

" If the parent repo is public and switches to private, so do all forks."

This isn't true either:

"GitHub will detach public forks of the public repository and put them into a new network. Public forks are not made private."

In these cases, exactly what I would have expected happens.
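
For reference, the cases quoted from the docs in this thread can be summed up as a small lookup table. This is a Python sketch that paraphrases only what has been quoted here. The enum and function names are my own, and it is not an official or complete description of GitHub's behavior.

```python
from enum import Enum
from typing import Dict, Tuple


class ParentEvent(Enum):
    PRIVATE_DELETED_OR_ACCESS_LOST = 1
    PRIVATE_MADE_PUBLIC = 2
    PRIVATE_MADE_PUBLIC_THEN_DELETED = 3
    PUBLIC_MADE_PRIVATE = 4
    PUBLIC_MADE_PRIVATE_THEN_DELETED = 5


# Maps each event to (fork survives?, outcome as described in the thread).
FORK_OUTCOMES: Dict[ParentEvent, Tuple[bool, str]] = {
    ParentEvent.PRIVATE_DELETED_OR_ACCESS_LOST:
        (False, "private fork is deleted"),
    ParentEvent.PRIVATE_MADE_PUBLIC:
        (True, "private fork becomes a standalone private repository"),
    ParentEvent.PRIVATE_MADE_PUBLIC_THEN_DELETED:
        (True, "private fork continues as a standalone private repository"),
    ParentEvent.PUBLIC_MADE_PRIVATE:
        (True, "public fork is detached into a new network and stays public"),
    ParentEvent.PUBLIC_MADE_PRIVATE_THEN_DELETED:
        (True, "public fork continues to exist in a separate network"),
}


def fork_survives(event: ParentEvent) -> bool:
    """Return True if the fork still exists after the given parent event."""
    return FORK_OUTCOMES[event][0]


if __name__ == "__main__":
    for event, (survives, outcome) in FORK_OUTCOMES.items():
        print(f"{event.name}: {outcome} (survives={survives})")
```

The only row where the fork disappears is the one OP ran into: a private parent that is deleted, or that the fork owner loses access to.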


yep I was wrong about that point. generally my point still stands; if you want total control over your repo, don't use the GitHub "Fork" button to create your repo for that code.

I linked directly to the documentation about "Fork" in another comment.


That would make sense if they didn't use the word "fork" for it. That word has a specific meaning when talking about repositories[0], and it doesn't include automatically propagating deletions or settings of the original repo. It doesn't actually include ANY automatic propagation. Therefore GitHub should use a different word for this kind of fork, something like "Create child fork" or "Linked fork", or maybe a new word altogether.

[0] http://www.freekb.net/Article?id=1263


GitHub coined this particular use of "fork", and it's always been about having an automatically managed relationship between the original repository and the new one. A copy without that automated connection is a clone.


No, it definitely predates GitHub. And git itself, by at least a full decade.

https://en.wikipedia.org/wiki/Fork_(software_development)#Et...


GitHub specifically invented the idea of "forking" as a social action on a forge site that allows you to create your own associated copy of a repository. This is related to but different from the broader meaning of "fork". "Fork" doesn't mean anything at the git level.


I'm not sure exactly the distinction that you're trying to make. I see GitHub's use of "fork" as a specific application of the broader meaning of "fork", not an invention of a new and distinct concept. Just as putting "wheels" onto a steam engine can produce a new type of vehicle but doesn't change the concept of "wheels", GitHub's use of "fork" doesn't fundamentally change the broader concept of "fork".


If any changes done to the parent repository propagate automatically to "forked" repositories without the explicit consent of the _owner_ of the fork then it does change the broader concept of fork, and to follow your analogy it would be like calling a caterpillar track a wheel.

Whether this is acceptable because the original is a private repository is a separate question; what we are discussing is the meaning of the word itself.


I think we are in agreement. Because access to the "forked" repository was removed without the consent of the owner of the fork, it is inaccurate for GitHub to describe it as a "fork". For clarity, I would also describe the "owner" of the fork as the person who created the fork.


I find the fork feature useful. For example, if you fork a project that is no longer maintained, users can search the forks and find that you are now maintaining it. I've found myself doing that many times.

Regarding forking a private repository with a public repository, it's a corner case for sure. In my opinion it's better to forbid forks of private repositories altogether, and to forbid making a repository that has forks private, than to create problems like the one the user in this topic had.



That's why I usually don't use the official "fork" feature, but clone and push the repository manually instead. I would like to keep the fork network connection on Github, but I don't want to see my fork deleted because of an error, malice or simply lack of knowledge.


It will only be deleted if the repo you fork from is a private repository. The documentation [1] covers the other scenarios, in all of which you keep your copy of the code (including when the public repository is made private later).

[1] https://docs.github.com/en/pull-requests/collaborating-with-...


> It will only be deleted if the repo you fork from is a private repository

This makes sense. Thank you for clarifying that important detail. It seems to be missing from the parts of the discussion I've read here.


No, it doesn't. It only makes sense until you stop and go "Wait, no, hold on a minute. Why would they delete the fork instead of simply severing the fork relation in their fork relations table?".


Consider:

You have access to many private company files. After you leave the company, the company is obligated to send you copies of all of the files because you may have linked to them. After all, you could have made personal copies of all of the files, so you should still retain access through links.


TIL, thanks! I probably confused this with the GitHub takedowns, when forks are removed as well (as it happened to the youtube-dl repo). I could imagine my manual clone not withstanding such takedown either, though.


Yeah for cases like that, keep a local copy, thankfully many people did


What if the original is made private and then deleted? Does your fork remain?


Yes, the docs linked above say:

"If a public repository is made private and then deleted, its public forks will continue to exist in a separate network."

https://docs.github.com/en/pull-requests/collaborating-with-...


IIRC GitHub will also delete the entire fork network for DMCA request even if your fork is not mentioned explicitly.


Same. Another reason is I don't like how Github inserts "Forked from ..." in the project name. If your "fork" becomes extremely divergent after a couple years (maybe you had a different vision for the project), you are still stuck with the "Fork from..." sub-header, which basically tells users that they should look at the original. I'm otherwise fine putting attribution in a README.


Right, the entire point of GitHub forks is to make it easier to upstream your local changes to the original repository. If you have no interest in doing that you shouldn't use a fork.


Which is pretty much the opposite of what fork used to mean before GitHub...


you know you can fork dead and abandoned projects right?


Sure, you can. It might not be a good idea in the long term.


“But the plans were on display…”

“On display? I eventually had to go down to the cellar to find them.”

“That’s the display department.”

“With a flashlight.”

“Ah, well, the lights had probably gone.”

“So had the stairs.”

“But look, you found the notice, didn’t you?”

“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.”

― Douglas Adams, The Hitchhiker's Guide to the Galaxy


To clarify the appropriateness of this analogy:

This is unexpected behaviour from Github here which may (and has, by the anecdote of OP) cause permanent data loss. Documentation is not good enough, as users should not have been expected to have read the entire documentation.


I guess this is a question of who should have been given further information. For example, whoever at the organization deleted the repo would have been given a very clear warning screen including the number of forks that would be deleted by their action prior to them doing it.

On that note, an organization admin can _directly_ delete your private fork without even deleting the source repository if they want. GitHub's permission model is fairly direct that private forks you make through your membership in an organization are more the organization's property than the forker's.


> I guess this is a question of who should have been given further information. For example, whoever at the organization deleted the repo would have been given a very clear warning screen including the number of forks that would be deleted by their action prior to them doing it.

This is exactly how it works today already. If I try to delete a private repository people have forked, I see the following:

> We will also delete all 4 forks since this is a private repository.

Clicking on the delete button, again:

> Unexpected bad things will happen if you don’t read this!

> This will also delete all 4 forks since this is a private repository.

> [type name of repository]


Yes, that's what I said. It sounds like the parent poster is suggesting that the fork account should instead have that notice.


IMO it would be useful if non-obvious behavior like that were warned about when you fork the repo. I know I'd get burned by that. I keep a local mirror of everything though.


That's a fair point.


In my opinion, in this case the fork shouldn't be allowed to be created at all. If this is the end result, it's better to tell the user "no, we don't let you fork the repo". Then he could have done it the normal way: clone the repo and push it to another remote, which would not have had this issue.

Until today I thought that the "fork" concept was only a relationship at the UI level, but I now see there is logic behind it: the fork depends on the original repository even for permissions, and that to me is surprising!


I think you're misunderstanding the way that a lot of orgs use forks. Many orgs will have the team fork the repo under their own account so they get their own working space, and then they make PRs from their forks back to the origin. Before branch protections it was also the best way to manage write permission. This is a really common pattern and not allowing it would break how a lot of people use github.

If the org doesn't work this way, it can disable forking so that it's not allowed at all on the repo (or org-wide), like you said.


I don't think it's unexpected at all, given that the original repo was private, not just the fork. Secondly, a GitHub "user" isn't really a user in the consumer sense. They're a developer, and as a developer/professional you can be expected to consult the documentation of a tool you use so you understand its default behavior.


Is it unexpected though? The repo was forked from an org by the person who was a member of that org.

I know this isn't common but I actually use a unique user for my company "myname-company"


If you have a Raspberry Pi wasting away in a drawer[0], I strongly recommend installing Gitea or Forgejo and mirroring all the repos you like (i.e. the ones you contribute(d) to and/or starred, not just on Github too!). You set it up once and it will sync in upstream changes as often as you like (default is daily)

0. Or a homelab, or a cheap 256MB VM, or a NAS that can run docker containers, or an old Chromebook: anything that can execute a Go binary, x64, arm6 or even mips


I wonder if there is an efficient way to do direct incremental git-to-S3 backups, or if you have to do this, run a Git mirror and do regular filesystem-level backups of it.


https://git-scm.com/docs/git-bundle creates a single file that contains everything in A not in B, so e.g. the delta between the state of the repository yesterday and the state today (for all the refs in it). You just need to produce the "rev list" to save.
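
For what it's worth, the stock Git client seems able to do this already. A rough sketch, assuming you record the ref tips after each run and exclude them on the next one; the file names (`full.bundle`, `incr.bundle`, `last-tips.txt`) and the throwaway repo are made up for illustration:

```shell
set -eu
work=$(mktemp -d)
repo="$work/repo"

# Helper for the demo: append a line to a file and commit it.
commit() {
    echo "$1" >> "$repo/notes.txt"
    git -C "$repo" add notes.txt
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "$1"
}

git init -q "$repo"
commit "day 1"
branch=$(git -C "$repo" symbolic-ref --short HEAD)

# First run: bundle everything and remember where every ref pointed.
git -C "$repo" bundle create "$work/full.bundle" --all
git -C "$repo" for-each-ref --format='%(objectname)' > "$work/last-tips.txt"

commit "day 2"

# Later runs: bundle all refs, excluding what the remembered tips already reach.
git -C "$repo" bundle create "$work/incr.bundle" --all \
    $(sed 's/^/^/' "$work/last-tips.txt")
git -C "$repo" for-each-ref --format='%(objectname)' > "$work/last-tips.txt"

# Restore: clone the full bundle, then fetch each incremental one in order.
git clone -q -b "$branch" "$work/full.bundle" "$work/restored"
git -C "$work/restored" fetch -q "$work/incr.bundle" \
    "refs/heads/*:refs/remotes/origin/*"
restored_tip=$(git -C "$work/restored" rev-parse "refs/remotes/origin/$branch")
echo "restored tip: $restored_tip"
```

One caveat: if nothing changed since the last run, `git bundle create` refuses to write an empty bundle, so a real job would need to treat that case as "nothing to upload". The bundles are ordinary files, so any uploader can ship them to object storage.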


A cron job ought to do this. Try something like Cloud Scheduler that can automate this for you. (I am not sure what the equivalent is in AWS.)


My question is not how to run it, but what to run. If your scheduled task does a full clone every time to upload as ZIP to S3, it is massively inefficient. Even if you use something like Restic, because the Git pack file will have nothing in common with the previous one.


Clone each repo locally. Periodically do "git fetch -p" for each repo to update the local copy of upstream content. Run some periodic task like restic or rclone (depending on whether you want point in time snapshots or just a mirror of latest state) to mirror these local repos into your S3 bucket.

The local clones should evolve incrementally due to "git fetch", and then the restic or rclone task should figure out how to make incremental updates to the S3 content.
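
A rough sketch of the fetch half, assuming one persistent mirror clone per repo under a single directory. The paths and the `update_mirrors` name are made up, and a local stand-in plays the upstream so it runs offline:

```shell
set -eu
root=$(mktemp -d)    # hypothetical location; a real setup would use a fixed path

# Helper for the demo: add a commit to the stand-in upstream.
commit() {
    git -C "$root/upstream" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

# Stand-in upstream so the sketch runs offline.
git init -q "$root/upstream"
commit "first"

# One-time setup: a persistent mirror clone per repo.
mkdir "$root/mirrors"
git clone -q --mirror "$root/upstream" "$root/mirrors/upstream.git"

# The periodic job: fetch new objects and prune deleted refs in every mirror.
update_mirrors() {
    for repo in "$root"/mirrors/*.git; do
        git -C "$repo" fetch -q -p origin
    done
}

commit "second"     # upstream moves on
update_mirrors      # the mirror picks up the new commit

# A file-level tool would then copy $root/mirrors to the bucket, for example
# with "rclone sync" or "restic backup" (remote and repository setup not shown).
mirror_tip=$(git -C "$root/mirrors/upstream.git" rev-parse HEAD)
upstream_tip=$(git -C "$root/upstream" rev-parse HEAD)
echo "mirror:   $mirror_tip"
echo "upstream: $upstream_tip"
```

Running the fetch and the upload back to back in one job keeps the snapshot consistent, since the clone only changes during the fetch.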


But that's... exactly what I described and asked how to avoid...


I had trouble parsing your earliest comment, so I only tried to address the incremental backup concern. I may not have understood the conversation, but it seemed like you claimed that a filesystem level backup of a clone was not going to produce incremental backup IO in practice.

A periodic fetch into a persistent cloned repo will be incremental unless the upstream is doing something crazy with frequent branch deletions and repacks. In practice, most upstream repos I encounter behave relatively monotonically. They accumulate new commits and branch/tag heads but do not often create garbage or need repacking.

A periodic backup of the cloned repo will also be incremental if using an appropriate tool like restic or rclone-copy. Also, since the clone only changes during the fetch, you can serialize these in one periodic job and be confident that you are making a consistent snapshot of the repo.

The advantage of this approach is its simplicity. It is easy to reason about and easy to work with the backups to restore a repo without having to learn about other tools. It's the kind of thing I could feel comfortable setting up and running for years on end with little supervision.

A more sophisticated approach that integrates with git hooks, e.g. to do event-driven rather than periodic backup, is plausible but I think could quickly get in the way of itself. And if working with a hosted upstream, you would need to integrate with their proprietary hooks, e.g. GitHub actions, and deal with other restrictions of the hosting environment. Such a solution likely brings new failure modes and may not be a worthwhile tradeoff...


Again, this requires you to have a persistent clone on a filesystem. I specifically wonder if we can do (and I quote) "direct incremental git-to-S3 backups", and you keep replying "it's easy, do it indirectly with a persistent cloned repo".

I don't understand where you are stuck, tbh.

yencabulator has provided a good tip I think, as you could store the previous set of refs and use that to build an incremental Git bundle (one with only the objects that were not in the previous bundle). I don't know if you can do that with the existing Git client though.


I'll look for a tool that mirrors all my repositories (i.e. only forks). If I can't find one, I'll write it myself.


This is built in to Gitea. I mirror multiple github repos to my private gitea instance.



I wrote a little tool to mirror my repositories to my gitea instance. It has been months to potentially a year or two since I ran this, but it does what you’re asking. You can mirror repositories of users and repos users have starred. It definitely needs some love.

https://github.com/jasonraimondi/deno-mirror-to-gitea


I wrote a similar one I called Forgery that’s in a repo of the same name in my github account (which can be found in my user profile… just created this account and didn’t want to post a link in my first post, thought it might get automodded).

Very similar to yours, but also does forks, which I’m not sure yours does by a quick glance at your readme. Although, mine doesn’t automatically mirror from another forge, just clones everything locally. I’ll have to add a TODO to add mirroring.

Please have a look at how I handle them and consider adding a link to my project in your “similar tools” section, and I’ll do the same for you!


If I remember correctly, I believe that mine will do forks also; they are just included in your repositories.

I just looked over your project, and it seems pretty cool. A little bit different than mine, since mine is specifically github to a gitea mirror repository. I don't mind adding a link to your project in mine.

Thanks for sharing!


I have to assume that the original repository was private when you forked it. If it was public, and then made private, then this should not happen.

If the original code was "open source", then why exactly was it in a private repository? Putting "MIT licensed open source" into a private repository is not publishing that source code for the world to use.

It sounds like nothing weird happened here other than this company thinking a private repository was "publishing it as open source".


Open source doesn't necessarily mean that the code has to be shared publicly on the internet, let alone in a GitHub repo! This is a common misconception. You can of course also decide to sell open source software, that is, have others pay you to obtain the source code under an open source license (of course the buyer can then share the code, or even sell it to others legally, so it's not commonly done).

I can put an MIT or even a GPL license in a private repository that I have at my company. The meaning is that I don't release the source code, though if one of my employees wants to take it and use it he can, and he can also decide to share it with other people, or put it in a public repo.

Why I don't want to put the repo public? Maybe I'm just lazy, I don't see too much value in the code, I don't want to write documentation, tests, whatever, I don't consider it of enough value, whatever, still I don't have problems with people that have access to the code that they use it, and share it if they want.


I have no misconception of what open source is, if you want to put your foot near this foot gun intentionally by paying GitHub to host code you've open sourced (while telling no one outside your org that you've open sourced it) that's fine.

A common misconception is that publishing a public repo comes with obligations to add tests, documentation and whatever.


> If the original code was "open source", then why exactly was it in a private repository?

It's been too many years to remember the exact reasons, but this was not the only repo in the project. This was the "working directory" that had all kinds of random stuff that a data science project might accrue over time. Later in the project we published 3 repos which were more "cleaned up" to be potentially useful to outsiders (I use scare quotes around "cleaned up" because the codebases are still a mess, sorry).

Anyway, 3 of the 4 repos appear to still be public:

https://github.com/futurice/health-visualizations https://github.com/futurice/health-visualizations-front https://github.com/futurice/laaketutka-prereqs


It would have had to be privately forked. If you change your repository from public to private, GitHub detaches the forks and leaves them public, so a repo owner can't just delete someone's public fork.


I have a number of git repos that the original developers deleted - because I sync’d them to a usb stick with gitea. I think that is how you have to do it - never entrust a service, especially a free one, with your only copy of anything you value.

If the YouTube algorithm nukes your account and all your videos, you should be ready to upload them to a new account. Same with anything else digital.

My current standard is one copy in AWS S3, which is super reliable but too pricey for daily use, and one copy in Cloudflare R2 or Backblaze B2, which might or might not be reliable (time will tell) but is hella cheap for daily use.


> because I sync’d them to a usb stick with gitea

Just a tip: no need to use gitea if you want to replicate a git repository to somewhere else on disk/other disk.

Just do something like this:

    mkdir /media/run/usb-drive/my-backup-repo
    (cd /media/run/usb-drive/my-backup-repo && git init)
    git remote add backup /media/run/usb-drive/my-backup-repo
    git push backup master
And now you have a new repository at /media/run/usb-drive/my-backup-repo with a master branch :) It's just a normal git repository, that you also can push to over just the filesystem


Even better with

    cd /media/run/usb-drive/my-backup-repo && git init --bare
Bare repositories don't have a working directory. You can still git clone / git pull from them to get the contents. You can also git push to them without clobbering any "local changes" (there aren't any).

More detail here:

https://www.atlassian.com/git/tutorials/setting-up-a-reposit...


Yeah, better in terms of saving space, but I think it confuses some people, hence I didn't use it in my above example. Previous time I recommended a co-worker to use the `push to a directory` way of copying a git repository, I made them create a bare repository, and they ended up going into the directory to verify it worked and not seeing what they expected. Cue me having to explain the difference between a normal repository and a bare one. It also confused them into thinking that a bare repository isn't just another git repository but a "special" one you can sync to, while the normal one you couldn't.

So in the end, simple is simple :) Unless you're creating remote repositories at scale, you probably won't notice a difference in storage usage.
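
For anyone who hits the same confusion: a bare backup can be checked in place without a working directory. A small sketch, with a temp directory standing in for the USB path and a made-up one-file project:

```shell
set -eu
work=$(mktemp -d)
backup="$work/usb-drive/my-backup-repo.git"   # stand-in for the USB path

# A small project to back up.
git init -q "$work/project"
echo "hello" > "$work/project/README"
git -C "$work/project" add README
git -C "$work/project" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "first commit"
branch=$(git -C "$work/project" symbolic-ref --short HEAD)

# Create the bare backup and push the current branch to it.
mkdir -p "$backup"
git init -q --bare "$backup"
git -C "$work/project" remote add backup "$backup"
git -C "$work/project" push -q backup HEAD

# The directory holds only Git internals (HEAD, objects/, refs/, ...), no README.
ls "$backup"

# The history and files are still there; ask Git instead of the file browser.
git --git-dir="$backup" log --oneline "$branch"
git --git-dir="$backup" ls-tree --name-only "$branch"

# Cloning it gives back a normal working directory.
git clone -q -b "$branch" "$backup" "$work/restored"
ls "$work/restored"
```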


I hear all that, but --bare is necessary in this case because git (by default) won't let you push to a non-bare filesystem branch:

    ~/temp/a:master  $ git push backup
    Enumerating objects: 3, done.
    Counting objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 212 bytes | 212.00 KiB/s, done.
    Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
    remote: error: refusing to update checked out branch: refs/heads/master
    remote: error: By default, updating the current branch in a non-bare repository
    remote: is denied, because it will make the index and work tree inconsistent
    remote: with what you pushed, and will require 'git reset --hard' to match
    remote: the work tree to HEAD.
    ...
    To ../b
    ! [remote rejected] master -> master (branch is currently checked out)
    error: failed to push some refs to '../b'
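
If you really do want a non-bare target, the "by default" part can be changed on the receiving side with the `receive.denyCurrentBranch` setting. A hedged sketch using its `updateInstead` value, which needs a reasonably recent Git and only works while the target's work tree is clean; the `a` and `b` directories mirror the example above and the file names are made up:

```shell
set -eu
work=$(mktemp -d)

# Helper for the demo: commit whatever is staged in repo a.
commit() {
    git -C "$work/a" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "$1"
}

git init -q "$work/a"
echo one > "$work/a/first.txt"
git -C "$work/a" add first.txt
commit "first"

# b is an ordinary (non-bare) clone with the same branch checked out.
git clone -q "$work/a" "$work/b"

# Opt in on the receiving side: accept pushes to the checked-out branch and
# update the work tree to match, as long as it has no local changes.
git -C "$work/b" config receive.denyCurrentBranch updateInstead

echo two > "$work/a/second.txt"
git -C "$work/a" add second.txt
commit "second"

# This push would be rejected with the default setting.
git -C "$work/a" remote add backup "$work/b"
git -C "$work/a" push -q backup HEAD

# The pushed file now appears in b's working directory.
ls "$work/b"
```

For a pure backup target `--bare` is still the simpler choice; this is mostly useful when you want to browse the files at the destination.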


  git clone --mirror
  git clone --bare
  git push --mirror
  git push --all
"Is `git push --mirror` sufficient for backing up my repository?" https://stackoverflow.com/questions/3333102/is-git-push-mirr... :

> So it's usually best to use --mirror for one time copies, and just use normal push (maybe with --all) for normal uses.

git push: https://git-scm.com/docs/git-push

git clone: https://git-scm.com/docs/git-clone


You can even take it a step further and have a push to origin update your backup as well.

https://stackoverflow.com/a/14290145


> In recent versions of Git you can add multiple pushurls for a given remote

Woah, that's really cool, didn't know about that. This is really useful! Thanks for sharing that.


Gitea has a cron task that pulls-in changes on an ongoing basis.

If a snapshot suffices, a one-off "git push" or "git clone" works (but that's not too far off from downloading a tarball, is it?). If you want to have up-to-date local copies of multiple repos, a SQLite-backed Gitea instance is the simplest solution.

An added bonus to using Gitea is flexibility in mirroring LFS objects, which can be sent to S3 or minio


> a SQLite-backed Gitea instance is the simplest solution

Agree to disagree :)

This seems like the simplest solution:

    git remote set-url --add --push origin [email protected]:my-user/my-repo.git
    git remote set-url --add --push origin /backups/repos/my-user/my-repo.git
Now when you push to origin, it pushes to your local backup as well, everything up to date, no external software at all :)

Thanks dabber for sharing this trick (https://news.ycombinator.com/item?id=34603174)
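
A quick way to convince yourself it works, sketched with two local bare repos standing in for the hosted remote and the backup path (`primary.git` and `backup.git` are made-up names):

```shell
set -eu
work=$(mktemp -d)
git init -q --bare "$work/primary.git"   # stand-in for the hosted repo
git init -q --bare "$work/backup.git"    # stand-in for the local backup path
git init -q "$work/project"
git -C "$work/project" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first commit"
branch=$(git -C "$work/project" symbolic-ref --short HEAD)

git -C "$work/project" remote add origin "$work/primary.git"

# Once any push URL is set, Git stops using the fetch URL for pushes,
# so the primary is listed explicitly before the backup is added.
git -C "$work/project" remote set-url --add --push origin "$work/primary.git"
git -C "$work/project" remote set-url --add --push origin "$work/backup.git"

# List every push URL configured for origin.
git -C "$work/project" remote get-url --push --all origin

# A single push now updates both destinations.
git -C "$work/project" push -q origin HEAD

primary_tip=$(git -C "$work/primary.git" rev-parse "refs/heads/$branch")
backup_tip=$(git -C "$work/backup.git" rev-parse "refs/heads/$branch")
echo "primary: $primary_tip"
echo "backup:  $backup_tip"
```

Fetches still come only from the regular URL, so this covers your own pushes and nothing else.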


This does not include the cron jobs to pull in changes others make daily/weekly/every N hours. I mentioned that Gitea is superior under very specific conditions (where one wants to have latest version available locally).


I did not realise you could set the remote's URL to a local filesystem path.

That opens some interesting possibilities.

Cheers


Have you tried running gitea? It's very light on resources, has good documentation, and also defaults to a main branch. It's also very easy to control where all the data is stored, and works well w/ sqlite.


Is the "main" branch an advantage? I guess only if >50 percent of your repos use "main" branch


[flagged]


That is an awful lot of contortions you are doing here, to seemingly justify a word change that has had well-cemented meaning within the tech community since its inception.

We all know why this change exists, and why some people will attempt to persuade others of its superiority. It is, however, just silly virtue signaling, and it's exhausting to hear and read.

It would require some very irrational and underdeveloped reasoning to assert this word has anything to do with oppression in 2023. There is no negative connotation, except in those who wish to perpetuate some weird sensation of altruism... ie. no one is safer or feels better simply because you choose to call it "main" rather than "master".


> no one is safer or feels better

Clearly some feel better.

I personally like the name “main” better.

But it is truly a pain in the neck that different pieces of software and even different distributions of the same software now disagree about the default.

I’ve got a handful of active projects that go together that differ on master/main because they were created by different softwares.

I’d prefer “hitler” if everyone could just agree to always pick that. GitHub are the big pushers of this culture change. If they succeed, I salute them.


There was zero disagreement about what to call the default branch in git prior to this linguistic crusade.

GitHub practically invented the exact situation you're experiencing now by changing to main for all new repos.

Git itself still defaults to master. Everything else is not the default and is the root cause for the uncertainty.


> It is, however, just silly virtue signaling, and it's exhausting to hear and read.

You should be aware that complaints about supposed virtue signaling are equally exhausting.


I'm willing to go out of my way to use a different word if it makes people feel better (the root of the master -> main transition). But this is the rare case where it benefits me by having to type two fewer characters whenever I refer to the branch! main is truly a win/win.


> I'm willing to go out of my way to use a different word if it makes people feel better

Which is just the thing, really. It makes no one feel better. It makes the privileged speaker feel better, with a false sense of virtue. It's a "look at how great I am" signal, nothing more.

No one is harmed or made to feel bad by using the word master. Sometimes the adults have to be present in the room, it seems.

Saving 2 characters is an equally silly excuse, but at least it has a realistic rationale. To that end, why stop at main - why not just 'm'? You can call it whatever you want in git.


It's because main is a goldilocks word for something like a default branch name. It isn't too long and isn't too short. It also isn't shorter in terms of syllables.

Good design :)


> main is a goldilocks word

I totally agree.

> Good design

Having one default instead of two is better design.


And it is _still_ causing completely avoidable pain, years after people started advocating for the change. A recent example I tripped over: https://github.com/brendanhay/amazonka/issues/815#issuecomme...


Yes, I've tried it (and actually run a personal instance myself), but I would never try to run an application meant as a webapp when I want to copy something from one filesystem to another, when git can do it already without any external programs.

Also, the `master` is just an example, it works for `main` as well, don't worry :) The created git repository on your usb-stick works like a regular git repository, you can use whatever branch names you want.

Btw, way to focus on the absolutely least interesting part of my comment, what I chose to name the branch...


The biggest reason to do this is that it supports "mirror" repositories where it will keep your copy up to date, even using github keys to get at a private repo if you want.


Yes, and if you stop the process and start it again, gitea doesn't complain and picks up right up where it left off. Ditto if you lose internet connectivity. It's a well-designed piece of software. I considered using it as a BaaS and am actually thinking again of using it as one.


You can change the default default branch on GitHub via <https://github.com/settings/repositories>.

I believe new GitHub accounts now have that set to main.


The same also works over SSH using user@host:path for the remote - you don't need a daemon running on your server to push to it.


For this I have a NAS with a pretty basic script that runs nightly to clone any new repos I have and update those already backed up. They get organised into a directory structure mirroring that of Github: `./github.com/user/repo`

If of any use to anyone else: https://gist.github.com/wjdp/a20cb15f76b651124b3b27cde06d121...


> If the YouTube algorithm nukes your account and all your videos, you should be ready to upload them to a new account. Same with anything else digital.

Do you know if this is a common occurrence?

Also, I'm only a YouTube viewer and am not familiar with all the creator tools, problems, communities, etc. But would a creator really re-upload all their back-catalog if deleted? Just to try to get back to views and things?


I remember someone posting an agonized screed, some time ago, about YT deleting their channel, and all the videos.

Apparently, they had not kept the source/rendered originals of the videos, so it actually clobbered their business.

I am a scarred, limping old coot, and have learned [the hard way] that backups are goooood.


> I am a scarred, limping old coot

That is a delightfully evocative phrase


A few weeks ago, youtube changed their swearword policy. A creator I follow basically had to delete half their channel or risk termination.


They're not banning channels based on swearwords (yet, anyway). They are demonetizing videos with swearing - in the first bit, if too much, maybe other rules, but nobody is getting banned from saying 'shit'.


If you don't want monetization, is swearing in your videos to avoid your subscribers having to watch ads a viable strategy?

(To get around this: https://www.forbes.com/sites/johnkoetsier/2020/11/18/youtube...)


Huh, that probably explains... I was in a youtube rabbit hole right around then when some videos suddenly wouldn't load, turned out that I might've been the final viewer of the (small) channel that had been banned at that moment. I was wondering what the chances were.

edit: Seems like it. The channel[1] name probably raised some new flag, and Google did its thing. Seems fair, it's not like a reasonable moderator would know of a concept of a second chance or anything.

[1] https://web.archive.org/web/20181123103308/https://www.youtu... https://web.archive.org/web/20220624154617/https://www.youtu...

// Ah, that channel was a pretty interesting part of the rabbithole of net culture-related parody too - rare to see collaboration like that


Battlestar Galactica did it right - https://www.youtube.com/watch?v=rrYdQnz8vJg


There were a whole bunch of artist- and genre-specific mixes I used to listen to on yt that are gone now. The uploaders' accounts have all been nuked too. The sad thing is I can listen to it all on Spotify but it's not the same... the creators did not-insignificant work to mix the songs together.


Maybe not on YouTube, but the gunTubers are having issues with YouTube changing their interpretation of the rules and instantly issuing 3 strikes against them for rule violations. And so, it'd be good to have a back catalog to upload to a different service to keep that older material available.


Wonder if any data hoarders have made scripts to clone every repo you have starred on GitHub?


I'm reasonably certain that if YouTube deletes your account uploading them to a new account is expressly forbidden.


I’ve read having a secondary test account to post videos under to pass YouTube scans before posting to the real account helps minimize issues.


How exactly can they stop you?


What do you mean? They have extensive content ID/checks.


Post it somewhere else. Youtube is largely a waste of time, unless you're some right-wing idiot scamming money out of teenagers.


I completely understand your frustration. It's a dick move to delete your content without (at least) giving you a chance to archive that work.

At the same time this situation points up an important issue: if you don't own/control the infrastructure where your data lives, you don't own that data. Full stop.

If you host your data "in the cloud" (i.e., on someone else's servers) then you don't own that data (or at least not any copies stored there).

I'm not advocating for any specific action/solution in this comment (see my comment history for more about centralization vs. decentralization and "the cloud"), but the above is an important consideration, especially WRT long-term storage of your data.


It's simple! Maintain the control and the capability to retrieve your important data at all times. The internet is a wild place and everything you don't save could potentially be gone forever.


>It's simple! Maintain the control and the capability to retrieve your important data at all times.

An excellent point, but I'd go further and say that one should maintain multiple copies of important data, with at least one of those on hardware/infrastructure you control and have physical access to.


> At the same time this situation points up an important issue: if you don't own/control the infrastructure where your data lives, you don't own that data. Full stop.

We've known about this for many years!! And yet we for many varied reasons choose to make use of services anyway!! That doesn't mean we shouldn't get to complain about those services and giving people shit for that is really weird!!


This is an endless rabbit hole though, unless and until, I guess, this gets regulated into a law that makes cloud service providers accountable for the data stored under the user accounts they provide, or something along those lines. Until then, you can (rightfully!) complain about one thing, but then the next feature of the next service you use may again have similar issues.


>We've known about this for many years!! And yet we for many varied reasons choose to make use of services anyway!! That doesn't mean we shouldn't get to complain about those services and giving people shit for that is really weird!!

I don't disagree at all.

I just find it a little surprising that on a site where "not your keys, not your coins" is accepted wisdom, that "not your storage, not your data" isn't as well accepted.

That said, I am biased and have an agenda:

1. The centralization of network resources is a recipe for disaster;

2. There are many factors which have pushed us toward more centralization, and most of those factors (asymmetric bandwidth on consumer internet links, abusive terms of service, e.g., port blocking/traffic throttling, crappy consumer networking gear, etc., etc., etc) rarely get addressed;

3. The issues in (2) create perverse incentives for commercial entities to further abuse their "customers" (for "free" services that should read "product");

4. Those perverse incentives have morphed outside of paid and "free" SaaS and subscription tech services, encouraging manufacturers of all manner of products (cars, appliances, computers, communication devices and a raft of other products) to employ these abusive, rent-seeking tactics as well;

5. Resolving the issues detailed in (2) (as well as those not detailed) could enable both libre and commercial self-hosting products to become a viable, profitable industry, both for products and support services. Thus enabling us (broadly, humans who use the global internet) to actually own and control our data, PII and privacy;

6. Solutions are plentiful, but the perverse incentives cut across the entire OSI stack and beyond, making the reversal of such incentives complex and difficult, especially because the hoi polloi either don't know or have been convinced that they shouldn't care about ownership (of physical products like phones, cars and appliances) of their data and PII. I don't have a comprehensive set of solutions, but creating competition (municipal last-mile broadband, interoperability requirements, etc.) and providing consumers with the tools they need to decide for themselves (symmetric bandwidth on internet links, "dumb" internet pipes, non-abusive TOS, etc.) how they should host/manage/control their data and possessions will be important steps forward in reversing such incentives.

I rant about this every so often (this being my latest offering), and while it's not specific to Github or how their TOS treats various data storage offerings (repos), it's absolutely an example of how these perverse incentives harm and abuse consumers. In my view, that's wrong.

Edit: Clarified my prose.


OP specifically asked GitHub to keep the code private!

It's silly to be upset about that, after refusing to make any backups or distribute any copies of this "publicly licensed" software.


I'm not sure why people are defending Github on this issue, what if the original repo was a template or something, and your thousands of lines of code is gone because the original template repo removed you as a contributor. If I copy something, I expect where I copied it from to have exactly zero bearing on what happens to my copy. If they have a problem they can serve legal documents, giving everyone time to figure something out with zero data loss.


Guess I’ll be cloning repos locally and then pushing instead of forking.


Apparently this removes your ability to open Pull Requests.


This only applies to private repositories. Do not put templates up as private repositories, and do not use forks when consuming templates. That is not what forks are for.


It's irrelevant what they are for. Reality means that things like this get misused but deleting data without warning is still not ok.


I agree completely.

But at the same time - who doesn't have local copies of anything they care about? What are they thinking!?


You’d be surprised at how many use GitHub as their remote code backup platform. Having private file system backups is a question of culture, and a lot don’t have it.


If no one has a copy of the software, does it really exist?


Is futurice/how-to-get-healthy a public repo? It's not visible on https://github.com/orgs/futurice/repositories

If you fork a private repo and they remove you from the collaborators list, it's reasonable that your fork would be removed.


Well, it's not reasonable to me, but we'll just have to respectfully disagree on this.


It probably seems more reasonable to the people who own the repo.


I don't think the people who removed my access had any intention of deleting private repos. It's just somebody cleaning up ex-employee access from GitHub.


As a non employee, how would I access this repo?


The most “expected behavior” here would be to remove the fork relationship between the two repos and leave the copied repo as a plain private repo. I don’t think it is reasonable to just delete a private repo by default.


I have an interesting but unrelated story about files going away in Github.

Remember when the Windows Research source code that was used for teaching in universities was leaked in the late aughts, way before Microsoft purchased Github?

I had this code in one private repository called "ms" for more than a decade. It didn't have Git history or anything, it was just some random files, plus the leaked sources.

I totally forgot about it until last year, when I checked and found the code entirely gone.

I'm now more careful about what I put in private repos. In fact I don't have anything private there anymore...


We ought to treat GitHub's "fork" more like "branch on my account".

If one wants to create an actual fork, then clone the repository locally, change the upstream and push to a repo on your account.


Never use the fork feature on private repos. Instead, clone the repo locally, create a fresh GitHub repo, and push your local clone manually to that. Doing so will protect you from this attack.
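
For anyone who wants to see the steps spelled out, here is a minimal sketch of "clone, create a fresh repo, push". It assumes local bare repositories as stand-ins so it runs offline: `original.git` plays the private upstream and `my-copy.git` plays the empty repo you created on your own account (both names are made up for illustration). In practice you would substitute the real remote URLs.

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo commit works on a machine without git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)

# Stand-in for the private original: a bare repo holding one commit.
git init --quiet "$work/seed"
cd "$work/seed"
echo "hello" > README
git add README
git commit --quiet -m "Initial commit"
git clone --quiet --bare "$work/seed" "$work/original.git"

# Stand-in for the fresh, empty repo created on your own account.
git init --quiet --bare "$work/my-copy.git"

# 1. Plain clone of the original (no "Fork" button involved).
git clone --quiet "$work/original.git" "$work/clone"
cd "$work/clone"

# 2. Point origin at your own repo instead of the original.
git remote set-url origin "$work/my-copy.git"

# 3. Push every local branch and tag to the new repo.
git push --quiet origin --all
git push --quiet origin --tags
```

Note that a plain clone only checks out the default branch locally, so `--all` pushes just that branch unless you check out the others first.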


It'd be nice to be able to manually specify an upstream for a repo, that's the main benefit of forking in the UI.


Does that limit submitting PRs to the original repo somehow?


Apparently yes. You can't submit PRs to a repo outside of the "fork network". (IIUC these all share a single Git repo under the hood)


you need to make a fork for that, however you could add that one as a second upstream repo
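
A rough sketch of the "second upstream repo" idea, assuming local repositories as stand-ins so it runs offline (`original` for the repo you copied from, `my-copy.git` for your independent copy; both names are hypothetical). It only covers pulling the original's changes into your copy. It does not restore the ability to open PRs.

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo commits work on a machine without git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)

# Stand-in for the original repository, with one commit.
git init --quiet "$work/original"
cd "$work/original"
echo "version one" > README
git add README
git commit --quiet -m "Initial commit"

# Stand-in for your independent copy on your own account, plus a working clone.
git clone --quiet --bare "$work/original" "$work/my-copy.git"
git clone --quiet "$work/my-copy.git" "$work/mine"

# The original moves on after you copied it.
echo "version two, with more text" > README
git commit --quiet -am "Second commit"

# Track the original as a second remote, conventionally named "upstream".
cd "$work/mine"
git remote add upstream "$work/original"
git fetch --quiet upstream

# Fast-forward your branch to the original's and publish it to your own copy.
branch=$(git symbolic-ref --short HEAD)
git merge --quiet --ff-only "upstream/$branch"
git push --quiet origin "$branch"
```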


> Your private repository baobabKoodaa/laaketutka-scripts (forked from futurice/how-to-get-healthy) has been deleted because you are no longer a collaborator on futurice/how-to-get-healthy.

> and now it's gone... why?

Because it was a private not a public repo.


so what?

Private doesn't imply (common sense) that the original repository has power over any fork.


It does, otherwise github just wouldn't allow forking private repositories. If they did allow that, and retained no control over the forked copy, now you can ride a coach and horses through the access control to a private repo by simply forking it when you have access. My guess is that forking a private repository is a feature github intended to be used where employees or contractors of an enterprise want to fork their employer's repository as part of their development activities for that employer. Github sees those forks as transitively controlled under the organization's access policies.


> If they did allow that, and retained no control over the forked copy, now you can ride a coach and horses through the access control to a private repo by simply forking it when you have access.

...which you can still trivially do if you use git to make the copy. And then your github repo will be immune to this kind of deletion.

So common sense says to me this should act similarly.


> ...which you can still trivially do if you use git to make the copy.

If you want to steal code from your former employer it's your business and your legal jeopardy. GitHub can't do anything about that. They can remove access to the copies they're storing for you, though.

GitHub has a weird model where they encourage using the same account for personal and professional work, which causes this kind of ambiguity. From their perspective, there isn't a real difference between forking a private repo and making a private copy of a shared Google doc in your work account.


It's not about wanting to steal anything. It's that making a copy is trivial, so there's no point in worrying about how "you can ride a coach and horses through the access control".

Don't worry about the barn door when there is no side on the barn.


My point is that GitHub does not personally want to have a hand on the reins, or whatever metaphor we're doing here. A private fork is a copy of the original code that GitHub is holding onto. A clone on your personal computer or server or printed out on paper tape is a copy that you are holding on to.


" If they did allow that, and retained no control over the forked copy, now you can ride a coach and horses through the access control to a private repo by simply forking it when you have access. "

Actually that's not really true, since your access to the original repository could still be revoked, and you are left with what you got.

Further, see sibling comment.

"My guess is that forking a private repository is a feature github intended to be used where employees or contractors of an enterprise want to fork their employer's repository as part of their development activities for that employer."

what you describe is "internal visibility"

https://docs.github.com/en/repositories/creating-and-managin...


That is the exact meaning, which does seem to be common sense. You can fork from a public, private or internal repo. Public is public and the fork won't get deleted when the repo removes you. Private deletes your fork, since it isn't your repo. Internal requires gated access.


for me the exact meaning for private is "available for a selected audience"


Exactly. That implies that when a person is removed from the selected audience, they lose access to the private code.


to my private repository, but not their own private repository.

like it happens when a public repo goes private: I don't lose access to my fork of the repo but access to the original repo


You are trying to construct a scenario where you have the ability to elevate your own rights to somebody else's repo's contents. Name a computer system that intentionally allows people to do that. That's similar to demanding that you can still send emails on a terminated email account from your prior employer.

If you want to continue your access to that private repo's source code, you now need to speak to them. They own it, not you.


I am not trying to construct anything. I just described what happens and what I expect.

"where you have the ability to elevate your own rights to somebody else's repo's contents."

This is not an accurate description. There are two repositories: the original repository, and the fork. Nowhere do I want to elevate my rights regarding the fork to the original repository.

" Name a computer system that intentionally allows people to do that."

If somebody sends me a Word document by email, I can edit it without someone else being able to delete the modified Word document.


You keep trying to claim that a different person's private repo that you have forked has somehow given you ownership over the contained information. That's just not the case. Forking their code doesn't make it yours.


actually that is the point of a fork onto my account


GitHub doesn't allow you to make a public fork of a private repo, either. When you make a fork of a private repo, the resulting repo is constrained to have no broader access than the original.


"GitHub doesn't allow you to make a public fork of a private repo, either. "

where did I claim something like that?


I'm saying that GitHub consistently does not allow you to control access to a private fork. The original owners retain control.


how does this consistency manifest itself?

what about when the original owner publishes his repository? he doesn't control the visibility of the fork, does he?


It's worth noting that the person doing the Spring Cleaning would have been warned that they were about to delete your forks. They could have contacted you to mention this.

I don't think there's actually an option not to delete them, which would have made sense in this case.


Try to treat YouTube, GitHub, and any other service the way you would a failing hard drive. If it's not backed up, it doesn't exist.


Why a mail saying that something was deleted instead of a warning that something will be deleted??


If you worked at a company and had a fork of a private repo that company maintained of some software... do you want employees who have left to have their access to the repos removed? or an email that says that it will be removed?


> had a fork of a private repo

This is the real anti-feature. You should only be able to fork a private repo using an account that is directly managed by the organization that owns the repo. That way when you revoke access to the user they automatically lose access to the fork.

It's super weird that it's common to use the same account for work and non-work stuff on GitHub (myself included).


That gets into other aspects of how people are using GitHub.

Using that model, your work on private repos wouldn't show up as "your" GitHub activity history.

Your work on external projects sanctioned by your employer (and thus using your employer's managed account) wouldn't be associated with you when you leave. For example, if you were at VMWare and contributed to the Spring project - if you left VMWare the "I did core work on Spring - it's right there in a public repo" would not be associated with the account that you're saying is you.

Yes, it is weird to be mixing work accounts and personal accounts (and the mess I have had with email when my gmail account was associated with a former employer).

There are tradeoffs no matter which way you do this... and people appear to prefer "using a single account on GitHub for work and personal", with the follow-on implication that you may lose access to internal repos when you leave... which you would in either case, it's just a bit more surprising when it's your "personal" account.


Yes, because GitHub doesn't support organization users (maybe you have to pay?) so the thing is that you use your personal account. That is fine, since the real account is the membership to the organization. The only thing is the stuff of private repo: if there is this unintuitive behaviour it shouldn't be allowed to fork private repositories at all.

Or the fork should not be deleted, but it should be made in a way that it's equivalent to a pull and then push to another repo, that is if you lose access to the original private repo you can still see your code, but you can no longer pull from the upstream private repo. I think this is a problem on how data/permissions is represented at low level in a repo, so if this is the case and cannot be fixed they shouldn't allow private repo fork at all.


What prevents them from having the code locally?


Same nothing that prevents a person from copying it onto a USB stick before they leave.

However, that's not GitHub's problem. GitHub is "told" (by removing the person's access from the repo) that they shouldn't have access to the repository or its code. It is within GitHub's ability to remove access to their fork and remove it.


> What prevents them from having the code locally?

Likely lack of initial intent and effort.

Just like having firewalls and the most recent security updates doesn't mean that you are impenetrable. After all, the most common attack vector is the good old phishing email tactics and other ways of social engineering someone anyway. However, it doesn't mean that you shouldn't have a firewall (or other ways of defending your server) or that you should neglect timely security updates.

There are layers to this. Sure, someone who had an intent to do it from the start could've cloned the repo locally ahead of time. But for a lot of people it could be a crime of opportunity. Or it could also be that their account with access to the fork on github gets compromised later and a malicious third party got access to it.

My point is, having this restriction on forks won't prevent a determined attacker from getting it preserved locally ahead of time, you are correct. But it will prevent a lot of other unwanted scenarios that could result from not having the current restriction on forks of repos that went private.


I'd prefer it if the repo was returned to the company, rather than removed.


That gets complicated. It's in (for example) my namespace. So shagie/foo.git

And now you want to return it to BigOrg/foo.git - where does it go in that namespace of BigOrg? What are the permissions for who should have access to that? Who gets billed when a scheduled action runs and chews up a lot of CI credits as a parting gift?

The workflows for anything other than "delete it" get GitHub in a bit of a mess.


one of many scenarios in which it is probably(!) desirable to revoke access.

What about other scenarios?


GitHub also blocks their built-in fork UI functionality when the original repo's owner uses the site's "block" feature against you, which doesn't align with my expectations. Ironically, anyone affected by this poorly implemented blocking behavior (and thereby copying the full repo instead of forking) would be protected from this bug by the OP.


According to the documentation I remember reading, blocking someone would delete their forks of your repos (including public ones) back then. I believe this is no longer the case.


This is because forking is a social feature. If the origin repo is unwilling to accept pull requests from you, there's no reason to let you fork.


Anyone know why github search seems to be broken? While logged in, if I query https://github.com/yt-dlp/yt-dlp/search?q=exec_cmd no results are returned for any category. But "exec_cmd" exists in a few places in that repo, for example: https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/options....

On another note... It's an honest question to ask, does github state anywhere a policy on shadowbanning (treating a repo differently in any way)? It's becoming a law of tech nature that any org big enough starts to do it. Some variation of a warrant canary (preferably in the form of a contract with the users "we have not done X Y Z") would be cool.


Totally expected from minute one after Microsoft bought the vault, gaining full control over the software and people's access to the platform. I will not be surprised at all if some of these closed projects appear inside closed software under new authors a year from now.


Is this behavior new? It has been like that for as long as I can remember, certainly before Microsoft bought GitHub.


It is not. I also remember this behavior around forking private repos from before the acquisition. But then again, this doesn't lend itself to a conveniently pithy comment about Microsoft ruining everything.


Pretty bizarre behaviour given how easy it is to get around this. Instead of forking, just

git remote set-url origin <new-remote-repo>

The new remote can even still be on GitHub. So what’s the point? Anyone relying on this to retroactively remove access to source code has a false sense of security.


It's not your code or data, it's your previous employer's IP and they determine who has access and who doesn't and what licenses do or don't apply. Forks have worked like this for a while and I consider it a feature, especially in cases like this.


That's not how MIT license works


So you're suggesting GitHub should determine the total "average" license of a private repo and determine if your fork is indeed valid or not, before revoking access.


> So you're suggesting GitHub should determine the total "average" license of a private repo and determine if your fork is indeed valid or not, before revoking access.

No, I'm not suggesting that. In fact, I didn't say anything about what GitHub should or shouldn't do. My comment related to how licenses work, not how GitHub works or should work.

In particular, I was responding to this comment:

> It's not your code or data, it's your previous employers IP and they determine who has access and who doesn't and what licenses do or don't apply.

The claim in the above comment is incorrect. I stated that the claim is incorrect. That is unrelated to the issue of how GitHub should deal with private repo forks.


You mentioned in another post that the company only released a cleaned up version publicly. That cleaned up code which they published is clearly and unambiguously under the MIT license. Any other modifications that were made and not published (including any you made in your fork as an employee) are not automatically licensed as MIT. Your employer holds the copyright to that. They might be fine with those internal changes being released as MIT or they might not, but it is up to them.


If the private repo had the MIT license in it, then it was licensed with the MIT license, regardless of how widely or publicly the repo was distributed.


It isn't that clear-cut. A license is a legal grant of rights from the copyright holder to the licensee. A license file is just documentation. A repo can have different parts that are covered by different licenses and there are different ways to mark the licenses, whether in the text of each file itself, or other top-level files that document the status. It is also fine for internal working copies to not have all their licensing documentation perfectly applied the instant a file is created. In particular, in many companies, individual software developers don't have authority to license software on behalf of the company who owns the copyright, and so any markings they place in the repo are just tentative drafts pending legal review. So in the context of a private working copy, the existence of a license file doesn't have a ton of legal weight. What matters is when the copyright holder chooses to grant a license, via whomever the company gives authority to do so.

Once the organization publishes the software to others, whatever license documentation they include with it is binding, unless other issues trump that (like them not holding the copyright to begin with).


That's not true. I can go put the MIT license text in every folder on my computer and it won't actually apply the license to anything.


Ok, it's not true in the general case, but it's true in this specific case. The project was intended to be an open source project from the beginning; we were intending to open source everything.


Instead, you might just want to reach out and ask for access or a zip file from the owners


The MIT license applies to the code that was licensed under the MIT license.

The MIT license does not necessarily extend to other contributions to the project that haven't also been specifically MIT licensed too. Those contributions may be under any (or no) license.


No, but that's how private repos work. It's really hard to open source code from a repo no one has access to.


Self hosting something like Gitea and mirroring to that is a good solution to this.
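
One way to do the mirroring with plain git, as a sketch rather than a recommendation of any particular setup. Local repositories stand in for both ends so it runs offline: `hosted` plays the GitHub repo and `backup.git` plays an empty repo on the self-hosted server (both names are hypothetical, swap in real URLs).

```shell
#!/bin/sh
set -eu

# Throwaway identity so the demo commit works on a machine without git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)

# Stand-in for the repo hosted on GitHub: one commit and one tag.
git init --quiet "$work/hosted"
cd "$work/hosted"
echo "hello" > README
git add README
git commit --quiet -m "Initial commit"
git tag v1

# Stand-in for an empty repo on your self-hosted server.
git init --quiet --bare "$work/backup.git"

# A mirror clone copies every ref (branches and tags), not just the default branch.
git clone --quiet --mirror "$work/hosted" "$work/mirror.git"
cd "$work/mirror.git"

# Refresh from the source, then replicate all refs to the backup.
git fetch --quiet --prune origin
git push --quiet --mirror "$work/backup.git"
```

Running the last two commands on a schedule keeps the backup current. Be aware that `--mirror` also deletes refs on the backup that no longer exist at the source, so it is a replica, not a history of past states.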


This is how GitHub has always worked, and it's been documented the entire time.

Forks are not just copies of other repos which go on to get their own history (or lack thereof) from that point on. They are linked to the parent repo, and share its visibility. If the parent repo goes private, so does the fork, for example.

There's a parent-child relationship that GitHub creates when you fork a repo using the "Fork" button in the UI. That is how this relationship is created.

This has always been the case with the "Fork" button on github.com. This should not have surprised you.


"If the parent repo goes private, so does the fork, for example."

not true

"GitHub will detach public forks of the public repository and put them into a new network. Public forks are not made private."


GitHub probably does this to protect itself from potential copyright claims by the copyright owner. In most cases, copyright owners will want Github to stop distributing copies of their code to people whose access they revoked. They won't care whether the copies are in a fork or not. If Github didn't revoke access to the code, including in forked repos, then the copyright owners could send DMCA takedown notices and potentially sue Github. Processing those takes time and money, so they do the easier and cheaper thing.


I wonder if this works the other way around:

- create new repo in your personal GitHub acct.
- fork it to company repo.
- leave company years later
- revoke access to company codebase. :)


Seems like it's working as intended. The code is MIT-licensed, but github doesn't know that. It only knows that you were allowed access to a private repo, and that access was revoked. This is supposed to save you the trouble of going through all of your forks and ensuring that you aren't retaining any code from an organization you left. Personally I make new github accounts for every new org I join, but that's just me.


I bet github is used as much or more for proprietary corporate development as it is used for open source development. This is the right choice for paying customers.


Given the nature of super forks in GitHub, if you are no longer a contributor, but keep your fork, you would be able to see the commits that happen in the original repo and all its forks.

There used to be a way to bypass the deletion, which was to clone into an organization. That way, GitHub would not delete your fork. I am not sure if it is still the case.

I reported all this to GitHub a few years ago, but they said it was a non-issue.


> That was an MIT-licensed open source project

I wonder if perhaps they made it private before deleting it. It would make sense perhaps for GitHub to do this if the original was private and you lost access to it (though.. you should still somehow retain downstream work?) - and I can easily imagine that coming with a 'checking private status at time of delete not time of fork' bug.


You can send the code to the Software Heritage:

https://archive.softwareheritage.org/save/

Perhaps there's a userscript to send everything that you fork there.

Ideally, federated forges will become more widely used: then a fork would mean copying the code to a forge of your choosing and there is less risk of losing code.


Your fork wasn't public?


Correct.


Had this happen to us before. Tradingview's JavaScript SDK is in an invite-only private repository in GitHub. We forked it, made our specific changes, then lost it once the Tradingview team removed our access after weeks of them trying to sell us stuff and not closing any sales from us.

The private fork we had, with our numerous changes, all gone.


I'm getting a sense like something is not right at GitHub, maybe it's just a coincidence but just in the last week I saw GitHub breaking their CI (causing many runs to fail due to a billing bug), then the checksum tarball issue and now this.


This is how private forks have worked at GitHub forever.


This destroys the very basic idea of opensource. That the first developer on a project can hand off to someone else to take over the project, and have the project continue past their involvement.

Now if the first developer deletes their account or repo everything dies.


This does not affect public repositories, so open source code should be fine.


> Turns out I don't even have a local copy of it anymore

Looks like you'll never do that again


Do you have a local clone? If yes, just populate a new remote with the local clone. If not, always a good idea to keep an updated clone of every remote repo. That's how git is supposed to work. The latest pull is the backup.


Recently set up Forgejo and it's been working out great as a self-hosted GitHub alternative. Once we have cross-domain pull requests and comments (wip) these alternatives can hopefully really take off.


Anything not on your machine - or requiring phoning home - is not yours.

We saw this with the Heroku being discussed yesterday, we've seen it with Github before. Want to keep it? Store it locally.


Same thing happened to me, it sucks. All the development had been in my fork (not just my own work, also my collaborators') and now it's just gone.

Github should change this policy.


This is why I never fork a repo from github. I checkout my own branch (like git is supposed to be) and create a repo on github and most importantly, another server.


You deserve it. This is what you get for using proprietary software and not hosting your own gitea/gitlab. The repository was never yours.


You don't think "you deserve it" is a little harsh?


> We published the source code for everyone to use,

Published it where?

If it's published, how did you lose access to the software, even if the fork is gone?


This is the typical boneheaded, executive-mandated, policy. Once they get enough backlash they'll probably backtrack.


Sometimes this can also happen with public forks too. I learned this the hard way when youtube-dl was temporarily taken down.


Thanks for the heads up, this is pretty nuts.


I'm sorry to hear that whoever trained you on using Github didn't explain this to you.


Looks like a gitlab.com backup of everything is mandatory.


Every day is a new joy with GitHub, these days.


Does it go into GitHub Desktop and remove it from your computer as well?



