Tell HN: GitHub will delete your private repo if you lose access to the original

tux3 · on Jan 31, 2023

I think this is under the assumption of "employee has a private fork of the company repo, then leaves, employee should not keep the fork"

So when "removed as a collaborator" which apparently includes the original being deleted, you lose access to the main repo and all forks, even yours. As if leaving a company.

evouga · on Jan 31, 2023

This doesn’t make any sense to me.

My private forks are *mine* and I most certainly do not want GitHub guessing at whether and when to permanently delete them without my consent.

Companies of course have the right to manage access to their proprietary source code, for example by only giving access to corporate accounts under their control and reclaiming those accounts when an employee leaves.

vetrom · on Feb 1, 2023

Any (quite relevant actually) discussion of SCM policy and management aside, you've got to remember: There is no 'cloud', it's just somebody else's computer.

naikrovek · on Jan 31, 2023

> My private forks are mine

Not if you create them using the "Fork" button in the UI.

Since this behavior has yet again surprised many people, here is the documentation: https://docs.github.com/en/pull-requests/collaborating-with-...

jefftk · on Feb 1, 2023

Which also shows the right way to delete a private repo if you want people to be able to keep their forks:

If a private repository is made public, each of its private forks is turned into a standalone private repository and becomes the upstream of its own new repository network. Private forks are never automatically made public because they could contain sensitive commits that shouldn't be exposed publicly.

If a private repository is made public and then deleted, its private forks will continue to exist as standalone private repositories in separate networks.

But I agree this is confusing.

kadoban · on Feb 1, 2023

The right way is to make it public first? That's insanity. Making a repo public just to delete it would be a huge information leak even if it was short in duration.

Ennea · on Feb 1, 2023

Just force push something very empty to it before making it public. One more step, yay..

account42 · on Feb 1, 2023

So they did think about this use case (deleting a private repo without deleting forks) but did not bother implementing a proper choice for the repo delete flow?

hoherd · on Feb 1, 2023

This seems really bizarre to me. They seem to want people to have the network of connected GH repositories, but this behavior promotes "forking" a project in a way that breaks that network, which is to `git clone` and then create a new repo from that clone.

To put it another way, if the user had "forked" the GH repo onto GitLab, there would be no data loss, but that behavior would promote using GH in a way that breaks the upstream/downstream relationship that you see on GH.

It's even worse that the deleted fork was private. What impact does GH expect deleting the hosted private repository has on folks who really want to keep a private copy of the repo, such as offline or on another git hosting site? I'm really struggling to see any real-world positive sides to this mechanism. Seems like an ineffectual legal or compliance CYA.

Marsymars · on Feb 1, 2023

> Companies of course have the right to manage access to their proprietary source code, for example by only giving access to corporate accounts under their control and reclaiming those accounts when an employee leaves.

This is how it should be done, but is too much overhead for many "IT as a cost centre" companies.

ssalka · on Feb 1, 2023

Also, it would ruin my GitHub contributions graph

account42 · on Feb 1, 2023

A simple script and a cron job will fix that problem.

masukomi · on Feb 2, 2023

no need for a cron job

custom fake git history with this: https://github.com/artiebits/fake-git-history

therealdrag0 · on Feb 1, 2023

But why would GitHub care to build functionality to behave how companies who aren’t paying them want it to behave?

madduci · on Feb 1, 2023

So the main solution would be not forking, but cloning and then creating a separate project directly? Will it work?

kevincox · on Feb 1, 2023

Yes, I believe that if you push a repo to GitHub from a local copy instead of using their web-based "fork" feature it will not deduplicate the repositories. IDK if this affects your ability to submit pull requests though.
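For anyone wanting to try the clone-and-push route, here is a minimal sketch of the git commands involved, wrapped in Python so they can be printed before anything runs. The URLs, the `copy.git` directory, and the function names are placeholders I made up. It assumes you have already created an empty repository to push into.

```python
import shlex
import subprocess
from typing import List


def independent_copy_commands(source_url: str, new_repo_url: str,
                              workdir: str = "copy.git") -> List[List[str]]:
    """Build the git commands that copy a repository without the Fork button."""
    return [
        # Bare clone: brings over all branches and tags, with no working tree.
        ["git", "clone", "--bare", source_url, workdir],
        # Push every ref to an empty repository you created yourself.
        ["git", "-C", workdir, "push", "--mirror", new_repo_url],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for argv in commands:
        line = shlex.join(argv)
        print(line)
        printed.append(line)
        if not dry_run:
            subprocess.run(argv, check=True)
    return printed


if __name__ == "__main__":
    # Placeholder URLs, not real repositories.
    run(independent_copy_commands(
        "https://github.com/example-org/project.git",
        "https://github.com/example-user/project-copy.git",
    ))
```

By default this only prints the commands. Pass `dry_run=False` to actually run them. The resulting repository has the full history but no fork link to the original.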

snowwrestler · on Feb 1, 2023

Simply saving a copy of the fork locally would be sufficient to keep a copy of it.
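A rough sketch of what "saving a copy locally" could look like if you want every branch and tag rather than just a working checkout. The URL, directory name, and function name are placeholders of mine. The function only builds the commands, it does not run them.

```python
from typing import Dict, List


def local_backup_commands(fork_url: str,
                          backup_dir: str = "fork-backup.git") -> Dict[str, List[str]]:
    """Return git commands for creating and later refreshing a local mirror."""
    return {
        # One-time: mirror clone, which copies all refs of the fork.
        "create": ["git", "clone", "--mirror", fork_url, backup_dir],
        # Later: fetch whatever changed on the hosted fork since the last run.
        "refresh": ["git", "-C", backup_dir, "remote", "update"],
    }


if __name__ == "__main__":
    # Placeholder URL, not a real repository.
    for step, argv in local_backup_commands(
            "https://github.com/example-user/project.git").items():
        print(step + ":", " ".join(argv))
```

Once the mirror exists on your disk, deleting the hosted fork does not touch it. The refresh step will simply start failing.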

comprev · on Feb 1, 2023

My client recently moved from GitLab (self hosted - many teams had their own isolated server) to GitHub.com and managing access for the thousands of developers has been a small headache. We were encouraged to use our personal GitHub accounts instead of making new ones.

They are promoting "internal open source", yet due to a wild variety of permissions, colleagues can't fork to their own space or push a branch for a PR. Chasing the repo owners or at least someone with authority to grant permission is rarely worth the hassle.

mike_d · on Feb 1, 2023

> My private forks are mine

Your employment agreement disagrees. Blame the confusion on the blurry line GitHub draws between forking work repos into personal accounts.

heleninboodler · on Feb 1, 2023

> Your employment agreement disagrees.

Er, kinda presumptuous of you (and GitHub), no? None of my private repos, forks of other people's private repos, or other people's forks of my private repos are in any way governed by an employment agreement, and if they were, there's no way for GH to know what that agreement says.

pxc · on Feb 1, 2023

Did you forget the part where this code is MIT-licensed? Yes, they don't own it, but the code is still 'theirs' to keep forever as they see fit.

shagie · on Feb 1, 2023

The original code may be licensed MIT. The MIT license allows for the project to be relicensed, closed source and it is also possible for proprietary contributions that aren't MIT licensed to be added to it that are protected as any other closed source code. The MIT license is not "viral" and doesn't require that everything following from it is.

The person may be able to find the original code that was MIT licensed but that doesn't mean that the work done in house is also MIT licensed and that they have any right to it.

eyelidlessness · on Feb 1, 2023

IANAL but…

> The original code may be licensed MIT. The MIT license allows for the project to be relicensed, closed source

… this is less compelling to me than this:

> and it is also possible for proprietary contributions that aren't MIT licensed to be added to it that are protected as any other closed source code. The MIT license is not "viral" and doesn't require that everything following from it is.

AFAIK, changing the licensing terms of an MIT project isn’t retroactive to prior licenses. A quick search seems to confirm that.

The possibility of more restrictive or revocable licensing of subcomponents is more compelling as a rebuttal to “mine” at a philosophical level, but it’s not compelling from the perspective of GitHub revoking access. They’re welcome to comply with relevant legal actions, but they’re not actually the police of your licensee status and don’t even attempt to be.

Ultimately it’s the person who maintains the private repo who is responsible for and to any license challenges. GitHub isn’t a party or privy to those agreements, and again doesn’t have any pretense of such except compelled by legal action. And I give them the benefit of the doubt that this isn’t their motive.

This behavior is part of their own permissions model, and their own model of the relationship between “forks” and “private”, as defined by their own use cases. It’s a surprising one, but it needn’t have anything at all to do with their view of any given user’s repo’s license compliance.

shagie · on Feb 1, 2023

The first part sets up the second part that unlike the GPL, the MIT license doesn't require that future contributions to the project be any particular license.

Presumably the OP can find the open source project MIT licensed without the company's contributions to it.

The only thing that the MIT license requires is:

> The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

It doesn't even require attribution in the compiled application or any other notices.

And so, it is possible (and I would dare say likely) that the contributions that the OP made while working on the repo at the company unless specific permission was given otherwise would be considered as work for hire or as part of the work product as condition for employment and completely owned by the company (and not MIT licensed).

If the OP thinks that they should have access to the code because of the MIT license, that is something to take up with a lawyer. My IANAL senses suggest that that would be rather fruitless.

I don't find GitHub's model particularly surprising but rather the most reasonable one that opens GitHub up to the least liability for accidental disclosures of content. Err on the side of least privilege and if there's something to work out beyond that, that's something for the contributors to work out themselves - GitHub isn't the arbitrator for that.

eyelidlessness · on Feb 1, 2023

> And so, it is possible (and I would dare say likely) that the contributions that the OP made while working on the repo at the company unless specific permission was given otherwise would be considered as work for hire or as part of the work product as condition for employment and completely owned by the company (and not MIT licensed).

This is a radical interpretation of the text if I’ve ever seen one. To the extent any of their contributions were merged upstream, they’re inherently MIT licensed by virtue of being in the same codebase which offers that license. To the extent they have unmerged changes, they may well be works for hire but it isn’t GitHub’s role to decide that between a second and third party.

Again nor do they want to. GitHub is extremely hands off about forks and the licensing implications thereof.

This isn’t a GH posture towards licensing disputes, it’s their posture towards their own authorization model. And that’s fine, but we shouldn’t conflate the two when they’re quite distinct.

shagie · on Feb 1, 2023

Who's upstream?

Open source project and publicly accessible upstream? Yep. They're likely MIT licensed in accordance with the CLA.

Internal company copy of an MIT project? Unlikely unless legal says they are.

If the organization that I worked for made an internal copy of Influxdb ( https://github.com/influxdata/influxdb ), my contributions to the private and internally hosted copy are not inherently MIT licensed too and furthermore don't need to be redistributed.

If those changes were submitted upstream to the influxdata/influxdb repo, and I signed the CLA with them - then yes they would be.

What's more, if I left the organization, then I don't necessarily have a right to the contributions that I made to the private and internally hosted copy's codebase.

eyelidlessness · on Feb 1, 2023

It doesn’t sound like we have any substantive disagreement other than potentially what GitHub’s role in it is or should be.

pxc · on Feb 1, 2023

OP wrote

> That was an MIT-licensed open source project I worked on years ago.

which to me implies that OP also received the code under the MIT license and not some other license.

shagie · on Feb 1, 2023

However, the work done while at the employer may not have been done under the MIT license unless permission to license it and distribute it under the MIT license was given by legal.

chrisfosterelli · on Jan 31, 2023

Yeah I understand how this is unfortunate but for 99% of cases this is what you want to happen when you remove someone's access to a private codebase.

It seems like this is an unusual use case that OP had an MIT-licensed open source project whose source only exists in a private repo.

cma · on Jan 31, 2023

Is this scenario possible?

- you fork a public repo

- its visibility is changed to private but you have access through the org

- they delete it

- you lose your fork, which was only ever of the public repo

kitkat_new · on Jan 31, 2023

no, in this case it works as expected;

"GitHub will detach public forks of the public repository and put them into a new network. Public forks are not made private."

https://docs.github.com/en/repositories/creating-and-managin...

But better be safe than sorry...

nly · on Jan 31, 2023

What's the point? You can always clone locally and push to a new repo without forking on GitHub. Repos will be unlinked and history will be intact.

chrisfosterelli · on Jan 31, 2023

It's for the same reason that you don't leave ex-employees with access to other intellectual property. Just because they could make their own copies during their tenure doesn't mean you shouldn't try to reduce your exposure.

voytec · on Feb 1, 2023

OP states that it was "an MIT-licensed open source project". It appears that Microsoft has a lack of licensing understanding not only when it comes to Copilot.

If they are punishing users just under an assumption that user is doing what they are doing (using someone's codebase with non-permissive licence, possibly unlawfully), it's just silly.

b1nj0y · on Feb 2, 2023

agree

irjustin · on Feb 1, 2023

It really sounds like the forking and access should be "license aware" which sounds like an absolute nightmare to manage but that would help in this specific scenario?

I can see it simply opening more cans of worms than it helps.

funcDropShadow · on Feb 1, 2023

It really sounds like users of Github shouldn't trust Github to keep their forked repositories for them.

jiggawatts · on Jan 31, 2023

Microsoft does this a lot, where they assume that rules that apply to their own organisations apply to all organisations in precisely the same way.

If a Microsoft IC leaves, they should lose access to all Git repos, including forks.

If Joe Random open-source contributor is removed from an open source repo's access list, their fork shouldn't be wiped.

But Microsoft has One Rule To Rule Them All, so they won't make exceptions for unimportant people like their customers.

I see this a lot. A good example is Azure Active Directory, which is basically "Microsoft 365 Authentication" that they rebranded and sold to developers for their own use, i.e.: Azure AD Enterprise Apps, App Registrations, and B2C.

There are many aspects of the AAD design that make zero sense until you pause for a second and realise that it is not designed for you. It's designed for Microsoft 365!

For example, auditing. My customers are typically government agencies or banks, and they have strict auditing requirements, especially related to data access. All user authentication MUST be logged, including client IP address, and everything else. Most access is by their own staff, or by other orgs that have signed various contracts or agreements, so there is no expectation of privacy.

This is basically impossible with many configurations of AAD. It just refuses to collect meaningful audit logs. Why? Because GDPR applies to Microsoft 365 and they don't care about the data hosted on services such as SharePoint Online. That's not Microsoft's data, that's their customers' data, so it's up to the customers to enable logging "on their end", in their individual AAD tenants.

There is no way to centrally collect logs as a service provider using AAD in a multi-tenant scenario.

When I asked Microsoft about this, they waffled on about GDPR and privacy regulations -- which apply to them, but not us.

Another example is Microsoft Teams, which hides the name of the organisation people are coming from. In large multi-org meetings this is infuriating, because you have no idea where anyone is from. Microsoft does this because they use outsourcers like MindTree for support, and they don't want their customers to see this in Teams meetings for Azure support tickets. No-one is allowed to see where people are from so that Microsoft can bullshit their customers.

donmcronald · on Feb 1, 2023

> There are many aspects of the AAD design that make zero sense until you pause for a second and realise that it is not designed for you. It's designed for Microsoft 365!

Business Basic accounts being limited to 7 days of login logs is a huge middle finger to the entire small business sector. Of course they think everyone should just buy Enterprise subscriptions. It's nothing more than a corporate version of "don't be poor".

jefftk · on Feb 1, 2023

Segmenting an enterprise version of a product is generally about finding features that are disproportionately valuable to enterprise (centralized control, policy enforcement, auditing, etc) and separating them into a different offering. This lets you charge less to small businesses without having your small business product cannibalize your enterprise business.

This seems basically fine to me? If there are a lot of small businesses who are unsatisfied with Business Basic and can't afford Enterprise then there's an opening for a competitor.

hakfoo · on Feb 1, 2023

The particular segmentation is a questionable choice.

Small enterprises are likely to have small IT/Security staff, and the most likely, therefore, to not notice something awry for a few days, at which point, vital log info has already rolled off the 7-day window.

donmcronald · on Feb 1, 2023

This is exactly the issue. No one monitors the logs and, by the time they figure out something is wrong, there isn't enough info available to properly assess the scope of the damage.

Another problem is the Business Basic product is too complex for what small businesses need (reliable email) and buying something even more complex to get a couple of extra features like proper logging is counterproductive.

As is, if a small business ends up with a compromised admin account I don't think it's unreasonable to consider migrating them to a different service. It's nearly impossible to guarantee a bad actor hasn't hidden a back door somewhere in all that complexity if your only tools for assessment are the ones offered in the Business Basic subscriptions.

jefftk · on Feb 1, 2023

Maybe the small businesses you've encountered are different from the ones I have? My expectation is that most have no security staff and wouldn't use this feature even if it had indefinite retention.

djur · on Feb 1, 2023

This behavior was not introduced to GitHub by Microsoft.

charcircuit · on Jan 31, 2023

Github knows the license of projects. It knows if you forked an open source project.

jzb · on Jan 31, 2023

It knows the license applied to the repository. For a private repo it may not be "released" under that license but planned for release.

If AcmeCorp is planning to release - but hasn't - a project under MIT or whatever, they may have the license declared in the repo but that's not a guarantee it's ever going to be released.

If it's a private repo, and your access has been under your status as an employee, then I don't know that counts as distributed to you under that license. If AcmeCorp later decides to change licenses or not release the software as open source, then it makes sense for GitHub not to let someone continue access.

There are a LOT of holes in the system, but I'm not sure GitHub is in the wrong for deleting access to a private repo if you lose access to an organization or whatever.

charcircuit · on Feb 1, 2023

Software can be open source and not released. Employees can legally release it themselves since it is open source.

funcDropShadow · on Feb 1, 2023

No, the software becomes "Open Source" or "Free Software" the moment someone licenses it to somebody else under such a license. Simply copying a file named LICENSE into some private directory has no legal relevance. As an employee, you usually don't get a license to the work artifacts you are working on.

jzb · on Feb 3, 2023

Really no. The employee doesn’t have that right, it’s not theirs to release unless the employer gives permission.

gregwebs · on Jan 31, 2023

This unfortunately makes sense because it is a private repo. Even if a repo is labeled as being MIT and has an MIT license in it, it still may contain other code of a different license.

Github could do better by warning the repo owner when they delete a private repo. Github could ask the repo owner if they want to convert it to public first (a "set it free" option) or otherwise give the option to avoid deleting the forks of others.

pavon · on Jan 31, 2023

Yes, and under most FLOSS licenses (including BSD/MIT/LGPL/GPL/AGPL), companies are perfectly free to maintain modified versions of the software internally with no obligation to publish the source externally, until/unless they distribute the modified software externally (or allow external use of the software remotely in the case of AGPL). All of the modifications are copyrighted by the company and it is their choice whether to release them publicly. Employees having access to the modifications doesn't mean that the modifications have been licensed to them under the original license, and thus doesn't give them permission to distribute the modified software to others under that license.

Essentially what I just said is the same as what you did. The private modifications cannot be assumed to be under the same license as the original software. GitHub has no way of knowing all these details, and has promised to keep private repos private, so their current policy is the correct one.

Mystery-Machine · on Jan 31, 2023

GitHub is not supposed to make such decisions for the user here. It is the user's responsibility to make sure they delete their private forks if they shouldn't have access to the repo/fork anymore.

What's next? Should we all install spyware on our computers and let GitHub automatically delete local copies of forks as well?

GitHub and the company/person, who deleted the original private repo, should inform the owner of the fork that the main repo was deleted. If need be, company/person can request fork owner to delete their private fork and local clone as well.

pavon · on Jan 31, 2023

I think this incident reinforces that private repos on github.com are a weird hybrid of the public github and on-premise github which creates various practical problems and misunderstandings when those two security models collide.

First off is the fact that forking a repo is often a necessary step in contributing to a project if you don't have push permission, so these forks will be created during the normal development processes, not necessarily because the employee was intentionally trying to save off their own copy. So it is perfectly normal for the employer to consider those forks to be something it should own and manage, just like it would on an on-premise installation.

On the other hand, github still encourages people to use a single account for both personal use and work[1]. Naturally the employees reasonably consider all the forks that are in their personal account to be something that they should own and manage. So you end up with situations like this.

The lesson - mixing work and personal accounts/computers/devices is a horrible idea regardless of what Github says. Employers shouldn't allow it, and employees should avoid it even if allowed. Then both will have a clear idea of who owns and controls what.

[1] https://docs.github.com/en/get-started/learning-about-github...

geraldwhen · on Feb 1, 2023

Enterprise Managed Users solves the issue.

jyrkesh · on Jan 31, 2023

MIT licensed code doesn't mandate distribution. Companies and organizations are perfectly within their rights to own a private fork of an MIT-licensed codebase in perpetuity.

With that in mind, if you fork an organizationally-managed repository, there's a good chance the owner doesn't want you to continue to have access to that codebase if you're no longer a part of the organization. And the local copy? Well there's a good chance you were only allowed to clone the repo on an IT-managed device with specific 2FA policies and some kind of agent/config to prevent/reduce data exfiltration from that device.

Is it a perfect system? Hell no. Data leaks, that's part of life. And I'm with you that it certainly could be more user-configurable.

But it's also extremely well-documented behavior[1], and seems like a key design choice that GitHub made a long time ago to protect the owners of private repos. Ultimately, if you don't care about who has access to your code, you signal that by making the repo public. Or by telling your private collaborators to make sure they hold on to a local copy.

[1]: https://docs.github.com/en/pull-requests/collaborating-with-...

bunbun69 · on Feb 1, 2023

Seems like you don't like GitHub. Have you considered not using it?

kitkat_new · on Jan 31, 2023

"This unfortunately makes sense because it is a private repo."

I disagree. I expect a (i.e. my) fork to be independent of the original repository, no matter if it is private or not.

It's enough if a fork of a private repository is private then too.

bcrosby95 · on Jan 31, 2023

This is why I don't use github's fork feature. There's more than just this restriction they impose upon you.

Instead I prefer to use a "git" fork. I just clone it and upload it to my own repo. Assuming the license permits of course.

dreamcompiler · on Jan 31, 2023

This is the right answer: Break the fork link. I sometimes do this to make a private "fork" of a public repo so that I can add my own notes about how to use it, remind myself what happened when I tried it, add a config script for my own peculiar setup, etc.

It's unfortunate because not having a "real" fork makes it harder to send pull requests and track the upstream. But it's sometimes necessary to get around stupid github policies.

ntrz · on Jan 31, 2023

Don't you need to have a "GitHub-approved" fork (i.e. use the GitHub fork button) if you want to create pull requests on the upstream project in GitHub? Or is there a way to do that from the kind of repo you're describing?

edaemon · on Jan 31, 2023

If that came up you could create a GitHub-native fork and add that as a remote.
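Roughly, that could look like the following from inside your existing clone. This is only a sketch: the remote name `github-fork`, the branch name, and the URL are placeholders I picked. It assumes you have already created the fork with the Fork button and that your local history shares commits with the upstream project.

```python
from typing import List


def pr_via_native_fork_commands(native_fork_url: str, branch: str,
                                remote_name: str = "github-fork") -> List[List[str]]:
    """Build git commands that publish a local branch to a GitHub-native fork."""
    return [
        # Register the Fork-button fork as an additional remote.
        ["git", "remote", "add", remote_name, native_fork_url],
        # Push the branch there; the pull request is then opened from that fork.
        ["git", "push", remote_name, branch],
    ]


if __name__ == "__main__":
    # Placeholder values, not a real repository or branch.
    for argv in pr_via_native_fork_commands(
            "https://github.com/example-user/project.git", "my-fix"):
        print(" ".join(argv))
```

Your independent copy stays where it is. The native fork only exists to carry the branch for the pull request.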

naikrovek · on Jan 31, 2023

Don't use the "Fork" button in the GitHub UI, then. It is intended for collaboration and establishes and maintains the parent-child relationship of "your fork" and if the parent repo is deleted, so are all forks. If the parent repo is private and goes public, so do all forks. If the parent repo is public and switches to private, so do all forks. This behavior is laid out in docs.github.com and is not secret.

This has been the case on github.com for over a decade, and I am slightly shocked that people don't know this. I guess the root of that is that I am surprised that this has not bitten more people than it has.

People assuming things are a certain way and never checking to verify that are by far the greatest source of "I shot myself in the foot" statements that will ever be known.

kitkat_new · on Jan 31, 2023

"If the parent repo is private and goes public, so do all forks."

nope, that's not true:

"GitHub will detach private forks and turn them into a standalone private repository. For more information, see "What happens to forks when a repository is deleted or changes visibility?""

" If the parent repo is public and switches to private, so do all forks."

This isn't true either:

"GitHub will detach public forks of the public repository and put them into a new network. Public forks are not made private."

In these cases, exactly what I would have expected happens.

naikrovek · on Feb 1, 2023

yep I was wrong about that point. generally my point still stands; if you want total control over your repo, don't use the GitHub "Fork" button to create your repo for that code.

I linked directly to the documentation about "Fork" in another comment.

mattigames · on Jan 31, 2023

That would make sense if they didn't use the word "fork" for it, that word has a specific meaning when talking about repositories[0] and it doesn't include automatically propagating deletions or settings of the original repo, it doesn't actually include ANY automatic propagation, therefore GitHub should use a different word for this kind of fork, something like "Create child fork" or "Linked fork" or maybe a new word altogether

[0] http://www.freekb.net/Article?id=1263

djur · on Feb 1, 2023

GitHub coined this particular use of "fork", and it's always been about having an automatically managed relationship between the original repository and the new one. A copy without that automated connection is a clone.

MereInterest · on Feb 1, 2023

No, it definitely predates GitHub. And git itself, by at least a full decade.

https://en.wikipedia.org/wiki/Fork_(software_development)#Et...

djur · on Feb 1, 2023

GitHub specifically invented the idea of "forking" as a social action on a forge site that allows you to create your own associated copy of a repository. This is related to but different from the broader meaning of "fork". "Fork" doesn't mean anything at the git level.

MereInterest · on Feb 1, 2023

I'm not sure exactly the distinction that you're trying to make. I see GitHub's use of "fork" as a specific application of the broader meaning of "fork", not an invention of a new and distinct concept. Just as putting "wheels" onto a steam engine can produce a new type of vehicle but doesn't change the concept of "wheels", GitHub's use of "fork" doesn't fundamentally change the broader concept of "fork".

mattigames · on Feb 1, 2023

If any changes done to the parent repository propagate automatically to "forked" repositories without the explicit consent of the _owner_ of the fork then it does change the broader concept of fork, and to follow your analogy it would be like calling a caterpillar track a wheel.

Whether this is acceptable because the original version is a private repository is unrelated; what we are discussing is the meaning of the word itself.

MereInterest · on Feb 2, 2023

I think we are in agreement. Because access to the "forked" repository was removed without the consent of the owner of the fork, it is inaccurate for GitHub to describe it as a "fork". For clarity, I would also describe the "owner" of the fork as the person who created the fork.

alerighi · on Feb 1, 2023

I find the fork feature useful since, for example, if you fork a project that is no longer maintained, users can search in the forks and find that you are now maintaining it. I've found myself doing it a lot of times.

Regarding forking a private repository with a public repository, it's a corner case for sure. In my opinion it's best to forbid forks of private repositories at all, and forbid to make a repository that has forks private, than to create problems like the one of the user in this topic.

csteinbe · on Jan 31, 2023

This is documented at: https://docs.github.com/en/pull-requests/collaborating-with-...

darekkay · on Jan 31, 2023

That's why I usually don't use the official "fork" feature, but clone and push the repository manually instead. I would like to keep the fork network connection on Github, but I don't want to see my fork deleted because of an error, malice or simply lack of knowledge.

rezonant · on Jan 31, 2023

It will only be deleted if the repo you fork from is a private repository. The documentation [1] covers the other scenarios, in all of which you keep your copy of the code (including when the public repository is made private later).

[1] https://docs.github.com/en/pull-requests/collaborating-with-...

andyjohnson0 · on Jan 31, 2023

> It will only be deleted if the repo you fork from is a private repository

This makes sense. Thank you for clarifying that important detail. It seems to be missing from the parts of the discussion I've read here.

TheRealPomax · on Jan 31, 2023

No, it doesn't. It only makes sense until you stop and go "Wait, no, hold on a minute. Why would they delete the fork instead of simply severing the fork relation in their fork relations table?".

Ferret7446 · on Feb 2, 2023

Consider:

You have access to many private company files. After you leave the company, the company is obligated to send you copies of all of the files because you may have linked to them. After all, you could have made personal copies of all of the files, so you should still retain access through links.

darekkay · on Jan 31, 2023

TIL, thanks! I probably confused this with the GitHub takedowns, when forks are removed as well (as it happened to the youtube-dl repo). I could imagine my manual clone not withstanding such takedown either, though.

rezonant · on Jan 31, 2023

Yeah for cases like that, keep a local copy, thankfully many people did

remram · on Jan 31, 2023

What if the original is made private and then deleted? Does your fork remain?

vlz · on Feb 1, 2023

Yes, the docs linked above say:

"If a public repository is made private and then deleted, its public forks will continue to exist in a separate network."

https://docs.github.com/en/pull-requests/collaborating-with-...

account42 · on Feb 1, 2023

IIRC GitHub will also delete the entire fork network for a DMCA request even if your fork is not mentioned explicitly.

devmunchies · on Jan 31, 2023

Same. Another reason is I don't like how Github inserts "Forked from ..." in the project name. If your "fork" becomes extremely divergent after a couple years (maybe you had a different vision for the project), you are still stuck with the "Fork from..." sub-header, which basically tells users that they should look at the original. I'm otherwise fine putting attribution in a README.

djur · on Feb 1, 2023

Right, the entire point of GitHub forks is to make it easier to upstream your local changes to the original repository. If you have no interest in doing that you shouldn't use a fork.

account42 · on Feb 1, 2023

Which is pretty much the opposite of what fork used to mean before GitHub...

_wf2l · on Feb 1, 2023

you know you can fork dead and abandoned projects right?

djur · on Feb 1, 2023

Sure, you can. It might not be a good idea in the long term.

fragmede · on Jan 31, 2023

“But the plans were on display…”

“On display? I eventually had to go down to the cellar to find them.”

“That’s the display department.”

“With a flashlight.”

“Ah, well, the lights had probably gone.”

“So had the stairs.”

“But look, you found the notice, didn’t you?”

“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.’”

― Douglas Adams, The Hitchhiker's Guide to the Galaxy

agolio · on Jan 31, 2023

To clarify the appropriateness of this analogy:

This is unexpected behaviour from Github here which may (and has, by the anecdote of OP) cause permanent data loss. Documentation is not good enough, as users should not have been expected to have read the entire documentation.

chrisfosterelli · on Jan 31, 2023

I guess this is a question of who should have been given further information. For example, whoever at the organization deleted the repo would have been given a very clear warning screen including the number of forks that would be deleted by their action prior to them doing it.

On that note, an organization admin can _directly_ delete your private fork without even deleting the source repository if they want. GitHub's permission model is fairly direct that private forks you make through your membership in an organization are more the organization's property than the forker's.

capableweb · on Jan 31, 2023

> I guess this is a question of who should have been given further information. For example, whoever at the organization deleted the repo would have been given a very clear warning screen including the number of forks that would be deleted by their action prior to them doing it.

This is exactly how it works today already. If I try to delete a private repository people have forked, I see the following:

> We will also delete all 4 forks since this is a private repository.

Clicking on the delete button, again:

> Unexpected bad things will happen if you don’t read this!

> This will also delete all 4 forks since this is a private repository.

> [type name of repository]

chrisfosterelli · on Jan 31, 2023

Yes, that's what I said. It sounds like the parent poster is suggesting that the fork account should instead have that notice.

donmcronald · on Feb 1, 2023

IMO it would be useful if non-obvious behavior like that were warned about when you fork the repo. I know I'd get burned by that. I keep a local mirror of everything though.

chrisfosterelli · on Feb 1, 2023

That's a fair point.

alerighi · on Feb 1, 2023

In my opinion, in this case the fork shouldn't be allowed to be created at all. If this is the final effect, it's better to inform the user that "no, we don't let you fork the repo". Then he could have done it the normal way (clone the repo and push it to another remote), which would not have had this issue.

To this day I thought that the "fork" concept was only a relationship at the level of UI, but as I see it has a logic in it, that is the fork depends on the original repository even for permissions, and that to me is surprising!

chrisfosterelli · on Feb 1, 2023

I think you're misunderstanding the way that a lot of orgs use forks. Many orgs will have the team fork the repo under their own account so they get their own working space, and then they make PRs from their forks back to the origin. Before branch protections it was also the best way to manage write permission. This is a really common pattern and not allowing it would break how a lot of people use github.

If the org doesn't work this way, it can disable forking so that it's not allowed at all on the repo (or org-wide), like you said.

Barrin92 · on Jan 31, 2023

I don't think it's unexpected at all given that the original repo was private, not just the fork. Secondly, a GitHub "user" isn't really a user in the consumer sense. They're a developer, and as a developer/professional you can be expected to consult the documentation of a tool you use so you understand its default behavior.

themitigating · on Jan 31, 2023

Is it unexpected though? The repo was forked from an org by the person who was a member of that org.

I know this isn't common but I actually use a unique user for my company "myname-company"

sangnoir · on Jan 31, 2023

If you have a Raspberry Pi wasting away in a drawer[0], I strongly recommend installing Gitea or Forgejo and mirroring all the repos you like (i.e. the ones you contribute(d) to and/or starred, not just on Github too!). You set it up once and it will sync in upstream changes as often as you like (default is daily)

0. Or a homelab, or a cheap 256MB VM, or a NAS that can run docker containers, or an old Chromebook: anything that can execute a Go binary, x64, arm6 or even mips

remram · on Feb 1, 2023

I wonder if there is an efficient way to do direct incremental git-to-S3 backups, or if you have to do this, run a Git mirror and do regular filesystem-level backups of it.

yencabulator · on Feb 2, 2023

https://git-scm.com/docs/git-bundle creates a single file that contains everything in A not in B, so e.g. the delta between the state of the repository yesterday and the state today (for all the refs in it). You just need to produce the "rev list" to save.
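A rough sketch of how that could work with the stock Git client, under some assumptions: the "rev list" is produced by saving the ref tips with `git for-each-ref` at the time of the previous bundle, and the next bundle excludes everything reachable from those tips via `--not`. All paths under `$work` are hypothetical; the resulting `.bundle` files are ordinary files that could be uploaded to any object store.

```shell
set -eu
work=$(mktemp -d)

git init -q "$work/repo"
cd "$work/repo"
git config user.name Demo
git config user.email demo@example.invalid
branch=$(git symbolic-ref --short HEAD)

echo one > file.txt
git add file.txt
git commit -q -m "first"

# Full bundle, plus a record of where every ref pointed at that time.
git bundle create "$work/full.bundle" --all
git for-each-ref --format='%(objectname)' > "$work/prev-refs"

echo two >> file.txt
git commit -q -am "second"

# Incremental bundle: all refs now, minus everything reachable from the old tips.
# The unquoted $(cat ...) is intentional: one argument per saved commit id.
git bundle create "$work/incr.bundle" --all --not $(cat "$work/prev-refs")
git bundle list-heads "$work/incr.bundle"

# Restore: clone from the full bundle, then fetch the increment on top.
git clone -q --bare "$work/full.bundle" "$work/restore.git"
git -C "$work/restore.git" fetch -q "$work/incr.bundle" \
    "refs/heads/$branch:refs/heads/$branch"
```

One caveat: if nothing changed since the saved tips, Git refuses to create an empty bundle, so a real job would need to handle that case before uploading.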

anshumankmr · on Feb 1, 2023

A cron job ought to do this. Try something like Cloud Scheduler that can automate this for you. (I am not sure what the equivalent is in AWS.)

remram · on Feb 1, 2023

My question is not how to run it, but what to run. If your scheduled task does a full clone every time to upload as ZIP to S3, it is massively inefficient. Even if you use something like Restic, because the Git pack file will have nothing in common with the previous one.

saltcured · on Feb 1, 2023

Clone each repo locally. Periodically do "git fetch -p" for each repo to update the local copy of upstream content. Run some periodic task like restic or rclone (depending on whether you want point in time snapshots or just a mirror of latest state) to mirror these local repos into your S3 bucket.

The local clones should evolve incrementally due to "git fetch", and then the restic or rclone task should figure out how to make incremental updates to the S3 content.
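A minimal sketch of the periodic fetch step described above. Using `git clone --mirror` for the persistent local copies is an assumption on my part (a plain clone works too), and the `$work/upstream` repository is a hypothetical stand-in for a real remote URL. The backup tool itself is left out; it would simply be pointed at the mirrors directory afterwards.

```shell
set -eu
work=$(mktemp -d)

# Stand-in for an upstream repository (hypothetical; normally a remote URL).
git init -q "$work/upstream"
cd "$work/upstream"
git config user.name Demo
git config user.email demo@example.invalid
git commit -q --allow-empty -m "first"

# One-time setup: a persistent mirror clone per upstream.
mkdir "$work/mirrors"
git clone -q --mirror "$work/upstream" "$work/mirrors/upstream.git"

# Upstream moves on.
git commit -q --allow-empty -m "second"

# Periodic job: refresh every mirror; -p prunes refs that were deleted upstream.
for repo in "$work"/mirrors/*.git; do
    git -C "$repo" fetch -q -p
done

# A backup tool (restic, rclone, ...) would then be run over "$work/mirrors".
```

Because the fetch and the backup run one after the other in the same job, the backup sees a repository that is not changing underneath it.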

remram · on Feb 2, 2023

But that's... exactly what I described and asked how to avoid...

saltcured · on Feb 2, 2023

I had trouble parsing your earliest comment, so I only tried to address the incremental backup concern. I may not have understood the conversation, but it seemed like you claimed that a filesystem level backup of a clone was not going to produce incremental backup IO in practice.

A periodic fetch into a persistent cloned repo will be incremental unless the upstream is doing something crazy with frequent branch deletions and repacks. In practice, most upstream repos I encounter behave relatively monotonically. They accumulate new commits and branch/tag heads but do not often create garbage or need repacking.

A periodic backup of the cloned repo will also be incremental if using an appropriate tool like restic or rclone-copy. Also, since the clone only changes during the fetch, you can serialize these in one periodic job and be confident that you are making a consistent snapshot of the repo.

The advantage of this approach is its simplicity. It is easy to reason about and easy to work with the backups to restore a repo without having to learn about other tools. It's the kind of thing I could feel comfortable setting up and running for years on end with little supervision.

A more sophisticated approach that integrates with git hooks, e.g. to do event-driven rather than periodic backup, is plausible but I think could quickly get in the way of itself. And if working with a hosted upstream, you would need to integrate with their proprietary hooks, e.g. GitHub actions, and deal with other restrictions of the hosting environment. Such a solution likely brings new failure modes and may not be a worthwhile tradeoff...

remram · on Feb 3, 2023

Again, this requires you to have a persistent clone on a filesystem. I specifically wonder if we can do (and I quote) "direct incremental git-to-S3 backups", and you keep replying "it's easy, do it indirectly with a persistent cloned repo".

I don't understand where you are stuck, tbh.

yencabulator has provided a good tip I think, as you could store the previous set of refs and use that to build an incremental Git bundle (one with only the objects that were not in the previous bundle). I don't know if you can do that with the existing Git client though.

kitkat_new · on Jan 31, 2023

I'll look for a tool that mirrors all my repositories (i.e. only forks). If I can't find one, I'll write it myself.

drudoo · on Feb 1, 2023

This is built in to Gitea. I mirror multiple GitHub repos to my private Gitea instance.

pabs3 · on Feb 1, 2023

https://github-backup.branchable.com/

jmondi · on Feb 1, 2023

I wrote a little tool to mirror my repositories to my gitea instance. It has been months to potentially a year or two since I ran this, but it does what you’re asking. You can mirror repositories of users and repos users have starred. It definitely needs some love.

https://github.com/jasonraimondi/deno-mirror-to-gitea

mckn1ght · on Feb 2, 2023

I wrote a similar one I called Forgery that’s in a repo of the same name in my github account (which can be found in my user profile… just created this account and didn’t want to post a link in my first post, thought it might get automodded).

Very similar to yours, but also does forks, which I’m not sure yours does by a quick glance at your readme. Although, mine doesn’t automatically mirror from another forge, just clones everything locally. I’ll have to add a TODO to add mirroring.

Please have a look at how I handle them and consider adding a link to my project in your “similar tools” section, and I’ll do the same for you!

jmondi · on Feb 3, 2023

If I remember correctly, I believe that mine will do forks also; they are just included in your repositories.

I just looked over your project, and it seems pretty cool. A little bit different than mine, since mine is specifically github to a gitea mirror repository. I don't mind adding a link to your project in mine.

Thanks for sharing!

rezonant · on Jan 31, 2023

I have to assume that the original repository was private when you forked it. If it was public, and then made private, then this should not happen.

If the original code was "open source", then why exactly was it in a private repository? Putting "MIT licensed open source" into a private repository is not publishing that source code for the world to use.

It sounds like nothing weird happened here other than this company thinking a private repository was "publishing it as open source".

alerighi · on Feb 1, 2023

Open source doesn't necessarily mean that the code has to be shared publicly on the internet, let alone on a GitHub repo! This is a common misconception. You can of course also decide to sell open source software, that is, have others pay you to obtain the source code under an open source license (of course this person can then share the code, or even sell it to others legally, so it's not commonly done).

I can put an MIT or even a GPL license in a private repository that I have at my company. The meaning is that I don't release the source code, though if one of my employees wants to take it and use it he can, and he can also decide to share it with other people, or put it on a public repo.

Why I don't want to put the repo public? Maybe I'm just lazy, I don't see too much value in the code, I don't want to write documentation, tests, whatever, I don't consider it of enough value, whatever, still I don't have problems with people that have access to the code that they use it, and share it if they want.

rezonant · on Feb 1, 2023

I have no misconception of what open source is, if you want to put your foot near this foot gun intentionally by paying GitHub to host code you've open sourced (while telling no one outside your org that you've open sourced it) that's fine.

A common misconception is that publishing a public repo comes with obligations to add tests, documentation and whatever.

baobabKoodaa · on Jan 31, 2023

> If the original code was "open source", then why exactly was it in a private repository?

It's been too many years to remember the exact reasons, but this was not the only repo in the project. This was the "working directory" that had all kinds of random stuff that a data science project might accrue over time. Later in the project we published 3 repos which were more "cleaned up" to be potentially useful to outsiders (I use scare quotes around "cleaned up" because the codebases are still a mess, sorry).

Anyway, 3 of the 4 repos appear to still be public:

https://github.com/futurice/health-visualizations https://github.com/futurice/health-visualizations-front https://github.com/futurice/laaketutka-prereqs

chrisfosterelli · on Jan 31, 2023

It would have had to be privately forked. If you change your repository from public to private, GitHub detaches the forks and leaves them public, so a repo owner can't just delete someone's public fork.

zxcvbn4038 · on Jan 31, 2023

I have a number of git repos that the original developers deleted - because I sync’d them to a usb stick with gitea. I think that is how you have to do it - never entrust a service, especially a free one, with your only copy of anything you value.

If the YouTube algorithm nukes your account and all your videos, you should be ready to upload them to a new account. Same with anything else digital.

My current standard is one copy in AWS S3, which is super reliable but too pricey for daily use, and one copy in Cloudflare R2 or Backblaze B2, which might or might not be reliable (time will tell) but is hella cheap for daily use.

capableweb · on Jan 31, 2023

> because I sync’d them to a usb stick with gitea

Just a tip: no need to use gitea if you want to replicate a git repository to somewhere else on disk/other disk.

Just do something like this:

    mkdir /media/run/usb-drive/my-backup-repo
    (cd /media/run/usb-drive/my-backup-repo && git init)
    git remote add backup /media/run/usb-drive/my-backup-repo
    git push backup master

And now you have a new repository at /media/run/usb-drive/my-backup-repo with a master branch :) It's just a normal git repository, that you also can push to over just the filesystem

josephg · on Jan 31, 2023

Even better with

    cd /media/run/usb-drive/my-backup-repo && git init --bare

Bare repositories don't have a working directory. You can still git clone / git pull from them to get the contents. You can also git push to them without clobbering any "local changes" (there aren't any).

More detail here:

https://www.atlassian.com/git/tutorials/setting-up-a-reposit...
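Putting the pieces together, a self-contained sketch of the bare-repository backup might look like this. The `$work/usb` directory is a hypothetical stand-in for the mounted drive, and the project is a throwaway repository created just for the demonstration.

```shell
set -eu
work=$(mktemp -d)

# A throwaway project with one commit.
git init -q "$work/project"
cd "$work/project"
git config user.name Demo
git config user.email demo@example.invalid
echo hello > README
git add README
git commit -q -m "initial"
branch=$(git symbolic-ref --short HEAD)

# "$work/usb" stands in for the mounted drive.
mkdir -p "$work/usb"
git init -q --bare "$work/usb/my-backup-repo.git"
git remote add backup "$work/usb/my-backup-repo.git"
git push -q backup "$branch"

# No README shows up in the bare repository directory, only Git's internals.
ls "$work/usb/my-backup-repo.git"

# The contents come back with an ordinary clone.
git clone -q "$work/usb/my-backup-repo.git" "$work/restored"
cat "$work/restored/README"
```

The `ls` output is the part that tends to surprise people: the files are all there, but stored as Git objects rather than as a checked-out tree.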

capableweb · on Jan 31, 2023

Yeah, better in terms of saving space, but I think it confuses some people, hence I didn't use it in my above example. Previous time I recommended a co-worker to use the `push to a directory` way of copying a git repository, I made them create a bare repository, and they ended up going into the directory to verify it worked and not seeing what they expected. Cue me having to explain the difference between a normal repository and a bare one. It also confused them into thinking that a bare repository isn't just another git repository but a "special" one you can sync to, while the normal one you couldn't.

So in the end, simple is simple :) Unless you're creating remote repositories at scale, you probably won't notice a difference in storage usage.

josephg · on Jan 31, 2023

I hear all that, but --bare is necessary in this case because git (by default) won't let you push to a non-bare filesystem branch:

    ~/temp/a:master  $ git push backup
    Enumerating objects: 3, done.
    Counting objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 212 bytes | 212.00 KiB/s, done.
    Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
    remote: error: refusing to update checked out branch: refs/heads/master
    remote: error: By default, updating the current branch in a non-bare repository
    remote: is denied, because it will make the index and work tree inconsistent
    remote: with what you pushed, and will require 'git reset --hard' to match
    remote: the work tree to HEAD.
    ...
    To ../b
    ! [remote rejected] master -> master (branch is currently checked out)
    error: failed to push some refs to '../b'

westurner · on Jan 31, 2023

  git clone --mirror
  git clone --bare
  git push --mirror
  git push --all

"Is `git push --mirror` sufficient for backing up my repository?" https://stackoverflow.com/questions/3333102/is-git-push-mirr... :

> So it's usually best to use --mirror for one time copies, and just use normal push (maybe with --all) for normal uses.

git push: https://git-scm.com/docs/git-push

git clone: https://git-scm.com/docs/git-clone

dabber · on Jan 31, 2023

You can even take it a step further and have a push to origin update your backup as well.

https://stackoverflow.com/a/14290145

capableweb · on Jan 31, 2023

> In recent versions of Git you can add multiple pushurls for a given remote

Woah, that's really cool, didn't know about that. This is really useful! Thanks for sharing that.

sangnoir · on Jan 31, 2023

Gitea has a cron task that pulls-in changes on an ongoing basis.

If a snapshot suffices, a one-off "git push" or "git clone" works (but that's not too far off from downloading a tarball, is it?). If you want to have up-to-date local copies of multiple repos, a SQLite-backed Gitea instance is the simplest solution.

An added bonus to using Gitea is flexibility in mirroring LFS objects, which can be sent to S3 or minio

capableweb · on Feb 1, 2023

> a SQLite-backed Gitea instance is the simplest solution

Agree to disagree :)

This seems like the simplest solution:

    git remote set-url --add --push origin [email protected]:my-user/my-repo.git
    git remote set-url --add --push origin /backups/repos/my-user/my-repo.git

Now when you push to origin, it pushes to your local backup as well, everything up to date, no external software at all :)

Thanks dabber for sharing this trick (https://news.ycombinator.com/item?id=34603174)
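Here is a runnable sketch of the same trick using only local paths, so it can be tried without a hosting account. Both `primary.git` and `backup.git` are hypothetical stand-ins (the first for the hosted repository, the second for the local backup). Note that once any push URL is set explicitly, the remote's fetch URL is no longer used for pushing, which is why the primary is added as a push URL too.

```shell
set -eu
work=$(mktemp -d)

# Stand-ins for the hosted repository and the local backup.
git init -q --bare "$work/primary.git"
git init -q --bare "$work/backup.git"

git init -q "$work/project"
cd "$work/project"
git config user.name Demo
git config user.email demo@example.invalid
git commit -q --allow-empty -m "initial"

# origin fetches from the primary, but pushes to both.
git remote add origin "$work/primary.git"
git remote set-url --add --push origin "$work/primary.git"
git remote set-url --add --push origin "$work/backup.git"

# A single push now updates both repositories.
git push -q origin HEAD

# Show the configured push URLs.
git remote get-url --push --all origin
```

This only covers what you push yourself; changes other people push to the primary still need a separate fetch to reach the backup.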

sangnoir · on Feb 1, 2023

This does not include the cron jobs to pull in changes others make daily/weekly/every N hours. I mentioned that Gitea is superior under very specific conditions (where one wants to have latest version available locally).

spartanatreyu · on Jan 31, 2023

I did not realise you could set the remote's URL to a local filesystem path.

That opens some interesting possibilities.

Cheers

benatkin · on Jan 31, 2023

Have you tried running gitea? It's very light on resources, has good documentation, and also defaults to a main branch. It's also very easy to control where all the data is stored, and works well w/ sqlite.

tryauuum · on Jan 31, 2023

Is the "main" branch an advantage? I guess only if >50 percent of your repos use "main" branch

benatkin · on Jan 31, 2023

[flagged]

Alupis · on Jan 31, 2023

That is an awful lot of contortions you are doing here, to seemingly justify a word change that has had well-cemented meaning within the tech community since its inception.

We all know why this change exists, and why some people will attempt to persuade others of its superiority. It is, however, just silly virtue signaling, and it's exhausting to hear and read.

It would require some very irrational and underdeveloped reasoning to assert this word has anything to do with oppression in 2023. There is no negative connotation, except in those who wish to perpetuate some weird sensation of altruism... ie. no one is safer or feels better simply because you choose to call it "main" rather than "master".

sshine · on Feb 1, 2023

> no one is safer or feels better

Clearly some feel better.

I personally like the name “main” better.

But it is truly a pain in the neck that different pieces of software and even different distributions of the same software now disagree about the default.

I’ve got a handful of active projects that go together that differ on master/main because they were created by different softwares.

I’d prefer “hitler” if everyone could just agree to always pick that. GitHub are the big pushers of this culture change. If they succeed, I salute them.

Alupis · on Feb 1, 2023

There was zero disagreement about what to call the default branch in git prior to this linguistic crusade.

GitHub practically invented the exact situation you're experiencing now by changing to main for all new repos.

Git itself still defaults to master. Everything else is not the default and is the root cause for the uncertainty.

ceejayoz · on Feb 1, 2023

> It is, however, just silly virtue signaling, and it's exhausting to hear and read.

You should be aware that complaints about supposed virtue signaling are equally exhausting.

jrockway · on Feb 1, 2023

I'm willing to go out of my way to use a different word if it makes people feel better (the root of the master -> main transition). But this is the rare case where it benefits me by having to type two less characters whenever I refer to the branch! main is truly a win/win.

Alupis · on Feb 1, 2023

> I'm willing to go out of my way to use a different word if it makes people feel better

Which is just the thing, really. It makes no one feel better. It makes the privileged speaker feel better, with a false sense of virtue. It's a "look at how great I am" signal, nothing more.

No one is harmed or made to feel bad by using the word master. Sometimes the adults have to be present in the room, it seems.

Saving 2 characters is an equally silly excuse, but at least it has a realistic rationale. To that end, why stop at main - why not just 'm'? You can call it whatever you want in git.

benatkin · on Feb 1, 2023

It's because main is a goldilocks word for something like a default branch name. It isn't too long and isn't too short. It also isn't shorter in terms of syllables.

Good design :)

sshine · on Feb 1, 2023

> main is a goldilocks word

I totally agree.

> Good design

Having one default instead of two is better design.

endgame · on Feb 1, 2023

And it is _still_ causing completely avoidable pain, years after people started advocating for the change. A recent example I tripped over: https://github.com/brendanhay/amazonka/issues/815#issuecomme...

capableweb · on Jan 31, 2023

Yes, I've tried it (and actually run a personal instance myself), but I would never try to run an application meant as a webapp when I want to copy something from one filesystem to another, when git can do it already without any external programs.

Also, the `master` is just an example, it works for `main` as well, don't worry :) The created git repository on your usb-stick works like a regular git repository, you can use whatever branch names you want.

Btw, way to focus on the absolutely least interesting part of my comment, what I chose to name the branch...

simcop2387 · on Jan 31, 2023

The biggest reason to do this is that it supports "mirror" repositories where it will keep your copy up to date, even using GitHub keys to get at a private repo if you want.

benatkin · on Jan 31, 2023

Yes, and if you stop the process and start it again, gitea doesn't complain and picks up right up where it left off. Ditto if you lose internet connectivity. It's a well-designed piece of software. I considered using it as a BaaS and am actually thinking again of using it as one.

neodon · on Jan 31, 2023

You can change the default default branch on GitHub via <https://github.com/settings/repositories>.

I believe new GitHub accounts now have that set to main.

account42 · on Feb 1, 2023

The same also works over SSH using user@host:path for the remote - you don't need a daemon running on your server to push to it.

wjdp · on Jan 31, 2023

For this I have a NAS with a pretty basic script that runs nightly to clone any new repos I have and update those already backed up. They get organised into a directory structure mirroring that of Github: `./github.com/user/repo`

If of any use to anyone else: https://gist.github.com/wjdp/a20cb15f76b651124b3b27cde06d121...

no_butterscotch · on Jan 31, 2023

> If the YouTube algorithm nukes your account and all your videos, you should be ready to upload them to a new account. Same with anything else digital.

Do you know if this is a common occurrence?

Also, I'm only a YouTube viewer and am not familiar with all the creator tools, problems, communities, etc. But would a creator really re-upload all their back-catalog if deleted? Just to try to get back to views and things?

ChrisMarshallNY · on Jan 31, 2023

I remember someone posting an agonized screed, some time ago, about YT deleting their channel, and all the videos.

Apparently, they had not kept the source/rendered originals of the videos, so it actually clobbered their business.

I am a scarred, limping old coot, and have learned [the hard way] that backups are goooood.

tomcam · on Jan 31, 2023

> I am a scarred, limping old coot

That is a delightfully evocative phrase

tpxl · on Jan 31, 2023

A few weeks ago, youtube changed their swearword policy. A creator I follow basically had to delete half their channel or risk termination.

patmcc · on Jan 31, 2023

They're not banning channels based on swearwords (yet, anyway). They are demonetizing videos with swearing - in the first bit, if too much, maybe other rules, but nobody is getting banned from saying 'shit'.

Marsymars · on Feb 1, 2023

If you don't want monetization, is swearing in your videos to avoid your subscribers having to watch ads a viable strategy?

(To get around this: https://www.forbes.com/sites/johnkoetsier/2020/11/18/youtube...)

mgdlbp · on Feb 1, 2023

Huh, that probably explains... I was in a youtube rabbit hole right around then when some videos suddenly wouldn't load, turned out that I might've been the final viewer of the (small) channel that had been banned at that moment. I was wondering what the chances were.

edit: Seems like it. The channel[1] name probably raised some new flag, and Google did its thing. Seems fair, it's not like a reasonable moderator would know of a concept of a second chance or anything.

[1] https://web.archive.org/web/20181123103308/https://www.youtu... https://web.archive.org/web/20220624154617/https://www.youtu...

// Ah, that channel was a pretty interesting part of the rabbithole of net culture-related parody too - rare to see collaboration like that

zxcvbn4038 · on Feb 1, 2023

Battlestar Galactica did it right - https://www.youtube.com/watch?v=rrYdQnz8vJg

MonkeyMalarky · on Jan 31, 2023

There were a whole bunch of artist- and genre-specific mixes I used to listen to on yt that are gone now. The uploaders' accounts have all been nuked too. The sad thing is I can listen to it all on Spotify but it's not the same.. the creators did not-insignificant work to mix the songs together.

chipsa · on Jan 31, 2023

Maybe not on YouTube, but the gunTubers are having issues with YouTube changing their interpretation of the rules and instantly issuing 3 strikes against them for rule violations. And so, it'd be good to have a back catalog to upload to a different service to keep that older material available.

stuaxo · on Jan 31, 2023

Wonder if any data hoarders have made scripts to clone every repo you have starred on GitHub?

sidewndr46 · on Jan 31, 2023

I'm reasonably certain that if YouTube deletes your account uploading them to a new account is expressly forbidden.

j45 · on Jan 31, 2023

I’ve read having a secondary test account to post videos under to pass YouTube scans before posting to the real account helps minimize issues.

Gordonjcp · on Jan 31, 2023

How exactly can they stop you?

yunohn · on Jan 31, 2023

What do you mean? They have extensive content ID/checks.

Gordonjcp · on Jan 31, 2023

Post it somewhere else. Youtube is largely a waste of time, unless you're some right-wing idiot scamming money out of teenagers.

nobody9999 · on Jan 31, 2023

I completely understand your frustration. It's a dick move to delete your content without (at least) giving you a chance to archive that work.

At the same time this situation points up an important issue: if you don't own/control the infrastructure where your data lives, you don't own that data. Full stop.

If you host your data "in the cloud" (i.e., on someone else's servers) then you don't own that data (or at least not any copies stored there).

I'm not advocating for any specific action/solution in this comment (see my comment history for more about centralization vs. decentralization and "the cloud"), but the above is an important consideration, especially WRT long-term storage of your data.

kaushikc · on Jan 31, 2023

It's simple! Maintain the control and the capability to retrieve your important data at all times. The internet is a wild place and everything you don't save could potentially be gone forever.

nobody9999 · on Jan 31, 2023

>It's simple! Maintain the control and the capability to retrieve your important data at all times.

An excellent point, but I'd go further and say that one should maintain multiple copies of important data, with at least one of those on hardware/infrastructure you control and have physical access to.

ben0x539 · on Jan 31, 2023

> At the same time this situation points up an important issue: if you don't own/control the infrastructure where your data lives, you don't own that data. Full stop.

We've known about this for many years!! And yet we for many varied reasons choose to make use of services anyway!! That doesn't mean we shouldn't get to complain about those services and giving people shit for that is really weird!!

layer8 · on Feb 1, 2023

This is an endless rabbit hole though, unless and until, I guess, this gets regulated into a law that makes cloud service providers accountable for the data stored under the user accounts they provide, or something along those lines. Until then, you can (rightfully!) complain about one thing, but then the next feature of the next service you use may again have similar issues.

nobody9999 · on Feb 1, 2023

>We've known about this for many years!! And yet we for many varied reasons choose to make use of services anyway!! That doesn't mean we shouldn't get to complain about those services and giving people shit for that is really weird!!

I don't disagree at all.

I just find it a little surprising that on a site where "not your keys, not your coins" is accepted wisdom, that "not your storage, not your data" isn't as well accepted.

That said, I am biased and have an agenda:

1. The centralization of network resources is a recipe for disaster;

2. There are many factors which have pushed us toward more centralization, and most of those factors (asymmetric bandwidth on consumer internet links, abusive terms of service, e.g., port blocking/traffic throttling, crappy consumer networking gear, etc., etc., etc) rarely get addressed;

3. The issues in (2) create perverse incentives for commercial entities to further abuse their "customers" (for "free" services that should read "product");

4. Those perverse incentives have morphed outside of paid and "free" SaaS and subscription tech services, encouraging manufacturers of all manner of products (cars, appliances, computers, communication devices and a raft of other products) to employ these abusive, rent-seeking tactics as well;

5. Resolving the issues detailed in (2) (as well as those not detailed) could enable both libre and commercial self-hosting products to become a viable, profitable industry, both for products and support services. Thus enabling us (broadly, humans who use the global internet) to actually own and control our data, PII and privacy;

6. Solutions are plentiful, but the perverse incentives cut across the entire OSI stack and beyond, making the reversal of such incentives complex and difficult, especially because the hoi polloi either don't know or have been convinced that they shouldn't care about ownership (of physical products like phones, cars and appliances) of their data and PII. I don't have a comprehensive set of solutions, but creating competition (municipal last-mile broadband, interoperability requirements, etc.) and providing consumers with the tools they need to decide for themselves (symmetric bandwidth on internet links, "dumb" internet pipes, non-abusive TOS, etc.) how they should host/manage/control their data and possessions will be important steps forward in reversing such incentives.

I rant about this every so often (this being my latest offering), and while it's not specific to Github or how their TOS treats various data storage offerings (repos), it's absolutely an example of how these perverse incentives harm and abuse consumers. In my view, that's wrong.

Edit: Clarified my prose.

hgsgm · on Feb 1, 2023

OP specifically asked GitHub to keep the code private!

It's silly to be upset about that, after refusing to make any backups or distribute any copies of this "publicly licensed" software.

Vecr · on Jan 31, 2023

I'm not sure why people are defending GitHub on this issue. What if the original repo was a template or something, and your thousands of lines of code are gone because the original template repo removed you as a contributor? If I copy something, I expect where I copied it from to have exactly zero bearing on what happens to my copy. If they have a problem they can serve legal documents, giving everyone time to figure something out with zero data loss.

theturtletalks · on Jan 31, 2023

Guess I’ll be cloning repos locally and then pushing instead of forking.

kevincox · on Feb 1, 2023

Apparently this removes your ability to open Pull Requests.

rezonant · on Feb 1, 2023

This only applies to private repositories. Do not put templates up as private repositories, and do not use forks when consuming templates. That is not what forks are for.

account42 · on Feb 1, 2023

It's irrelevant what they are for. Reality means that things like this get misused but deleting data without warning is still not ok.

andybak · on Jan 31, 2023

I agree completely.

But at the same time - who doesn't have local copies of anything they care about? What are they thinking!?

sshine · on Jan 31, 2023

You’d be surprised at how many use GitHub as their remote code backup platform. Having private file system backups is a question of culture, and a lot don’t have it.

hgsgm · on Feb 1, 2023

If no one has a copy of the software, does it really exist?

sbierwagen · on Jan 31, 2023

Is futurice/how-to-get-healthy a public repo? It's not visible on https://github.com/orgs/futurice/repositories

If you fork a private repo and they remove you from the collaborators list, it's reasonable that your fork would be removed.

baobabKoodaa · on Jan 31, 2023

Well, it's not reasonable to me, but we'll just have to respectfully disagree on this.

IncRnd · on Jan 31, 2023

It probably seems more reasonable to the people who own the repo.

baobabKoodaa · on Jan 31, 2023

I don't think the people who removed my access had any intention of deleting private repos. It's just somebody cleaning up ex-employee access from GitHub.

tedunangst · on Jan 31, 2023

As a non employee, how would I access this repo?

glerk · on Feb 1, 2023

The most “expected behavior” here would be to remove the fork relationship between the two repos and leave the copied repo as a plain private repo. I don’t think it is reasonable to just delete a private repo by default.

whstl · on Jan 31, 2023

I have an interesting but unrelated story about files going away in Github.

Remember when the Windows Research source code, which was used for teaching in universities, was leaked in the late aughts, way before Microsoft purchased GitHub?

I had this code in one private repository called "ms" for more than a decade. It didn't have Git history or anything, it was just some random files, plus the leaked sources.

I totally forgot about it until last year, when I checked, and the code was entirely gone.

I'm now more careful about what I put in private repos. In fact I don't have anything private there anymore...

Jorengarenar · on Feb 1, 2023

We ought to treat GitHub's "fork" more like "branch on my account".

If one wants to create an actual fork, then clone the repository locally, change the upstream and push to repo on your account.

josephcsible · on Jan 31, 2023

Never use the fork feature on private repos. Instead, clone the repo locally, create a fresh GitHub repo, and push your local clone manually to that. Doing so will protect you from this attack.
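A rough sketch of that workflow, assuming the fresh, empty repository has already been created on your account. The helper names (`duplicate_commands`, `run`) and both URLs are made up for illustration; the git commands themselves (`git clone --bare`, `git push --mirror`) are the standard ones for duplicating a repository without creating a fork relationship.

```python
import subprocess
from typing import List


def duplicate_commands(source_url: str, target_url: str,
                       workdir: str = "repo-copy.git") -> List[List[str]]:
    """Build the git commands that copy a repo into a new, unrelated one."""
    return [
        # Bare clone: full history, no working tree.
        ["git", "clone", "--bare", source_url, workdir],
        # Push every branch and tag to the fresh repository.
        ["git", "-C", workdir, "push", "--mirror", target_url],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical URLs; replace with the real source and your new repo.
    run(duplicate_commands("git@github.com:some-org/original.git",
                           "git@github.com:your-user/standalone-copy.git"))
```

The result is an ordinary repository outside the original's fork network, so removing you as a collaborator upstream does not touch it. As noted elsewhere in the thread, the trade-off is that you cannot open pull requests against the original from it.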

theossuary · on Jan 31, 2023

It'd be nice to be able to manually specify an upstream for a repo, that's the main benefit of forking in the UI.

hashtag-til · on Jan 31, 2023

Does that limit submitting PRs to the original repo somehow?

kevincox · on Feb 1, 2023

Apparently yes. You can't submit PRs to a repo outside of the "fork network". (IIUC these all share a single Git repo under the hood)

kitkat_new · on Feb 1, 2023

you need to make a fork for that, however you could add that one as a second upstream repo
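Adding the original as a second remote might look roughly like this. The function name `add_upstream_commands`, the remote name `upstream`, and the URL are illustrative assumptions; `git remote add` and `git fetch` are the ordinary git commands, and the sketch only prints them.

```python
from typing import List


def add_upstream_commands(upstream_url: str,
                          remote_name: str = "upstream") -> List[List[str]]:
    """Build the commands that track the original repo as an extra remote."""
    return [
        # Register the original repository under a second remote name.
        ["git", "remote", "add", remote_name, upstream_url],
        # Download its branches so they can be merged or rebased onto.
        ["git", "fetch", remote_name],
    ]


if __name__ == "__main__":
    # Hypothetical URL for illustration; run the printed lines inside
    # your local clone.
    for cmd in add_upstream_commands("git@github.com:some-org/original.git"):
        print(" ".join(cmd))
```

This keeps your standalone copy able to pull in upstream changes, while pull requests would still have to go through an actual fork.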

IncRnd · on Jan 31, 2023

> Your private repository baobabKoodaa/laaketutka-scripts (forked from futurice/how-to-get-healthy) has been deleted because you are no longer a collaborator on futurice/how-to-get-healthy.

> and now it's gone... why?

Because it was a private repo, not a public one.

kitkat_new · on Jan 31, 2023

so what?

Private doesn't imply (common sense) that the original repository has power over any fork.

dboreham · on Jan 31, 2023

It does, otherwise github just wouldn't allow forking private repositories. If they did allow that, and retained no control over the forked copy, now you can ride a coach and horses through the access control to a private repo by simply forking it when you have access. My guess is that forking a private repository is a feature github intended to be used where employees or contractors of an enterprise want to fork their employer's repository as part of their development activities for that employer. Github sees those forks as transitively controlled under the organization's access policies.

Dylan16807 · on Jan 31, 2023

> If they did allow that, and retained no control over the forked copy, now you can ride a coach and horses through the access control to a private repo by simply forking it when you have access.

...which you can still trivially do if you use git to make the copy. And then your github repo will be immune to this kind of deletion.

So common sense says to me this should act similarly.

djur · on Feb 1, 2023

> ...which you can still trivially do if you use git to make the copy.

If you want to steal code from your former employer it's your business and your legal jeopardy. GitHub can't do anything about that. They can remove access to the copies they're storing for you, though.

GitHub has a weird model where they encourage using the same account for personal and professional work, which causes this kind of ambiguity. From their perspective, there isn't a real difference between forking a private repo and making a private copy of a shared Google doc in your work account.

Dylan16807 · on Feb 1, 2023

It's not about wanting to steal anything. It's that making a copy is trivial, so there's no point in worrying about how "you can ride a coach and horses through the access control".

Don't worry about the barn door when there is no side on the barn.