
Which site is it?


My own shitty personal website, which is so uninteresting that I do not even wish to disclose it here. Hence my lack of understanding of the down-votes for doing what works for my OWN shitty website, or rather, server.

In fact, I bet it would choke on a small amount of traffic from here considering it has a shitty vCPU with 512 MB RAM.


Personal sites are definitely interesting, way more interesting than most of the rest of the web.

I was thinking I would put your site into archive.org using ArchiveBot, with a reasonable crawl delay, so that it is preserved if your hardware dies. Ask on the ArchiveTeam IRC if you want that to happen.

https://chat.hackint.org/?join=%23archiveteam-bs


It is a public git repository for the most part. That is the essence of my website; there is not much writing besides READMEs, comments in code, and commit messages.


A public git repository is even more interesting, for both ArchiveTeam Codearchiver and Software Heritage. The latter offers an interface for saving code automatically.

https://wiki.archiveteam.org/index.php/Codearchiver

https://wiki.archiveteam.org/index.php/Software_Heritage

https://archive.softwareheritage.org/save/


After initial save, do they perform automatic git pulls? What happens if there are potential conflicts? I wonder how it all works behind the surface. I know I ran into issues with "git pull --all" before, for example. Or what if it is public software that is not mine? I saved some git repositories (should I do .tar.gz too for the same project? Does it know anything about versions?).


On Software Heritage: for forges (GitLab, cgit, etc.), every couple of months SWH lists all repos and pulls the new or updated ones. I think if you save an individual repo, it gets pulled again later too, but I'm not sure of the schedule. They have custom (open source) tooling for importing repos, tarballs and other things. They deduplicate on the backend, so if you saved several clones of the same project, the files and commits shared between them are stored once. They import the git tags (and other refs) too.
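To make the deduplication point concrete, here is a minimal sketch of content-addressed storage. It is not Software Heritage's actual code; the `DedupStore` class and the sample file contents are made up for illustration. The only real fact it relies on is how git computes a blob ID (SHA-1 over a `blob <size>\0` header plus the content).

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object ID git assigns to a file with this content."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class DedupStore:
    """Hypothetical store: objects are keyed by content hash, so identical
    files from different repos occupy one slot."""

    def __init__(self):
        self.objects = {}

    def add(self, data: bytes) -> str:
        oid = git_blob_id(data)
        # setdefault keeps the first copy and ignores later duplicates
        self.objects.setdefault(oid, data)
        return oid

    def add_repo(self, files: dict) -> dict:
        """Store every file of a repo snapshot; return path -> object ID."""
        return {path: self.add(data) for path, data in files.items()}


store = DedupStore()
# Two made-up snapshots: a project and a fork that changed one file.
original = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
fork = {"README": b"hello\n", "main.c": b"int main(void) { return 1; }\n"}

a = store.add_repo(original)
b = store.add_repo(fork)
print(len(store.objects))  # 3: the shared README is stored once
```

Four files go in, three objects come out, because the README is identical in both snapshots. The same idea extends to trees and commits, which are also identified by a hash of their content.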

ArchiveTeam Codearchiver is quite a bit different: it does one-shot archiving of repos into VCS-native export formats, like git bundles. There is some deduplication based on commit hashes, I think.
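If you have not used git bundles before, this is roughly what a one-shot export looks like. It is a sketch that shells out to plain `git bundle create` and `git bundle verify`, not Codearchiver's own code; the helper names and paths are hypothetical.

```python
import os
import subprocess


def bundle_command(repo, out_file):
    """Build the git command that packs every ref of `repo` into one file."""
    # Make the output path absolute, because `git -C` changes directory first.
    out_file = os.path.abspath(out_file)
    return ["git", "-C", str(repo), "bundle", "create", out_file, "--all"]


def make_bundle(repo, out_file):
    """Create the bundle, then ask git to check that it is usable."""
    subprocess.run(bundle_command(repo, out_file), check=True)
    subprocess.run(
        ["git", "-C", str(repo), "bundle", "verify", os.path.abspath(out_file)],
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical paths; requires git to be installed.
    make_bundle("path/to/repo", "repo.bundle")
```

The resulting file can be cloned from directly with `git clone repo.bundle`, which is what makes it a convenient archival format.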


[flagged]


Thanks, appreciate it. I would hope so. I do not care about down-votes per se; my main complaint is that I am somehow in the wrong for doing what I deem right for my shitty server(s).



