I have a hand-coded backup system for my photo library that writes to S3. It runs every night at 2AM.
The one feature I have that's important to me is this: it will figure out what files need to be uploaded and then upload as many as possible for an hour then stop.
That means that it runs for at most an hour a night.
The reason I need/wanted this feature is that I might come home from a trip with (e.g.) 30G worth of photos. My (cable) internet will upload at around 1G an hour. I don't want this thing to saturate my internet for 30 hours straight. Instead, it backs up a small amount every night for 30 days.
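The core of it is just a deadline check around the upload loop. Something like this sketch (not my actual script; the bucket name, manifest files, and use of the AWS CLI are just for illustration):

    #!/bin/sh
    # Work through a manifest of not-yet-uploaded files, stopping once an hour
    # has passed. Bucket and file names are made up.
    deadline=$(( $(date +%s) + 3600 ))
    while IFS= read -r f; do
        [ "$(date +%s)" -ge "$deadline" ] && break
        aws s3 cp "$f" "s3://my-photo-backup/$f" && echo "$f" >> uploaded.list
    done < pending.list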
Am I the only one that wants a feature like this? I've never seen it in any other backup system. (An alternative might be to have configurable bandwidth for uploads.)
Makes perfect sense. Restic kind-of supports this because you can just kill the client after an hour and, tomorrow, it'll see which objects are there already.
I'm not deep enough into the project to know whether this is an officially supported use case, but restic was of course made with the idea that interruptions can happen (your computer can crash) and should be handled safely, and for the deduplication it'll cut files up in a deterministic way and thus (as I understand it) store those chunks in a deterministic place.
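So something along these lines should roughly work (the cron line and repo URL are just illustrative, I haven't tested this exact setup):

    # Run at 2AM, send SIGINT after an hour; restic stops and the next run
    # dedupes against whatever already made it into the repo. Credentials go
    # in the environment (AWS_* and RESTIC_PASSWORD) as usual.
    0 2 * * * timeout --signal=INT 1h restic -r s3:s3.amazonaws.com/my-bucket backup /photos

An occasional restic prune would clean up any unreferenced data left behind by interrupted runs.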
Rclone will do exactly what you want: it uploads to S3, and the --max-duration flag will stop new transfers from starting after a given duration.
There are also throttle options for bandwidth. I use that combined with Node-RED and a smart plug on my monitors: if monitor power draw exceeds a threshold, the upload throttle is changed via the rclone API.
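The relevant bits look roughly like this (remote and bucket names are made up):

    # Stop starting new transfers after an hour, and cap upload bandwidth:
    rclone copy ~/Photos s3remote:my-photo-backup --max-duration 1h --bwlimit 1M
    # With the remote control API enabled (rclone started with --rc), the limit
    # can be changed on the fly, which is what the Node-RED flow calls:
    rclone rc core/bwlimit rate=256k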
Was just going to suggest rclone as well, but for its easily toggleable bandwidth limiter. I have slow rural internet; restic backs up locally every night, rclone then syncs it offsite. A systemd timer throttles it during waking hours and lets all 6 Mbps rip overnight or when out of town.
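rclone can also do the schedule itself with a bandwidth timetable, if you'd rather skip the systemd timer (times, rates and paths here are just an example):

    # 512k during waking hours, unlimited from 23:00:
    rclone sync /srv/restic-repo remote:offsite --bwlimit "08:00,512k 23:00,off"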
My internet upload speed is bad so I do want something like that.
I would also like to be able to "stage" a backup: figure out what needs to be transmitted and then create the data files that need to be transmitted, without actually transmitting them immediately.
That would allow me to do things like backup my laptop to another computer in my house that can upload the files over my slow connection overnight when my laptop isn't on; and to let me bring the backup files to a place (work/university/library) with a fast connection so large backups don't take days or weeks (especially initial backup).
I am using restic to back up my laptop and workstation to my NAS. At night rclone syncs the restic repositories to S3. I can restore both from my NAS as well as from S3.
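Roughly this, for anyone wanting to copy the setup (paths, remote and bucket names are made up):

    restic -r /mnt/nas/restic-repo backup ~/                    # laptop -> NAS
    rclone sync /mnt/nas/restic-repo s3remote:my-bucket/restic  # NAS -> S3 overnight
    # Same repo format on both ends, so either copy can be restored from:
    restic -r /mnt/nas/restic-repo restore latest --target /tmp/restore
    restic -r s3:s3.amazonaws.com/my-bucket/restic restore latest --target /tmp/restore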
Unless it's for experimenting, I've stopped caring about backup solutions other than borg and ZFS, since the only way to prove a tool's stability is for it to exist for a while without big complaints, and the new ones all seem to have complaints.
No data loss is the absolute baseline, but it isn't enough on its own; huge memory consumption and other operational issues are also showstoppers.
Restic in my experience has been rock solid. I actually switched from Borg. Borg’s crypto has known limitations; its Python error messages are long and messy; it complained more frequently.
Restic’s repository format is simple and well documented, which is important for long term data recovery (and fixes in case changes occur in the repo). The crypto is from a good source, and well regarded. Multithreaded, fast, nice and clean output.
ZFS is a file system, and has serious limitations when used as a backup tool. It needs a ZFS backend, ruling out almost any provider (basically self-host your ZFS system, which is costly and error prone). It needs more RAM than Borg and restic. And I personally still feel uncomfortable with native encryption in ZFS, at least for some time to come. Lower-level system encryption is probably not what you want in backups.
One feature I miss from these tools (other than ZFS): error correction. They could use a Reed-Solomon code or similar and add parity data in case there is an accidental change in the repository.
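In the meantime you can bolt that on yourself with something like par2, run over whichever files make up the repository, at the cost of having to regenerate the parity after every backup run (redundancy level and file names are just an example):

    par2 create -r10 archive.par2 archive.tar.zst   # create ~10% recovery data
    par2 verify archive.par2                        # detect corruption later
    par2 repair archive.par2                        # and repair it if possible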
> ZFS is a file system, and has serious limitation when used as a backup tool.
But it's definitely the best in one respect: it knows everything that happened on the filesystem, unlike other tools that have to scan the entire set of directories on every run.
You can read up on how a zfs send performs better than an rsync run.
Also, you can even take a database backup as a filesystem snapshot, which is far easier than most other database backup methods, which aren't always simple.
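The whole flow is pleasantly small, too; something like this (dataset and host names are made up):

    zfs snapshot tank/photos@2022-03-27     # instant, knows exactly what changed
    zfs send -i tank/photos@2022-03-26 tank/photos@2022-03-27 | \
        ssh backuphost zfs receive backuppool/photos   # ships only the changed blocks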
> basically self host your ZFS system, which is costly and error prone
How is this so? Just run Ubuntu, install the ZFS userland tools and it works, or just use rsync.net (not affiliated, but I can't find a better service that accepts zfs send). Don't try it on a RH-based distro, as ZFS support there is pretty bad.
I run ZFS on my systems including with Ubuntu (the support has recently come out of experimental, in Ubuntu 22.04, and it’s pretty good with Zsys). It’s a superb file system, but to use ZFS send, you need: a RAID server (ideally with ECC RAM), and another mirror in a different place for replication. So, two TrueNAS servers, for instance. It costs in hardware and electricity, and sysadmin maintenance time.
Restic and Borg are portable. You can send to, and recover from, any cloud provider for cheap.
There is rsync.net for ZFS, it requires minimum 1TB and it’s still more expensive than alternatives (because they have to assign to you RAM and some CPU too).
If you have a ZFS backup system, use that. Ordinary people might be better off with a tool that works with any cloud storage.
Didn't encounter any problems on Linux... but on Windows I ran into an issue the first time I used it (might be my personal bad luck though).
If you're using restic on Windows, maybe take a minute to check https://github.com/restic/restic/pull/3637 - it was my first contribution to restic and it seems to have stalled, but I'm not sure if it's a general "oh, Windows" problem that no one takes seriously in a project where 90% of the maintainers use Linux, as happens so very often :P
We've just merged new crypto code into master, based on AEAD ciphers (AES-OCB and chacha20-poly1305) and session keys - so the potential nonce management issues are soon a thing of the past.
There's current work adding argon2id as default for the KDF (was: pbkdf2), likely soon to be merged.
Also checking blake3 for the ID hash (MAC) right now (but platform / build compatibility has yet to be seen).
Is there anything that must be done to take advantage of these new AEAD ciphers on an existing repo/archive? Or is it all under the hood and transparent to the end user?
The "long and messy" python error messages are python tracebacks and often intentionally displayed by borg to ease locating and fixing bugs.
Of course we could also display shorter error messages (and we do that at some places, if the cause of the exception is well known / expected), but be glad to have the long form and not just "something went wrong" (which is very pretty, but completely useless). :-)
I don't notice this much... disk is usually the bottleneck, and otherwise it's the network to the remote backup location. Still, backups complete in seconds:
    Repository: ssh://backup/./backups/mungedhostname.borg
    Archive name: 20220327-2201
    Archive fingerprint: 8b710144579c8d531e7c4a0192304323081b14a71445557608d859494bbe84b6
    Time (start): Sun, 2022-03-27 22:01:35
    Time (end):   Sun, 2022-03-27 22:01:52
    Duration: 17.28 seconds
    Number of files: 45678
                        Original size      Compressed size    Deduplicated size
    This archive:            24.22 GB              9.35 GB             47.78 MB
    All archives:           363.60 GB            140.91 GB              7.81 GB
                        Unique chunks         Total chunks
    Chunk index:                59857               852257
I agree that a multithreaded borg could utilise resources better, esp. IF you have a lot of changed / new data to back up.
But OTOH, for many users this is primarily the case for the first backup, but not for their daily / hourly backups when most files are unchanged - for that, I guess speed is I/O bound and all your CPU cores won't help you with that.
So for N-1 backup runs, it already works well enough for many users. And for that 1 initial backup, some patience helps. :-)
Implementing MT has been planned for a long time, but due to the above, other stuff had higher priority.
Restic and BorgBackup really seem to be the favored solutions out there. Restic for encryption, Borg for deduplication and compression. Or maybe Bacula if you want pull-based backups instead of push-based.
'borg'[1] has, in recent years, become the de facto standard for secure, encrypted, you-control-the-keys backups. It has been referred to as "the holy grail of backups"[2].
Two of the better howtos that we have seen for borg are [3][4]. [4] is geared toward OpenBSD users.
There is also https://github.com/restic/others which has some keywords (e.g. is it encrypted, does it do compression) for most FOSS backup solutions. It can be outdated or incomplete for some entries, though.
Local directory
sftp server (via SSH)
HTTP REST server (protocol, rest-server)
Amazon S3 (either from Amazon or using the Minio server)
OpenStack Swift
BackBlaze B2
Microsoft Azure Blob Storage
Google Cloud Storage
And many other services via the rclone Backend
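For the curious, the repository location strings for a few of these look like the following (bucket, host and remote names are made up):

    restic -r /srv/restic-repo init
    restic -r sftp:user@host:/srv/restic-repo init
    restic -r s3:s3.amazonaws.com/my-bucket init
    restic -r b2:my-bucket:restic init
    restic -r rclone:myremote:backups/restic init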
Depends which back-end you use and which configuration. (Assuming "node" here means a backup client and not a backup-hosting server.)
If you just put stuff on some standard storage (FTP, B2, etc.) without any permissions set up, then no backup client could stop the attacker from using the credentials you've deployed to the backup client to log in to your storage and delete your backup files. This is not really specific to restic.
Accessing old files from a backup client is a weakness. Though, personally, I don't store things on a system that this system is not supposed to know, so if someone compromises this system and they see data from the past year... that's nearly the same as just seeing the data that is on this system today. Nevertheless, this could be solved by using public keys, so the backup client has an encryption key but no decryption key (of course it's not as trivial as this sounds, e.g. how could it still do deduplication etc.), but restic does not do this.
Got it. So I guess the best solution going forward would be to deploy Minio and then back up the backup directory itself at regular intervals onto different storage not accessible from anywhere.
There's an object lock feature which prevents objects in a certain bucket from being deleted before X amount of time has elapsed. This might be able to prevent backups from being tampered with.
If that doesn't work, there's a restic server which can act as an intermediary and which has an append-only mode which would prevent backups from being tampered with if your computer were compromised.
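Something like this, if memory serves (host and paths are made up):

    # server side: clients may add data but never delete or overwrite it
    rest-server --path /srv/restic --listen :8000 --append-only
    # client side:
    restic -r rest:http://backuphost:8000/myrepo backup ~/Photos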
I am using restic and thinking about switching to Kopia... mainly because Kopia has compression and seems to have more development activity. It also has a GUI. And from what I've seen it is faster.
This point hides a lot of goodness in something that I didn't even understand on the first read:
> - We have added checksums for various backends so data uploaded to a backend can be checked there.
All data is already stored in files whose filename is the sha256sum of the contents, so clearly it's all already checksummed and can be verified, right?
Looking into the changelog entry[1], this is about verifying the integrity upon uploading:
> The verification works by informing the backend about the expected hash of the uploaded file. The backend then verifies the upload and thereby rules out any data corruption during upload.
>
> [...] besides integrity checking for uploads [this] also means that restic can now be used to store backups in S3 buckets which have Object Lock enabled.
Object lock is mentioned in passing somewhere down the changelog, but it's a big feature. S3 docs:
> Object Lock can help prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely.
i.e. ransomware protection. Good luck wiping backups if your file hoster refuses to overwrite or delete the files. And you know Amazon didn't mess with the files because they're authenticated.
Extortion is still a thing, but if people used this, it would more or less wipe out the attack vector of ransomware. The only risk is if the attacker is in your systems long enough to outlast your retention period and creates useless backups in the meantime so you're not tipped off. Did anyone say "test your backups"?
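For S3 this has to be switched on when the bucket is created, roughly like below (bucket name and retention period are illustrative, and I'd double-check the exact JSON against the AWS docs):

    aws s3api create-bucket --bucket my-backup-bucket --object-lock-enabled-for-bucket
    aws s3api put-object-lock-configuration --bucket my-backup-bucket \
        --object-lock-configuration \
        '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":90}}}'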
For self-hosting, restic has a custom back-end called rest-server[2] which supports a so-called "append-only mode" (no overwriting or deleting). I worked on the docs for this[3] together with rawtaz and MichaelEischer to make it more secure: eventually, of course, your disks fill up or you want to stop paying for outdated snapshots on S3, and an attacker could have added dummy backups to fool your automatic removal script into thinking it needs to keep only the dummy backups. Using the right retention options, this attack cannot happen.
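The pruning then runs from the trusted side with direct access to the repo, not through the append-only client, with a time-window based policy so dummy snapshots can't push the real ones out. Something along these lines (values are illustrative; the linked docs have the recommended flags):

    # on the backup server itself, with full (non-append-only) access to the repo:
    restic -r /srv/restic/myrepo forget --keep-within 30d --keep-monthly 12 --prune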
Others are doing some pretty cool stuff in the backup sphere as well, e.g. bupstash[4] has public-key encryption, so the backup client doesn't need to hold the decryption keys.