Restic 0.13.0 (restic.net)
111 points by soheilpro on March 27, 2022 | 66 comments


I have a hand-coded backup system for my photo library that writes to S3. It runs every night at 2AM.

The one feature I have that's important to me is this: it will figure out what files need to be uploaded and then upload as many as possible for an hour then stop.

That means it runs for at most an hour a night.

The reason I need/wanted this feature is that I might come home from a trip with (eg) 30G worth of photos. My (cable) internet will upload at around 1G an hour. I don't want this thing to saturate my internet for 30 hours straight. Instead, it backs up a small amount every night for 30 days.

Am I the only one that wants a feature like this? I've never seen it in any other backup system. (An alternative might be to have configurable bandwidth for uploads.)


You could do this with the `timeout` command.

`timeout -s SIGINT 1h restic...`

That would let restic run for one hour; once the hour elapses, timeout sends a SIGINT, which stops the process (see https://github.com/restic/restic/blob/a29777f46794ea4e35548f...)
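
For the "nightly at 2AM, at most an hour" use case from the parent, a crontab entry could look roughly like this (just a sketch: the repository, source path and password file are placeholders, and it assumes the S3 credentials are available in cron's environment):

    # run at 02:00 every night; SIGINT after one hour lets restic stop cleanly
    0 2 * * * timeout -s SIGINT 1h restic -r s3:s3.amazonaws.com/my-photo-backups backup /home/me/photos --password-file /etc/restic/password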


And restic would then resume the backup next time it runs.

[1] https://restic.readthedocs.io/en/latest/faq.html#will-restic...


Makes perfect sense. Restic kind-of supports this because you can just kill the client after an hour and, tomorrow, it'll see which objects are there already.

I'm not deep enough into the project to know whether this is an officially supported use-case, but restic was of course made with the idea that interruptions can happen (your computer can crash) and should be handled safely; for deduplication it cuts files up in a deterministic way and thus (as I understand it) stores those chunks in a deterministic place.


Rclone will do exactly what you want: it uploads to S3, and --max-duration will stop new transfers from starting after a given duration.

There are also throttle options for bandwidth. I use that combined with Node-RED and a smart plug on my monitors: if monitor power draw exceeds a threshold, the upload throttle is changed via the rclone API.
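
For reference, both knobs as a sketch (remote and bucket names are made up; the running instance needs --rc enabled for the API call):

    # stop starting new transfers after one hour, and cap upload bandwidth at 1 MiB/s
    rclone copy /data/photos s3remote:my-bucket --max-duration 1h --bwlimit 1M

    # adjust (or remove) the limit on a running instance via the remote-control API
    rclone rc core/bwlimit rate=off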


Ok, that looks like a nice tool. Thanks!


Was just going to suggest rclone as well, but for its easily toggle-able bandwidth limiter. I have slow rural internet; restic backs up locally every night, rclone then syncs it offsite. A systemd timer throttles it during waking hours and lets all 6 Mbps rip overnight or when out of town.


Also, restic has an rclone backend!


My internet upload speed is bad so I do want something like that.

I would also like to be able to "stage" a backup: figure out what needs to be transmitted and create the data files without actually transmitting them immediately.

That would allow me to do things like back up my laptop to another computer in my house that can upload the files over my slow connection overnight when my laptop isn't on, and to bring the backup files to a place (work/university/library) with a fast connection so large backups don't take days or weeks (especially the initial backup).


Take a look at NNCP[0]. I haven't tried it yet, but I've read several articles about it and I'm looking forward to finding time to use it seriously.

[0] https://www.complete.org/nncp/


I am using restic to back up my laptop and workstation to my NAS. At night, rclone syncs the restic repositories to S3. I can restore both from my NAS as well as from S3.
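
In case it helps anyone, the shape of that setup is roughly this (a sketch; paths, remote and bucket names are made up):

    # on the laptop/workstation: back up to the NAS over SFTP
    restic -r sftp:nas:/srv/restic-repo backup /home/me

    # on the NAS, at night: mirror the repository to S3
    rclone sync /srv/restic-repo s3remote:my-restic-bucket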


On Linux, this can be easily achieved by using RuntimeMaxSec= [1] in the corresponding service unit.

[1] https://www.freedesktop.org/software/systemd/man/systemd.ser...
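
A minimal sketch of such a unit (repository, paths and the password file are placeholders; KillSignal=SIGINT is set so restic gets the same graceful interrupt as Ctrl-C):

    # restic-backup.service (sketch), triggered by a matching .timer
    [Unit]
    Description=Nightly restic backup, capped at one hour

    [Service]
    # note: on older systemd versions RuntimeMaxSec= may not apply to Type=oneshot units
    Type=oneshot
    RuntimeMaxSec=1h
    KillSignal=SIGINT
    ExecStart=/usr/bin/restic -r s3:s3.amazonaws.com/my-bucket backup /home/me/photos --password-file /etc/restic/password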


I wonder if you can obtain that behavior with a bash script, but it's boring to write scripts and I do not know if SIGTERM can exit restic gracefully.


If you've ever ctrl+c'd restic, you'll know the message

> signal interrupt received, cleaning up


Thanks! Will try tonight to play with Restic


Restic has support for rate limiting built-in.
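
For example (values are in KiB/s, and the repository flag is a placeholder): `restic -r <repo> backup /home/me --limit-upload 1024`. There's a matching `--limit-download` for restores.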


Unless it's for experimenting, I've stopped caring about backup solutions other than borg and ZFS: the only way to prove their stability is for them to exist for a while without big complaints, and the new ones all seem to have complaints.

Just having no data loss isn't enough (that's the absolute baseline); huge memory consumption and other operational issues are also showstoppers.


Restic in my experience has been rock solid. I actually switched from Borg. Borg’s crypto has known limitations; its Python error messages are long and messy; it complained more frequently.

Restic’s repository format is simple and well documented, which is important for long term data recovery (and fixes in case changes occur in the repo). The crypto is from a good source, and well regarded. Multithreaded, fast, nice and clean output.

ZFS is a file system, and it has serious limitations when used as a backup tool. It needs a ZFS backend, ruling out almost every provider (basically you self-host your ZFS system, which is costly and error prone). It needs more RAM than Borg and restic. And I've personally felt uncomfortable with ZFS native encryption for some time. Lower-level system encryption is probably not what you want in backups.

One feature I miss from these tools (other than ZFS): error correction. They could use a Reed-Solomon code or similar and add parity data in case there is an accidental change in the repository.
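
Neither restic nor borg does this natively, but you can bolt parity on externally with something like par2, at least for a repository you host yourself. A sketch, assuming par2 is installed and the repo lives at /srv/restic-repo:

    # add ~10% recovery data next to every pack file
    find /srv/restic-repo/data -type f ! -name '*.par2' | while read -r f; do
        par2 create -r10 -n1 "$f.par2" "$f"
    done

    # later, to check or fix a damaged file:
    #   par2 verify <packfile>.par2
    #   par2 repair <packfile>.par2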


> ZFS is a file system, and has serious limitation when used as a backup tool.

But it's definitely the best, as it knows everything that happened on the filesystem, unlike other tools that have to scan the whole directory tree on every run.

You can read about how ZFS performs better than an rsync run:

https://arstechnica.com/information-technology/2015/12/rsync...

Also, you can even take a database backup as a filesystem snapshot, which is far easier than other database backup methods, which aren't always simple.

> basically self host your ZFS system, which is costly and error prone

How so? Just run Ubuntu, install the ZFS userland and it works, or just use rsync.net (not affiliated, but I can't find a better service that accepts zfs send). Don't try it on a RHEL-based distro, as ZFS support there is pretty bad.
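
For the curious, the backup workflow is roughly this (a sketch; pool, dataset and account names are made up, and the remote side has to accept `zfs recv` over SSH, which is what the rsync.net ZFS offering is about):

    # atomic, point-in-time snapshot (also handy for consistent database backups)
    zfs snapshot tank/data@2022-03-27

    # incremental replication: only blocks changed since the previous snapshot cross the wire
    zfs send -i tank/data@2022-03-26 tank/data@2022-03-27 | \
        ssh user@rsync.net zfs recv data/backup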


I run ZFS on my systems, including with Ubuntu (the support has recently come out of experimental in Ubuntu 22.04, and it's pretty good with Zsys). It's a superb file system, but to use ZFS send you need a RAID server (ideally with ECC RAM) and another mirror in a different place for replication. So, two TrueNAS servers, for instance. It costs in hardware, electricity, and sysadmin maintenance time.

Restic and Borg are portable. You can send to, and recover from, any cloud provider for cheap.

There is rsync.net for ZFS; it requires a minimum of 1 TB and it's still more expensive than alternatives (because they have to assign RAM and some CPU to you too).

If you have a ZFS backup system, use that. Ordinary people might be better off with a tool that works with any cloud storage.


Just to get started, all you need is a 2 GB Ubuntu instance at $10/mo, or pick rsync.net with ZFS capability, which gives you 1 TB for $25/mo.

If you want easy, fast backups, even hobbyists shouldn't overlook ZFS.

Besides, Borg doesn't work with arbitrary cloud providers; it can only go against SSH servers. You can also use Hetzner rather than rsync.net for this.


Didn't encounter any problems on Linux... but on Windows I ran into an issue the first time I used it (might be my personal bad luck though).

If you're using restic on Windows, maybe take a minute to check https://github.com/restic/restic/pull/3637 - it was my first contribution to restic and it seems to have stalled, but I'm not sure if it's the usual "oh, Windows" problem that no one takes seriously in a project where 90% of the maintainers use Linux, as so very often :P


About "Borg’s crypto has known limitations":

We've just merged new crypto code into master, based on AEAD ciphers (AES-OCB and chacha20-poly1305) and session keys - so the potential nonce management issues will soon be a thing of the past.

There's work in progress to add argon2id as the default KDF (was: pbkdf2), likely to be merged soon.

We're also evaluating blake3 for the ID hash (MAC) right now (but platform / build compatibility remains to be seen).


Is there anything that must be done to take advantage of these new AEAD ciphers on an existing repo/archive? Or is it all under the hood and transparent to the end user?


It is still early in the development process, but it might be just for new repos.


Very welcome improvements, especially the introduction of standard AEADs from a standard source.

I look forward to these crypto features!

Will error correction coding be on the roadmap at some point?


There's this ticket: https://github.com/borgbackup/borg/issues/225

I don't see anything of that in the near future of borg, esp. as long as the fundamental concerns there have not been addressed.


The "long and messy" python error messages are python tracebacks and often intentionally displayed by borg to ease locating and fixing bugs.

Of course we could also display shorter error messages (and we do that in some places, if the cause of the exception is well known / expected), but be glad to have the long form and not just "something went wrong" (which is very pretty, but completely useless). :-)


> Restic in my experience has been rock solid. I actually switched from Borg

How large is your repo? How stable is the memory usage?


Up to 1 TB. I recall a very recent post in the restic forums from someone with a 100 TB repo!

No issue with RAM usage. Also, restic has run much faster since 0.12, and high RAM usage is now more limited in duration.

I don’t have first hand experience with larger repositories.


Borg being single-threaded is painful in the era of consumer 12- and 16-core CPUs, and even prosumer 64-core parts.


Don't notice this much... disk is usually the bottleneck, and otherwise it will be the network to the remote backup location. Still, backups complete in seconds:

    Repository: ssh://backup/./backups/mungedhostname.borg
    Archive name: 20220327-2201
    Archive fingerprint: 8b710144579c8d531e7c4a0192304323081b14a71445557608d859494bbe84b6
    Time (start): Sun, 2022-03-27 22:01:35
    Time (end):   Sun, 2022-03-27 22:01:52
    Duration: 17.28 seconds
    Number of files: 45678

                       Original size      Compressed size    Deduplicated size
    This archive:           24.22 GB              9.35 GB             47.78 MB
    All archives:          363.60 GB            140.91 GB              7.81 GB

                       Unique chunks         Total chunks
    Chunk index:               59857               852257


Sure, for spinning disks, but deduplication, compression, and encryption can easily be the bottleneck if you have NVMe drives.


And you can turn the compression way up to where it is very expensive but gives much better space savings.


I agree that a multithreaded borg could utilise resources better, esp. IF you have a lot of changed / new data to back up.

But OTOH, for many users this is primarily the case for the first backup, not for their daily / hourly backups when most files are unchanged - for those, I'd guess speed is I/O bound and all your CPU cores won't help you with that.

So for N-1 of the backup runs, it already works well enough for many users. And for that 1 initial backup, some patience helps. :-)

Implementing multithreading has been planned for a long time, but due to the above, other things have had higher priority.


You could spread the backup into several isolated borg repos and run them in parallel if CPU is the bottleneck.
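
Something like this, as a sketch (repo paths and source directories are made up; `{now}` is borg's built-in archive-name placeholder):

    # two independent repos, backed up concurrently
    borg create /srv/borg/photos::{now} /home/me/Photos &
    borg create /srv/borg/mail::{now} /home/me/Mail &
    wait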


I like that it runs in the background without heating my laptop much.


Where can I read about the best backup solutions, for nerdy people?


Restic and BorgBackup really seem to be the favored solutions out there: restic for encryption, Borg for deduplication and compression. Or maybe Bacula if you want pull-based backups instead of push-based.

https://restic.readthedocs.io/en/stable/

https://www.borgbackup.org/

https://www.bacula.org/documentation/documentation/


BorgBackup can be made to operate in pull mode too, see: https://borgbackup.readthedocs.io/en/stable/deployment/pull-...

At work I've implemented a variant of the ssh-agent method described there. Admittedly, it requires some scripting :)


'borg'[1] has, in recent years, become the de facto standard for secure, encrypted, you-control-the-keys backups. It has been referred to as "the holy grail of backups"[2].

Two of the better howtos that we have seen for borg are [3][4]. [4] is geared toward OpenBSD users.

[1] https://borgbackup.readthedocs.io/en/stable/

[2] https://www.stavros.io/posts/holy-grail-backups/

[3] https://jstaf.github.io/2018/03/12/backups-with-borg-rsync.h...

[4] https://rgz.ee/borg.html


There is also https://github.com/restic/others which has some keywords (e.g. is it encrypted, does it do compression) for most FOSS backup solutions. It can be outdated or incomplete for some entries, though.


When talking about backups for nerds, https://www.rsync.net/ always deserves a mention.


... as well as http://taobackup.com/.


Besides Homebrew, restic is also available via MacPorts: https://ports.macports.org/port/restic/


Still no compression?



I have to say, this is an excellent solution and I am seriously contemplating deploying it for all of our servers.

I'm a little unclear on one thing: are alternative S3 providers supported?


These seem to be the supported backends:

    Local directory
    sftp server (via SSH)
    HTTP REST server (protocol, rest-server)
    Amazon S3 (either from Amazon or using the Minio server)
    OpenStack Swift
    BackBlaze B2
    Microsoft Azure Blob Storage
    Google Cloud Storage
    And many other services via the rclone Backend

https://github.com/restic/restic#backends=


I saw that; I was specifically wondering about regular S3-compatible implementations like Linode Object Storage, DigitalOcean Spaces, Wasabi, etc.


Did you try your backend to see if it works? It's not that hard to download the executable and try it.


Yes, just now. Here's the piece of information I was looking for: rclone supports S3.


Yes! I back up directly into Backblaze B2 using restic.
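
In case it helps, the setup is roughly this (a sketch; bucket name, repo path and credentials are placeholders):

    # credentials for restic's B2 backend
    export B2_ACCOUNT_ID=<keyID>
    export B2_ACCOUNT_KEY=<applicationKey>

    restic -r b2:my-bucket:/restic init
    restic -r b2:my-bucket:/restic backup /home/me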


Excellent. One more question: assuming someone gains access to a node, will they be able to access the backups or screw with them in any way?


Depends on which backend you use and how it's configured. (Assuming "node" here means a backup client and not a backup-hosting server.)

If you just put stuff on some standard storage (FTP, B2, etc.) without any permissions set up, then no backup client could stop an attacker from using the credentials you've deployed to the backup client to log in to your storage and delete your backup files. This is not really specific to restic.

Accessing old files from a backup client is a weakness. Though, personally, I don't store things on a system that the system is not supposed to know, so if someone compromises the system and sees data from the past year... that's nearly the same as just seeing the data that is on the system today. Nevertheless, this could be solved by using public keys, so the backup client has an encryption key but no decryption key (of course it's not as trivial as it sounds, e.g. how could it still do deduplication), but restic does not do this.


Got it. So I guess the best solution going forward would be to deploy Minio and then back up the backup directory itself at regular intervals onto different storage that isn't accessible from anywhere else.


There's an object lock feature which prevents objects in a certain bucket from being deleted before X amount of time has elapsed. This might be able to prevent backups from being tampered with.

If that doesn't work, there's a restic REST server (rest-server) which can act as an intermediary; it has an append-only mode which would prevent backups from being tampered with if your computer were compromised.
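
Roughly, as a sketch (host, port and paths are made up):

    # on the backup host: rest-server in append-only mode
    rest-server --path /srv/restic-data --append-only

    # on the client: back up via the REST backend
    restic -r rest:http://backup-host:8000/ backup /home/me

A compromised client can then still add new snapshots, but not delete or overwrite existing data.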


I'm so glad to finally see the --dry-run option in Restic.


Any thoughts on how this compares to duplicacy?


Duplicacy has no mount!


…or Duplicity?


Both restic and borg are lightyears ahead of duplicity.... backups are soo much faster (and smaller), restores are much easier.


Duplicacy is much faster than Restic for backups. That was the overwhelming consensus when I was researching backup software.

I use Duplicacy and Rclone, each for different types of data.


For someone using Kopia, is there any major advantage in switching to restic?


I am using restic and thinking about switching to Kopia... mainly because Kopia has compression and seems to have more development activity. It also has a GUI. And from what I've seen, it is faster.


This point hides a lot of goodness in something that I didn't even understand on the first read:

> - We have added checksums for various backends so data uploaded to a backend can be checked there.

All data is already stored in files whose filenames are the sha256sum of their contents, so clearly it's all already checksummed and can be verified, right?

Looking into the changelog entry[1], this is about verifying the integrity upon uploading:

> The verification works by informing the backend about the expected hash of the uploaded file. The backend then verifies the upload and thereby rules out any data corruption during upload. [...] besides integrity checking for uploads [this] also means that restic can now be used to store backups in S3 buckets which have Object Lock enabled.

Object lock is mentioned in passing somewhere down the changelog, but it's a big feature. S3 docs:

> Object Lock can help prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely.

i.e. ransomware protection. Good luck wiping backups if your file hoster refuses to overwrite or delete the files. And you know Amazon didn't mess with the files because they're authenticated.

Extortion is still a thing, but if people used this, it would more or less wipe out the ransomware attack vector. The only risk is if the attacker is in your systems long enough to outlast your retention period and creates useless backups in the meantime so you're not tipped off. Did anyone say "test your backups"?

For self-hosting, restic has a custom back-end called rest-server[2] which supports a so-called "append-only mode" (no overwriting or deleting). I worked on the docs for this[3] together with rawtaz and MichaelEischer to make this more secure, because eventually, of course, your disks fill up or you want to stop paying for outdated snapshots on S3, and an attacker could have added dummy backups to fool your automatic removal script into thinking it needs to keep only the dummy backups. Using the right retention options, this attack cannot happen.
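
The removal step itself is the usual forget/prune invocation, just run from a trusted machine with full (non-append-only) access rather than from the possibly compromised backup client. A sketch; the specific --keep-* policy is exactly the part the linked docs[3] discuss, so treat these values as placeholders:

    restic -r rest:http://backup-host:8000/ forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --prune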

Others are doing some pretty cool stuff in the backup sphere as well, e.g. bupstash[4] has public key encryption so you don't need to have the decryption keys as a backup client.

[1] https://github.com/restic/restic/releases/v0.13.0

[2] https://github.com/restic/rest-server/

[3] https://restic.readthedocs.io/en/latest/060_forget.html#secu...

[4] https://github.com/andrewchambers/bupstash/



