
"`dpkg` calls `fsync()` on all the unpacked files"

Why in the world does it do that????

Ok I googled (kagi). Same reason anyone ever does: pure voodoo.

If you can't trust the kernel to close() then you can't trust it to fsync() or anything else either.

Kernel-level crashes, the only kind of crash that risks half-written files, are no more likely during dpkg than any other time. A bad update is the same bad update regardless, no better, no worse.




Let's start from the assumption that dpkg shouldn't commit to its database that package X is installed/updated until all the on-disk files reflect that. Then if the operation fails and you try again later (on next boot or whatever) it knows to check that package's state and try rolling forward.

> If you can't trust the kernel to close() then you can't trust it to fsync() or anything else either.

https://man7.org/linux/man-pages/man2/close.2.html

       A successful close does not guarantee that the data has been
       successfully saved to disk, as the kernel uses the buffer cache to
       defer writes.  Typically, filesystems do not flush buffers when a
       file is closed.  If you need to be sure that the data is
       physically stored on the underlying disk, use fsync(2).  (It will
       depend on the disk hardware at this point.)
So if you want to wait until it's been saved to disk, you have to do an fsync first. If you even just want to know if it succeeded or failed, you have to do an fsync first.
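
As a simplified C sketch of the point (not dpkg's actual code): write() and close() can both succeed while the data is still only in the page cache; the fsync() return value is what tells you it reached the disk.

    #include <fcntl.h>
    #include <unistd.h>

    /* Simplified illustration: write a buffer and only report success once
       fsync() says the data reached the disk.  close() alone would not
       tell us that. */
    int write_durably(const char *path, const void *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len) {   /* lands in the page cache */
            close(fd);
            return -1;
        }
        if (fsync(fd) < 0) {                         /* the only durability signal */
            close(fd);
            return -1;
        }
        return close(fd);
    }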

Of course none of this matters much on an ephemeral Github Actions VM. There's no "on next boot or whatever". So this is one environment where it makes sense to bypass all this careful durability work that I'd normally be totally behind. It seems reasonable enough to say it's reached the page cache, it should continue being visible in the current boot, and tomorrow will never come.


Writing to disk is none of your business unless you are the kernel itself.


Huh? Are you thinking dpkg is sending NVMe commands itself or something? No, that's not what that manpage means. dpkg is asking the kernel to write stuff to the page cache, and then asking for a guarantee that the data will continue to exist on next boot. The second half is what fsync does. Without fsync returning success, this is not guaranteed at all. And if you don't understand the point of the guarantee after reading my previous comment...and this is your way of saying so...further conversation will be pointless...


"durability" isn't voodoo. Consider if dpkg updates libc.so and then you yank the power cord before the page cache is flushed to disk, or you're on a laptop and the battery dies.


> before the page cache is flushed to disk

And if you yank the cord before the package is fully unpacked? Wouldn't that just be the same problem? Solving that problem involves simply unpacking to a temporary location first, verifying all the files were extracted correctly, and then renaming them into existence. Which actually solves both problems.
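
A rough C sketch of that scheme for a single file (function name, paths, and verification are placeholders, not anything dpkg actually does):

    #include <stdio.h>

    /* Hypothetical illustration: extract to a staging path, verify it,
       then rename() it over the final path.  rename() is atomic within
       one filesystem, so readers see either the old file or the new one. */
    int install_one_file(const char *staged, const char *final_path)
    {
        /* ... extraction and checksum verification of `staged` go here ... */
        if (rename(staged, final_path) < 0) {
            perror("rename");
            return -1;
        }
        return 0;
    }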

Package management is stuck in a 1990s idea of "efficiency" which is entirely unwarranted. I have more than enough hard drive space to install the distribution several times over. Stop trying to be clever.


> Wouldn't that just be the same problem?

Not the same problem: it's a half-written file vs. half of the files still being the older version.

> Which actually solves both problems.

It does not, and you would have to guarantee that multiple rename operations are executed in a transaction. Which you can't. Unless you have a really fancy filesystem.

> Stop trying to be clever.

It's called being correct and reliable.


> have to guarantee that multiple rename operations are executed in a transaction. Which you can't. Unless you have really fancy filesystem

Not strictly. You have to guarantee that after reboot you roll back any partial package operations. This is what a filesystem journal does anyway. So it would be one fsync() per package and not one per file in the package. The failure mode implies a reboot must occur.
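
One possible reading of "one fsync() per package" (an assumption on my part, not something dpkg actually does) would be a single syncfs() over the target filesystem after everything is unpacked, rather than an fsync() per file:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    /* Hypothetical "one flush per package": after unpacking every file,
       flush the whole target filesystem once, then record the package
       as installed.  syncfs() is Linux-specific. */
    int flush_target_fs(const char *mount_point)    /* e.g. "/" */
    {
        int fd = open(mount_point, O_RDONLY);
        if (fd < 0)
            return -1;
        int rc = syncfs(fd);
        close(fd);
        return rc;
    }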

> It's called being correct and reliable.

There are multiple ways to achieve this. There are different requirements among different systems which is the whole point of this post. And your version of "correct and reliable" depends on /when/ I pull the plug. So you're paying a huge price to shift the problem from one side of the line to the other in what is not clearly a useful or pragmatic way.


> You have to guarantee that after reboot you rollback any partial package operations.

In both scenarios, yes. This is what the dpkg database is for; it keeps info about the state of each package: whether it is installed, unpacked, configured and so on. It is required to handle the interrupted-update scenario, no matter whether the interruption happened during package unpacking or in the configuration stage.

So far you are just describing --force-unsafe-io from dpkg. It is called unsafe because you can end up with zeroed or 0-length files long after the package has been marked as installed.

> This is what a filesystem journal does anyways.

This is incorrect. And the filesystem journal is irrelevant here anyway.

A filesystem journal protects you from interrupted writes at the disk layer. You set a flag, write to a temporary space called the journal, set another flag, then copy that data to your primary space, then remove both flags. If something happens during that process you'll know, and you'll be able to recover, because you know at which step you were interrupted.

Without a filesystem journal, every power outage could result in not being able to mount the filesystem. The journal prevents that scenario. This has nothing to do with package managers, the page cache, or fsync.

Under Linux you do the whole write() + fsync() + rename() dance, for every file, because this is the only way you don't end up in the scenario where you've written the new file, renamed it, marked the package as installed and fsynced the package manager database, but the actual new file contents never left the page cache and now you have a bunch of zeroes on the disk. You have to fsync(). That is the semantics of the layer you are working with. No fsync(), no guarantee that the data is on the disk. Even if you wrote it and closed the file hours ago. And fsynced the package manager database.
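
For reference, a minimal sketch of that dance for one file (simplified; real package managers differ in detail). The directory fsync at the end is there so the rename itself survives a crash:

    #include <fcntl.h>
    #include <libgen.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Simplified: write the new contents under a temporary name, make them
       durable with fsync(), atomically swap with rename(), then fsync the
       containing directory so the rename is durable too. */
    int replace_file_durably(const char *final_path, const char *tmp_path,
                             const void *data, size_t len)
    {
        int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
            close(fd);
            return -1;                        /* contents not known to be on disk */
        }
        close(fd);

        if (rename(tmp_path, final_path) < 0) /* readers see old or new, never half */
            return -1;

        char dirbuf[4096];
        strncpy(dirbuf, final_path, sizeof dirbuf - 1);
        dirbuf[sizeof dirbuf - 1] = '\0';
        int dfd = open(dirname(dirbuf), O_RDONLY);
        if (dfd < 0)
            return -1;
        int rc = fsync(dfd);                  /* make the rename itself durable */
        close(dfd);
        return rc;
    }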

> There are different requirements among different systems which is the whole point of this post.

Sorry, I was under the assumption that this thread was about dpkg and fsync and your idea of "solving the problem". I just wanted to point out that, no, package managers are not "trying to be clever" and are not "stuck in the 1990s". You can't throw fsync() out of the equation, reorder a bunch of steps, and call that "a solution".


Like I said.


Pretty sure the kernel doesn't have to fsync on close. In fact, you don't want it to; otherwise you're limiting the performance of your page cache. So fsync on install for dpkg makes perfect sense.


I didn't say it synced. The file is simply "written" and available at that point.

It makes no sense to trust that fsync() does what it promises but not that close() does what it promises. close() promises that when close() returns, the data is stored and some other process may open() and find all of it verbatim. And that's all you care about or have any business caring about unless you are the kernel yourself.


I would like to introduce you to a few case studies on bugzilla:

https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Why_is_dpkg_so_slo...


Get involved, tell and show them you know better. They have a bug tracker, a mailing list, etc.


> Kernel-level crashes, the only kind of crash that risks half-written files [...]

You can get half-written files in many other circumstances, e.g. power outages, storage failures, hardware-caused crashes, dirty shutdowns, and filesystem corruption/bugs.

(Nitpick: trusting the kernel to close() doesn't have anything to do with this, as a sibling comment says.)


A power outage or other hardware fault or kernel crash can happen at any unpredicted time, equally just before or just after any particular action, including an fsync().

Trusting close() does not mean that the data is written all the way to disk. You don't care if or when it's all the way written to disk during dpkg ops any more than at any of the other infinite seconds of run time that aren't a dpkg op.

close() just means that any other thing that expects to use that data may do so. And you can absolutely bank on that. And if you think you can't bank on that, that means you don't trust the kernel to be keeping track of file handles, and if you don't trust the kernel to close(), then why do you trust it to fsync()?

A power rug pull does not worsen this. That can happen at any time, and there is nothing special about dpkg.


This is actually an interesting topic, and it turns out the kernel never made any promises about close() and data being on the disk :)

And about kernel-level crashes: yes, but you see, dpkg creates a new file on the disk, makes sure it is written correctly with fsync(), and then calls rename() (or something like that) to atomically replace the old file with the new one.

So there is never a possibility of a given file being corrupt during an update.


What's your experience developing a package manager for one of the world's most popular Linux distributions?

Maybe they know something you don't?????


> Kernel-level crashes, the only kind of crash that risks half-written files, are no more likely during dpkg than any other time. A bad update is the same bad update regardless, no better, no worse.

Imagine this scenario; you're writing a CI pipeline:

1. You write some script to `apt-get install` blah blah

2. As soon as the script is done, your CI job finishes.

3. Your job is finished, so the VM is powered off.

4. The hypervisor hits the power button but, oops, the VM still had dirty disk cache/pending writes.

The hypervisor may immediately pull the power (chaos monkey style; developers don't have patience), in which case those pending writes are now lost or corrupted. Or it may use an ACPI shutdown, which should then also have an ultimate timeout before pulling power (otherwise stalled IO might prevent resources from ever being cleaned up).

If you rely on the sync happening at step 4, as the kernel tries to exit gracefully, how long does the kernel wait before it decides that a shutdown timeout has occurred? How long does the hypervisor wait, and is that longer than the kernel would wait? Are you even sure that the VM shutdown command you're sending is the graceful one?

How would you fsync at step 3?

For step 2, perhaps you might have an exit script that calls `fsync`.

For step 1, perhaps you might call `fsync` after `apt-get install` is done.
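
One possible shape for such an exit step (an assumption of mine, not something the tooling provides): a final flush of dirty data before the job reports success. In a shell script this is just the sync command; as a minimal C sketch:

    #include <unistd.h>

    /* Hypothetical end-of-job step: flush all dirty data before the CI
       runner reports success and the hypervisor pulls the power.
       Per sync(2), Linux waits for the writeback to complete before
       returning, unlike the minimal POSIX requirement. */
    int main(void)
    {
        sync();
        return 0;
    }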


This is like people who think they have designed their own, even better encryption algorithm. Voodoo. You are not solving a problem any better than the kernel and filesystem (and hypervisor, in this case) already have. If you are not writing the kernel or a driver or the bootloader itself, then fsync() is not your problem or responsibility, and you aren't filling any holes left by anyone else. You're just rubbing the belly of the Buddha statue for good luck, to feel better.


You didn't answer any of the questions posed by the outlined scenario.

Until you answer how you've solved the "I want to make sure my data is written to disk before the hypervisor powers off the virtual machine at the end of the successful run" problem, then I claim that this is absolutely not voodoo.

I suggest you actually read the documentation of all of these things before you start claiming that `fsync()` is exclusively the purpose of kernel, driver, or bootloader developers.



