> While hard links are certainly a lesser evil than setuid, and there is little motivation to rid ourselves of them, they do serve to illustrate how a seemingly clever and useful design can have a range of side effects which can weigh heavily against the value that the design tries to bring.
This seems to me to be a bit of throwing out the baby with the bathwater... the problem isn’t links but rather setuid programs changing file permissions in user-writable directories!
I don't see how the security issues described in this article are really tied to hardlinks. If root is doing chmod/chown in a directory that is writable by untrusted users, the same untrusted users can also just remove or rename files. Is there any example that demonstrates an exploit specifically relying on hardlinks?
Apparently a user can create a hardlink to a sensitive root-owned file (like /etc/shadow) in a user-writable directory where they know a privileged process (in this case tmpfiles.d) will come along and chown it to the user, after which that user will own the sensitive file too.
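To make that concrete, here's a rough sketch of the two halves in Python. The paths, the user name and the tmpfiles.d line are all made up for illustration, and it assumes fs.protected_hardlinks is off (as in the CVE):

```python
import os

# --- unprivileged attacker ---
# /srv/example is attacker-writable, and a (made-up) tmpfiles.d entry like
#     Z /srv/example - attacker attacker -
# tells a privileged process to recursively chown it later on.
os.link("/etc/shadow", "/srv/example/shadow")   # allowed: only the permissions
                                                # of /srv/example are checked

# --- later, the privileged process applies the entry (conceptually) ---
# The recursive chown follows the new directory entry to the same inode as
# /etc/shadow, so afterwards the attacker owns /etc/shadow too:
#     os.chown("/srv/example/shadow", attacker_uid, attacker_gid)
```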
Thanks a lot for the clarification! I hadn't quite puzzled this together in my head.
It's really counterintuitive that creating a hardlink is allowed based solely on the permissions of the directory it is created in. I was expecting another permission check based on the directory the file is already sitting in, since that's what gates unlink() and rename().
It also means that if you have write access somewhere on the same filesystem, you can prevent any file you can "see" from being deleted[¹], even if you can't read it. That feels like a possible privacy issue…
[¹] If the user trying to delete the file is aware of the problem, they can truncate it to 0 bytes, but that's not what a plain rm does (because rm is also the tool for removing hardlinks…)
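A tiny self-contained demo of the link-count behaviour and the truncation workaround (made-up filenames, same-user files, so it only shows the mechanics, not the permission aspect):

```python
import os, tempfile

d = tempfile.mkdtemp()
secret = os.path.join(d, "secret.txt")
with open(secret, "w") as f:
    f.write("sensitive contents\n")

keeper = os.path.join(d, "keeper.txt")
os.link(secret, keeper)                 # second directory entry, same inode
print(os.stat(secret).st_nlink)         # 2

os.unlink(secret)                       # what a plain rm does
with open(keeper) as f:
    print(f.read(), end="")             # the data is still fully readable

# The workaround from the footnote: truncate instead of (or before) removing.
with open(keeper, "w"):                 # opening with "w" truncates to 0 bytes
    pass
print(os.stat(keeper).st_size)          # 0

os.unlink(keeper)
os.rmdir(d)
```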
Linux does (since version 3.6) have the ability to prevent users from creating hardlinks to files they don't own. (See man 5 proc under "/proc/sys/fs/protected_hardlinks".) I think FreeBSD has a similar sysctl option.
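If you want to check it on a running system, the knob is just a proc file. A quick Linux-only sketch in Python; the destination path is purely illustrative and would have to live on the same filesystem as the target:

```python
import errno, os

# Current value of the knob (see proc(5)):
with open("/proc/sys/fs/protected_hardlinks") as f:
    print("fs.protected_hardlinks =", f.read().strip())

# With the protection on, an unprivileged attempt to hardlink a file you
# neither own nor have read/write access to is refused with EPERM.
# (If the paths are on different filesystems you get EXDEV instead.)
try:
    os.link("/etc/shadow", "/var/tmp/shadow-link")
except OSError as e:
    print("link refused:", errno.errorcode.get(e.errno, e.errno))
else:
    print("link created (protection is off!)")
    os.unlink("/var/tmp/shadow-link")
```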
The linked article does mention it but warns "If you're not using systemd, the vanilla Linux kernel does not enable these protections by default".
> Couldn’t they introduce the same security feature mentioned for symlinks?
"The tmpfiles.d specification for the Z type more or less implies some kind of recursive chown. The spec heads off one type of vulnerability by saying that symlinks should not be followed; however, hard links are still a problem"
> As in, make it so by default you can’t create a hard link to a file you don’t already have write access to?
From the CVE: "when the fs.protected_hardlinks sysctl is turned off"
A description of that: "When set to “1” hardlinks cannot be created by users if they do not already own the source file, or do not have read/write access to it."
.. which apparently now won't work under systemd either!
IMO, he was wrong on this; it should have been enabled by default, and then the people who need that exceptionally rare legacy stuff could disable it with the same techniques (/proc, initrd) that he currently suggests for enabling it.
You can link to a file that is outside of that directory and thereby get write access to it once its permissions are changed. (The canonical example is linking /etc/passwd into a directory where you expect root to run chown -R you .)
The usual defense is to keep user-writable spaces on separate mount points, where in theory users may be able to link to each other's files, but not to anything important. And then be mindful about whatever dumb script you run that mucks with permissions.
"others' files, but not anything important" reminded me of https://xkcd.com/1200/ - user files are pretty much the only important thing in many scenarios.
I'd be curious to know what use case people have today for hardlinks, ever since symlinks became a thing.
I've been using Linux for more than 20 years and the only case I've found is for rsync incremental backups (--link-dest option), which is great for doing backups to an external USB hard drive and saving space. But that's rather niche.
A standard use case for hardlinks is replacing a file atomically while creating a backup. The steps (sketched in code after the list) are:
1. create, write, close/sync "file.new"
2. hardlink "file" to "file.old" and sync again
3. rename "file.new" to "file"
4. sync to finish
With these steps, regardless of where you are interrupted, you always have a "functioning" copy of the file. This is quite important for tools writing configuration files or state to disk, but also just your plain old office suite preventing total loss of your PhD thesis.
(Yes it's not perfect, sync()ing correctly is hard [you need to sync the files and the directory!], and hard filesystem damage/kernel panics can always break things.)
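A minimal sketch of those four steps in Python, assuming POSIX/Linux semantics; error handling and pre-existing "file.old" corner cases are only lightly handled:

```python
import os

def replace_with_backup(path: str, data: bytes) -> None:
    new, old = path + ".new", path + ".old"

    # 1. create, write, close/sync "file.new"
    fd = os.open(new, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)

    # 2. hardlink "file" to "file.old" so the previous version stays reachable
    if os.path.exists(old):
        os.unlink(old)
    if os.path.exists(path):
        os.link(path, old)

    # 3. rename "file.new" into place; rename(2) replaces "file" atomically
    os.rename(new, path)

    # 4. sync the *directory* so the new entries themselves are durable
    dfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_DIRECTORY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Between steps 2 and 3, "file" and "file.old" are literally the same inode, so an interruption anywhere in the sequence leaves at least one complete version reachable.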
not so niche. That's a really terrific use for them. Hardlinks are hampered compared to symlinks because they can't be used for directories (which seems just a really silly limitation). It would be awesome to be able to roll up and mirror entire directory trees.
With newer systems this can often be handled at the filesystem level. Both XFS and Btrfs support copy-on-write, so you can get the same effect just by doing "cp --reflink".
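For the curious, a hedged sketch of what `cp --reflink` boils down to on Linux: the FICLONE ioctl asks a CoW-capable filesystem to share extents between two files. The helper name is made up; the ioctl and its constant are the standard Linux ones, but treat this as illustrative:

```python
import fcntl

FICLONE = 0x40049409  # _IOW(0x94, 9, int) from <linux/fs.h>

def reflink_copy(src_path: str, dst_path: str) -> None:
    # Fails (EOPNOTSUPP/EINVAL/EXDEV) if the filesystem can't reflink
    # or the two paths are on different filesystems.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
```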
Sure, that will do a CoW copy, but it's not actually the same effect at all, is it? In other words, writing to the copy will not propagate the writes to the original. (or am I wrong on that)
Besides, won't this walk and copy the entire directory subtree? (I.e., CoW is applied to the files themselves, rather than just sharing the directory entries.)
I'm saying I'd like to just have a hard link to the directory itself; if the directory is effectively a list of inode pointers, I just want to add another pointer to that list. Then, if I create or modify a file in some deeply nested directory under either copy, it will instantly be available in both (or more) locations, without the metadata having to be changed on disk in more than one place. This would be interesting for certain use cases.
I mentioned CoW in the context of the OP's incremental backups, where reflinks are useful. In that case you do want the "copy will not propagate the writes to the original" behaviour.
Sometimes multiple writable entries for a single object are desirable.
As an advanced feature there's plenty of room for abuse and footgun problems, but it's still occasionally, maybe even rarely, the correct tool for the job.
What's more important are multiple readable entries for an object! And for those objects not to disappear unless the ref count on them is zero. The quintessential example is a library... By linking it into a directory the program has access to, you expose the library to the program. By using a hard link, you protect that library from being erased by another user.
Does multi-user POSIX really get much use still? And should it? The model is how old now, and we're still finding vulnerabilities that are more or less by design. Computers are so cheap that almost everyone has one in their pocket, and most people in the first world own 2-3. Multi-user operating systems just don't seem relevant anymore.
Yes. One use case is application sandboxing; the users aren't separate people, but separate programs. E.g., on Android, each app is a "user", and Linux filesystem permissions are used to control what the apps can do. But there are also a few instances where you still see the different-users-ssh-into-the-same-server model; for example, that server might be controlling a cluster and have users ssh'ing in to do otherwise-slow computations.
Using users to sandbox processes is a hack that has never worked well and is only successful when combined with lots of other sandbox technologies such as jails, SELinux or AppArmor.
> One particularly scary example is the implementation of hard links on HFS+. To keep track of hard links, HFS+ creates a separate file for each hard link inside a hidden directory at the root level of the volume. Hidden directories are kind of creepy to begin with, but the real scare comes when you remember that Time Machine is implemented using hard links to avoid unnecessary data duplication.
HFS+ is still only a somewhat extended HFS, and the filesystem itself does not support hardlinks to anything. Hardlink support is simulated in upper layers, which, while questionable design on its own, somewhat alleviates the usual problems with directory hardlinks.
There is a difference between “allows hardlinks to directories to exist” and “allows the user to straightforwardly create them”. As for existence, most Unix filesystems have exactly zero issue with that, while on systems that allow you to link(2) to a directory you have to be root to do it.
Hardlinks to directories not only break a bunch of userspace expectations, but more importantly can create structures that cannot be removed (unlink(2) does not work on directories, and rmdir(2) only works for directories with st_nlink==2). So even on systems that allow creating them, it is restricted to root.
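You can't demonstrate an actual directory hardlink on Linux, but the two syscall restrictions that make such a structure un-removable are easy to see (Python used as a thin wrapper over the syscalls, Linux behaviour assumed):

```python
import os, tempfile

d = tempfile.mkdtemp()
sub = os.path.join(d, "sub")
os.mkdir(sub)
open(os.path.join(sub, "f"), "w").close()

try:
    os.unlink(sub)                 # unlink(2) refuses directories outright
except OSError as e:
    print("unlink:", e)            # EISDIR on Linux (EPERM on some systems)

try:
    os.rmdir(sub)                  # rmdir(2) insists on an empty directory
except OSError as e:
    print("rmdir:", e)             # ENOTEMPTY

# Only once the directory is genuinely empty does rmdir(2) succeed:
os.unlink(os.path.join(sub, "f"))
os.rmdir(sub)
os.rmdir(d)
```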
> macOS allows it as of 10.5, but it is not exposed to the user.
Depends on what you mean by "not exposed to the user": while ln(1) can't create directory hardlinks, it works fine via link(2) (on HFS+), with the limitation that hardlinked directories can't be siblings.
And of course directory hardlinks are fucking terrifying because `unlink(1)` will not work, and `rm(1)` will recursively remove directory contents, so you need to go through `unlink(2)` in C.
Not so much hardlinks, but symlinks are a blight on the POSIX filesystem design. They have caused endless pain and suffering and so many, many CVEs. They need to be eliminated.