In my experience, btrfs just doesn't seem to be very resilient to hardware faults. Everything works great as long as you stay on the golden path, but when you fall off that path, it gets into a confused state and things start going very wrong and there is no way to recover (short of wiping the whole filesystem, because fsck doesn't fix the faults).
So yes, if you are Facebook, and put it on a rock-solid block layer, then it will probably work fine.
But outside of the world of hyperscalers, we don't have rock-solid block layers. [1] Consumer drives occasionally do weird things and silently corrupt data. And above the drives, almost nobody runs ECC memory, so the occasional bit flip corrupts data or metadata before it's even written to the disk.
At this point, I don't even trust btrfs on a single device. But the more disks you add to a btrfs array, the more likely you are to encounter a drive that's a little flaky.
And btrfs's "best feature" (flexible arrays of mismatched drive sizes) really doesn't help it here, because it encourages users to throw a pile of smaller, cheap/old spinning drives at it, which just increases the chance of btrfs encountering a flaky drive. The people willing to spend more money on a matched set of big drives are more likely to choose zfs anyway.
The other paradox is that btrfs ends up in a weird spot: it's good enough to actually detect silent data corruption (unlike ext4/xfs and friends, where you never find out your data was corrupted), but its metadata is complex and large enough that it seems extra vulnerable to exactly those issues.
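To make the detection half concrete: btrfs keeps a checksum for every data block in its metadata and re-verifies it on read, which is what turns "silent" corruption into a loud error. Here is a toy sketch of that idea, not btrfs's actual code, with plain crc32 standing in for btrfs's default crc32c:

    import zlib

    # Toy model of a checksumming filesystem: each data block gets a
    # checksum stored separately in metadata.
    blocks, checksums = {}, {}

    def write_block(idx, data):
        blocks[idx] = data
        checksums[idx] = zlib.crc32(data)

    def read_block(idx):
        data = blocks[idx]
        # Recomputing the checksum on every read is what turns "silent"
        # corruption into a detectable, reportable error.
        if zlib.crc32(data) != checksums[idx]:
            raise IOError(f"block {idx}: checksum mismatch (silent corruption)")
        return data

    write_block(0, b"important data")
    blocks[0] = bytes([blocks[0][0] ^ 0x01]) + blocks[0][1:]  # drive flips a bit, reports no error
    read_block(0)  # raises; a non-checksumming filesystem would hand back the bad bytes

In real btrfs, this is the same class of error that shows up in `btrfs scrub` output or `btrfs device stats` counters.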
---------------
[1] No, mdadm doesn't count as a rock-solid block layer; it still depends on the drives to report a data error. If there is silent corruption, mdadm just forwards it. I did look into a Synology-style btrfs-on-mdadm setup, but I found more than a few stories from people whose Synology filesystem borked itself.
In fact, you might actually be worse off with btrfs+mdadm, because now data integrity is handled at a completely different layer from data redundancy, and the two layers don't talk to each other.
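A rough sketch of why that layering matters (toy code, not real btrfs or md internals): with native btrfs RAID1, the layer that notices a bad checksum is the same layer that knows where the second copy lives, so it can fall back to the good mirror. Stacked on mdadm, md picks a mirror, the filesystem above can only report a checksum mismatch, and there is no interface to ask md for the other copy.

    import zlib

    def checksum_ok(data, expected):
        return zlib.crc32(data) == expected

    def read_native_raid1(copies, expected):
        # Checksums and redundancy in one layer: try each copy until one verifies.
        for data in copies:
            if checksum_ok(data, expected):
                return data  # real btrfs would also rewrite the bad copy here
        raise IOError("all copies corrupt")

    def read_fs_on_mdadm(copies, expected):
        # md hands back whichever mirror it chose; the filesystem can detect
        # the mismatch but has no way to request the other mirror.
        data = copies[0]
        if not checksum_ok(data, expected):
            raise IOError("corruption detected, but no second copy to fall back on")
        return data

    good = b"important data"
    bad = bytes([good[0] ^ 0x01]) + good[1:]    # one mirror silently bit-flipped
    crc = zlib.crc32(good)

    print(read_native_raid1([bad, good], crc))  # recovers from the clean mirror
    read_fs_on_mdadm([bad, good], crc)          # fails, even though a good copy exists on disk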