Where the Dd1-3 blocks are just wasted. Meaning by adding a new disk to the array you're only expanding free storage by 25%... So say you have 8TB disks for a total of 24TB of storage free originally, and you have 4TB free before expansion, you would have 5TB free after expansion.
Please tell me I've misunderstood this, because to me it is a pretty useless implementation if I haven't.
ZFS RAID-Z does not have parity disks. The parity and data is interleaved to allow data reads to be done from all disks rather than just the data disks.
Anyway, you are not entirely wrong. The old data will have the old parity:data ratio while new data will have the new parity:data ratio. As old data is freed from the vdev, new writes will use the new parity:data ratio. You can speed this up by doing send/receive, or by deleting all snapshots and then rewriting the files in place. This has the caveat that reflinks will not survive the operation, such that if you used reflinks to deduplicate storage, you will find the deduplication effect is gone afterward.
To be fair, RAID5/6 don't have parity disks either. RAID2, RAID3, and RAID4 do, but they're all effectively dead technology for good reason.
I think it's easy for a lot of people to conceptualize RAID5/6 and RAID-Zn as having "data disks" and "parity disks" to wrap around the complicated topic of how it works, but all of them truly interleave and compute parity data across all disks, allowing any single disk to die.
I've been of two minds on the persistent myth of "parity disks" but I usually ignore it, because it's a convenient lie to understand your data is safe, at least. It's also a little bit the same way that raidz1 and raidz2 are sometimes talked about as "RAID5" and "RAID6"; the effective benefits are the same, but the implementation is totally different.
Unless I misunderstood you, you're describing more how classical RAID would work. The RAID-Z expansion works like you note you would logically expect. You added a drive with four blocks of free space, and you end up with four blocks more of free space afterwards.
You can see this in the presentation[1] slides[2].
The reason this is sub-optimal post-expansion is because, in your example, the old maximal stripe width is lower than the post-expansion maximal stripe width.
Your example is a bit unfortunate in terms of allocated blocks vs layout, but if we tweak it slightly, then
Your diagrams have some flaws too. ZFS has a variable stripe size. Let’s say you have a 10 disk raid-z2 vdev that is ashift=12 for 4K columns. If you have a 4K file, 1 data block and 2 parity blocks will be written. Even if you expand the raid-z vdev, there is no savings to be had from the new data:parity ratio. Now, let’s assume that you have a 72K file. Here, you have 18 data blocks and 6 parity blocks. You would benefit from rewriting this to use the new data:parity ratio. In this case, you would only need 4 parity blocks. ZFS does not rewrite it as part of the expansion, however.
There are already good diagrams in your links, so I will refrain from drawing my own with ASCII. Also, ZFS will vary which columns get parity, which is why the slides you linked have the parity at pseudo-random locations. It was not a quirk of the slide’s author. The data is really laid out that way.
What are the errors? I tried to show exactly what you talk about.
edit: ok, I didn't consider the exact locations of the parity, I was only concerned with space usage.
The 8 data blocks need three stripes on a 3+2 RAID-Z2 setup both pre and post expansion, the last being a partial stripe, but when written in the 4+2 setup only needs 2 full stripes, leading to more total free space.
Please tell me I've misunderstood this, because to me it is a pretty useless implementation if I haven't.