What if we went with what others suggested is the practical limit, 48GB, and then just put 2-3 cards in the system, maybe with a small bridge over a separate bus so they could communicate with each other?
I believe that would need some software work from Intel, which is an area where they're still lagging a bit because of their late start. I'm also not sure how the frameworks themselves split up the inference work to avoid crossing between GPUs, since the bandwidth there is horrible.
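For illustration only, here is a hypothetical sketch of one common approach (assumed here, not necessarily what any given framework does): give each card a contiguous block of layers in proportion to its VRAM, so that only the hidden-state activations cross a GPU boundary. The names `split_layers` and `bytes_crossed_per_token`, the 80-layer model and the hidden size of 8192 are all made-up example values.

```python
from __future__ import annotations

from itertools import accumulate


def split_layers(num_layers: int, vram_gb: list[float]) -> list[range]:
    """Hypothetical helper: assign contiguous layer ranges to GPUs in
    proportion to each card's VRAM (a "layer split")."""
    total = sum(vram_gb)
    # Cumulative boundaries keep the ranges contiguous and make the last
    # one end exactly at num_layers.
    bounds = [0] + [round(num_layers * c / total) for c in accumulate(vram_gb)]
    return [range(a, b) for a, b in zip(bounds, bounds[1:])]


def bytes_crossed_per_token(hidden_size: int, num_gpus: int,
                            bytes_per_elem: int = 2) -> int:
    """Activation bytes sent across GPU boundaries for one token in a layer
    split (assumes fp16 activations, one hidden vector per boundary)."""
    return hidden_size * bytes_per_elem * (num_gpus - 1)


if __name__ == "__main__":
    # Assumed example: 80-layer model on three 48GB cards, hidden size 8192.
    for gpu, layers in enumerate(split_layers(80, [48, 48, 48])):
        print(f"GPU {gpu}: layers {layers.start}-{layers.stop - 1}")
    print(f"~{bytes_crossed_per_token(8192, 3)} bytes cross GPUs per token")
```

Under these assumptions only a few tens of KB per generated token cross the bus, so in a layer split the per-hop latency may matter more than raw bandwidth. Splitting individual tensors across cards is a different story and moves far more data.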
If we're being reasonable and assume you're not using a modern HEDT CPU that costs a couple thousand dollars, the best a consumer motherboard can do right now would be two PCIe Gen 5 x8 slots at ~32GB/s each plus one chipset-attached PCIe Gen 4 x8 slot at ~16GB/s. I'm not sure a motherboard like that actually exists, but Intel's chipset should allow it; AMD only runs an x4 link to the chipset, so the third slot would be limited by that.
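A quick sketch of where those numbers come from, assuming 128b/130b line encoding (Gen 3 and later) and ignoring packet/protocol overhead, so real-world throughput will be somewhat lower. The function name is just for illustration, and the AMD uplink is assumed to be Gen 4 x4.

```python
# Approximate one-direction PCIe bandwidth from generation and lane count.
# Assumes 128b/130b encoding; ignores TLP headers, flow control, etc.

GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}  # gigatransfers/s per lane


def pcie_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Return approximate per-direction bandwidth in GB/s."""
    raw_gbits = GT_PER_LANE[gen] * lanes    # 1 transfer = 1 bit per lane
    payload_gbits = raw_gbits * 128 / 130   # 128b/130b encoding overhead
    return payload_gbits / 8                # bits -> bytes


if __name__ == "__main__":
    slots = {
        "GPU slot 1 (CPU, Gen 5 x8)": (5, 8),
        "GPU slot 2 (CPU, Gen 5 x8)": (5, 8),
        "GPU slot 3 (chipset, Gen 4 x8)": (4, 8),
        "AMD chipset uplink (assumed Gen 4 x4)": (4, 4),
    }
    for name, (gen, lanes) in slots.items():
        print(f"{name}: ~{pcie_bandwidth_gbytes(gen, lanes):.1f} GB/s")
```

That works out to roughly 31.5, 15.8 and 7.9 GB/s, which lines up with the ~32 and ~16 GB/s figures above and shows why an x4 chipset link halves what the third card could get.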