
> Configuration of SRAM based FPGAs is rather slow because it requires a scan chain to program each logical element and shift config bits into it, and doing it faster requires even more circuitry. You need to multiplex things onto the fabric in practice, you can't "context switch" AKA temporally multiplex very well, you have to spatially multiplex. But FPGAs are already area intensive
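The quoted point about scan chains can be sketched with a toy model (purely illustrative, not any vendor's configuration logic): config bits enter serially, one per clock, so load time scales linearly with bitstream size.

```python
# Hypothetical sketch of why SRAM FPGA configuration is slow: config
# bits are shifted serially through a scan chain, one bit per clock,
# so load time grows linearly with the number of bits.

def scan_chain_load(bitstream, chain_length):
    """Shift `bitstream` into a scan chain of `chain_length` latches.

    Returns (final chain contents, clock cycles consumed).
    """
    chain = [0] * chain_length
    cycles = 0
    for bit in bitstream:
        # Each latch takes its neighbour's value; the new bit enters
        # at the head. One full shift costs one clock cycle.
        chain = [bit] + chain[:-1]
        cycles += 1
    return chain, cycles

# A 4-latch chain costs exactly as many cycles as there are bits.
chain, cycles = scan_chain_load([1, 0, 1, 1], chain_length=4)
print(chain, cycles)  # [1, 1, 0, 1] 4
```

Real devices widen and parallelise this, but as the quote notes, that costs extra circuitry and area.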

On FPGAs designed for this, it is possible to "gradually reconfigure" the fabric on context switch at high speed, while it continuously processes data, in a manner similar to how CPUs gradually change what's in their cache after a context switch, and modern GPUs handle multiple applications by scheduling work units across the compute elements.
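The idea of gradual reconfiguration can be sketched roughly like this (a toy model with made-up region names, not any shipping partial-reconfiguration API): the fabric is split into regions, and on a context switch only one region is rewritten per step while the rest keep processing.

```python
# Hypothetical sketch of gradual reconfiguration: rewrite one region at
# a time so the remaining regions stay live, analogous to a CPU cache
# gradually refilling after a context switch. Names are illustrative.

def gradual_switch(regions, new_config):
    """Reconfigure regions one at a time, logging which regions stayed
    active (still processing data) during each step."""
    log = []
    for i in range(len(regions)):
        active = [r for j, r in enumerate(regions) if j != i]
        log.append((i, active))
        regions[i] = new_config[i]
    return regions, log

regions, log = gradual_switch(["old_a", "old_b", "old_c"],
                              ["new_a", "new_b", "new_c"])
print(regions)   # ['new_a', 'new_b', 'new_c']
print(len(log))  # 3 steps; 2 of 3 regions stayed live at every step
```

The trade-off against Tabula-style rapid multiplexing (discussed further down the thread) is that this needs no extra per-element context storage, only region-level reconfiguration ports.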

I expect those sorts of FPGA designs would become available on the market if vendors decided to develop the ecosystem of FPGAs as general purpose compute accelerators, shared among applications, similar to the role played by GPUs, TPUs and NNPUs now.

(Long shot: If anyone out there seriously wanted to hire someone to build open source or open programming, high performance FPGAs with these switching characteristics, and tooling to match, I would love to do both :)



Modern GPUs hide latency by scheduling tons of work and paying it back in throughput, but this is very design-sensitive, and doing it in an FPGA requires a ton of pipelining and design work. That effort is often better spent just paying some schlubs like us to write software. Again, the cost of programming the fabric is quite real. You pay for area.
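The latency-hiding arithmetic can be made concrete with a back-of-envelope model (numbers purely illustrative): if a new work item can be issued every cycle and each takes a fixed latency to complete, deep pipelining amortises that latency across the whole batch.

```python
# Hypothetical model of GPU-style latency hiding: perfect pipelining
# means total time is (issue time + one item's latency), not
# (items * latency). Parameters are illustrative only.
import math

def pipelined_cycles(n_items, latency, issue_width=1):
    """Cycles to finish n_items when issue_width items start per cycle
    and each takes `latency` cycles to complete, fully pipelined."""
    issue_cycles = math.ceil(n_items / issue_width)
    return issue_cycles - 1 + latency

serial = 1000 * 100                         # one at a time: 100,000
pipelined = pipelined_cycles(1000, latency=100)
print(serial, pipelined)                    # 100000 vs 1099
```

Getting an FPGA design to this regime is exactly the "ton of pipelining and design work" the comment describes: the throughput win only materialises if the datapath is kept full.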

And people do actually create marketable FPGA designs you can load into modern accelerators. You can buy Bittware devices today, or Xilinx Alveo, and load tons of designs into them. You can go get Amazon F1 instances and put tons of accelerators on them. You don't hear about them and they aren't popular like GPUs because the fact is that most people don't need this, and the ones who do have very particular designs that probably aren't worth over-optimizing the entire system architecture for. That's why they're 95% PCIe cards with attached output peripherals that most of the time end up in Ethernet.


I'm familiar with Bittware, F1 and Alveo accelerators. I've used F1 and might use Alveo this year. The cost of programming them is indeed high, but it's largely because of the design software whose paradigm remains firmly stuck in the 90s. Even "high level synthesis" is far from high level at the moment.

Those devices are completely different to use compared to the sort of general purpose, fast-compilation, fast-switching accelerators like modern GPUs.

FPGAs and FPGA-like architectures and concomitant design software can be designed for fast compilation, adaptive timing and pipelining, and overlapped application multiplexing. But it takes significant design changes. It's a novel and underexplored area. With such architectures, schlubs like us can write software that runs on them with excellent performance for some tasks.

Unfortunately the market and the legal situation haven't optimised for that. The closed FPGA programming information, for decades, meant others couldn't produce radically different commercial tools for existing FPGAs, which would generally require skipping the proprietary P&R to use novel fast-compilation and incremental reprogramming techniques. Those who explored it were always worried about legal issues, as well as damaging customer devices.

And for a long time the patents were a chilling effect on new entrants wanting to develop alternate FPGA architectures better suited to this type of programming, as long as they contained elements of traditional FPGAs as well. The patent situation is starting to shift now that early Xilinx and Altera devices are old enough, but it's a multi-decade process, unfortunately.


That's exactly what Tabula did. They made time-multiplexed FPGA fabric. They also went bankrupt; you could buy their scrap on eBay for a few months.


I liked their idea, and I think it had a lot of potential for clever optimisation. Shame about the bankruptcy. But it's different to what I'm talking about. Tabula's extremely fast multiplexing needed a lot of chip area.



