It's not a question of whether the Sufficiently Smart Compiler ever arrives. The problem is that VLIW architectures are a moving target: you can only really optimize for one specific chip. The next iteration of the same architecture brings a completely different mix of performance considerations, rendering the previous optimization strategies ineffective.
This is the Achilles' heel of any VLIW architecture. A Sufficiently Smart Compiler becomes outdated with every new chip revision. Binaries that were compiled for, and ran fast on, a previous revision of the architecture start running slowly on newer chips.
Now I wonder if such a compiler could exist even in theory. I think VLIW vs "on-the-fly reordering to extract ILP" is similar to AOT vs JIT, in that there may be runtime-exclusive information that's crucial for going the last mile (and x86 does its on-the-fly reordering on µops rather than on the original instructions anyway). But PGO does exist and could work similarly in both cases to bridge the gap, no?
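To make the PGO point concrete, here's a minimal sketch (mine, not from the thread) of a branch whose bias is only knowable at runtime, which is exactly the information profile feedback hands back to an AOT compiler. The GCC flags in the comment are real options; the file and input names are made up.

    /* pgo_demo.c -- hypothetical example.  Typical GCC PGO workflow:
     *   gcc -O2 -fprofile-generate pgo_demo.c -o demo   # instrumented build
     *   ./demo < training_input                         # writes *.gcda profile data
     *   gcc -O2 -fprofile-use pgo_demo.c -o demo        # recompile using the profile
     */
    #include <stddef.h>

    /* Without a profile, the compiler can't know whether this branch is
     * mostly taken or mostly skipped, so it can't lay out the hot path
     * (or, on a VLIW target, build the right static schedule) for it. */
    long sum_positive(const long *v, size_t n)
    {
        long total = 0;
        for (size_t i = 0; i < n; i++) {
            if (v[i] > 0)   /* bias depends entirely on the input data */
                total += v[i];
        }
        return total;
    }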
Note that I vaguely remember having read somewhere that EPIC isn't "true" VLIW (whatever that means), unlike what you can find in some camera SoCs (e.g. Fujitsu FR-V).
> Note that I vaguely remember having read somewhere that EPIC isn't "true" VLIW
Well, IIRC, the number of execution units presented architecturally doesn't actually reflect what is available internally. This was done to allow them to increase the number of units under the hood without breaking backward compatibility (or is it forward compatibility in this case?). At which point you still need scheduling hardware and all that jazz.
That said, to my limited understanding, all processors are internally VLIW; it's just hidden behind a decoder & scheduler that exposes a more limited ISA, so they don't have to make the trade-off Itanium did.
Then again, I really wonder whether the issue was that the compiler was too complicated to bootstrap one good enough to get the ecosystem going, or whether it was a truly braindead evolutionary fork. Has anyone seen any good hand-optimised benchmarks that show the potential of the paradigm?
The big issue with VLIW (or VLIW-adjacent) architectures for practical modern use cases is preemptive multitasking or multi-tenancy. As soon as any assumptions about cache residency break, the whole thing crumbles to dust. That's why VLIW is good for DSP, where branches are more predictable but, more importantly, you know exactly which inputs will already be in cache.
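For contrast, here's a sketch of the kind of kernel VLIW handles well: a generic FIR filter (my own toy example, not tied to any particular DSP), with fixed trip counts, no data-dependent branches, and a working set the programmer already expects to be on-chip.

    /* fir.c -- toy FIR filter: the sort of loop a VLIW compiler can
     * software-pipeline at build time, because the loop structure is
     * fully known and the data (coeff[], in[]) is assumed to sit in
     * on-chip memory, so every load latency is predictable.
     * Assumes in[] holds n_samples + n_taps - 1 elements. */
    void fir(const float *in, const float *coeff, float *out,
             int n_samples, int n_taps)
    {
        for (int i = 0; i < n_samples; i++) {
            float acc = 0.0f;
            /* No branch directions depend on the data, so the compiler
             * can pack these multiply-adds into wide instruction words
             * without any runtime reordering. */
            for (int j = 0; j < n_taps; j++)
                acc += coeff[j] * in[i + j];
            out[i] = acc;
        }
    }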
I'm sure it could exist in theory, but VLIW for these large chips has been outcompeted by on-the-fly reordering and SMT, which are capable of extracting almost as much work from the processor as an ideal VLIW instruction stream, for a lot less effort.
Yeah, I think this is the biggest issue. Even if you know precisely the configuration of the target computer, you can't know whether there's going to be a cache miss. A conventional CPU can reorder instructions at runtime to keep the pipeline full, but a VLIW chip can't do this.
In reality, of course, you don't even know the precise configuration of the computer, and you don't know the exact usage pattern of the software. Even if you do profile-guided optimization, someone could use the software with different data that produces branch patterns different from those in the profile, and then it runs slowly. A branch predictor will notice this at runtime and compensate automatically.
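As a toy illustration (again mine, not from the thread) of why a static schedule falls apart: in a pointer-chasing loop, each load's latency depends on whether the node happens to be in cache, which no compile-time schedule can know; an out-of-order core simply keeps executing independent work past the miss.

    /* walk.c -- toy example: every iteration starts with a load whose
     * latency (L1 hit vs. DRAM miss) is unknowable at compile time, and
     * the next address isn't available until that load completes, so a
     * VLIW compiler has nothing to statically schedule into the gap. */
    struct node {
        long value;
        struct node *next;
    };

    long walk(const struct node *head)
    {
        long total = 0;
        for (const struct node *p = head; p != NULL; p = p->next)
            total += p->value;   /* stalls on a miss; an out-of-order core
                                    would overlap other independent work */
        return total;
    }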
It’s a funny one because it’s not like it ever really took off.
Architectures like m68k are also probably pretty dead, but there’s a ton of these chips out there in embedded or retro kit and you can probably find one to test on if you need it.
There are also newer m68k cores designed by retro enthusiasts to run in FPGAs, either as emulators or as CPU accelerators in vintage computers.
An example: the Apollo 68080: http://apollo-computer.com/apollo68080.php
The Amiga crowd is probably the main reason m68k is still supported at all. It's a neat architecture but basically all of the hardware is out of production and has specifications which are untenable for a modern Linux system (<50 MHz, <256 MB RAM).