
> My survey of bytecode instruction sets in https://dercuano.github.io/notes/tiny-interpreters-for-micro...

Oh, cool, I'll take a look. Having started with the Apple ][ in 1980 I'm already well aware of SWEET16, and I admire its effectiveness, its integration with native code, and the small size of its interpreter.

> it's common in FORTH to spend no instructions on accessing arguments or returning a value

Yes, good point. Stack machines might not be good for code inside a function, but they are often good at function call/return, if you can arrange things so as not to need extra stack shuffling. Again, this matches very well with evaluating complex arithmetic expressions where some of the operations are function calls rather than inline code.

RISC-V has recent extensions -- so far the RP2350 is the most common chip that implements them -- that optimise function prologues and epilogues, and the marshalling of function arguments into S registers (and of S registers back into argument registers for a call). See the Zcmp extension.

https://docs.openhwgroup.org/projects/cva6-user-manual/01_cv...

Also, Zcmt provides optimised 2-byte jumps and calls to up to 256 well-known arbitrary entry points: 32 jumps and 224 calls.

> RVC and Thumb2 are pretty extreme cases of optimized register-machine encoding, and probably not something you'd want to decode in software on a 6502.

I wasn't thinking of the 6502. I don't think either JVM or WASM runs there :-) If you're JITing then decoding complexity really doesn't matter much, provided it doesn't increase the VM's code size too much.

I've a couple of times worked on some notes and the beginnings of an interpreter for MSP430 on 6502. I think it can be relatively small and relatively fast. There are very few instructions, and the encoding is very regular. And there is a good optimising GCC for it.

If you're not familiar with it, think "PDP-11 extended to 16 registers and a 4-bit opcode (plus a byte/word flag) instead of 3 bits, by trimming src addressing modes to 4 (Rs, @Rs, @Rs+, 0xNNNN(Rs), plus immediate and absolute (OK, PC-relative) by punning on PC) and trimming dst addressing modes to just 2 (Rd and 0xNNNN(Rd), plus the same punning on PC)".

Decoding is fairly 6502-friendly, with the src and dst registers in the low 4 bits of each byte of the instruction, the opcode in the high 4 bits of one byte, and the two addressing modes and B/W flag in the high bits of the other byte.

If you keep the hi and lo bytes of each register in separate arrays in Zero Page then you don't even need to shift the register numbers -- just mask them off and shove them into the X and Y registers. Similarly, the opcode can just be masked and stored into the low byte of a modified JMP or JSR, or a JMP (zp), giving 16 bytes of code per handler to get started -- or shift it once if 8 bytes is enough. Likewise, the two addressing modes and B/W flag don't need to be parsed with shifting and masking (at least if you care about speed more than interpreter size): you can just mask that byte and use a jump table to "decode" it.

--

Also, there is an intermediate stage between full interpretation and full JITing. I don't know if you've looked at the "Spike" RISC-V reference emulator. It doesn't normally decode instructions: it hashes them to look up a pre-decoded form, with the src and dst registers and any immediate/offset decoded into a struct, along with a pointer to the code that interprets the instruction. The hash table size is 8000 entries, which gets a good hit rate on most code. Of course, if there is a hash table miss then the instruction is decoded the hard way and inserted into the hash table.

This wouldn't be a good fit for an 8 bit micro (unless you knocked the hash table size down a lot) but would be fine with even 1 MB of RAM.

--

>> The effects of using CCI are twofold. First, since one instruction can manipulate a 16 or 32 bit quantity, the size of the compiled program is generally more than fifty percent smaller than the same program compiled with C65 [the Aztec C native code compiler for the 6502]. However, interpreting the pseudo-code incurs an overhead which causes the execution speed to be anywhere from five to twenty times slower.

I'll just point out again that the native code technique I showed in my original message up-thread cuts a 16 bit register-to-register add down from 13 bytes of code to 7 bytes (or fewer for sequential operations on the same register), at a cost of increasing execution time from 20 cycles to 42, i.e. not a 5x-20x slowdown but only a 2.1x slowdown.

For a 32 bit add it reduces the code size from 25 bytes to 7, and increases the execution time from 38 cycles to 66, a 1.7x slowdown.

Well, I don't know how "native" the native code compilation for C65 is. I'm assuming all operations are inline for speed.



I'm interested to hear what you used SWEET16 for.

cm.push and cm.popret seem like very appealing instructions from this point of view, yes! They look more expressive than ldm/stm. I imagine they would hurt interrupt latency? How do they affect page fault handling in practice?

If you are going to add non-RISC features to your ISA to ensmallen prologues and epilogues, even at the expense of interrupt latency, another possibility is to add semantics to the call and return instructions. Perhaps you could describe the required activation record structure in a header word before the subroutine entry point that need not follow the usual ISA encoding at all, and if the return instruction can reliably find the same header word (on the stack, say, or in a callee-saved register like lr) there is no need for it to contain a redundant copy of the same information.

The MSP430 sounds very appealing in some ways, and Mecrisp-Stellaris is a usable self-licking development environment for it. Too bad it's 16-bit. How small is the smallest FPGA implementation? OpenMSP430 says it's 1650 6-LUTs.

I had no idea about the Spike hash table approach. It's a totally new approach to emulation to me. It makes a lot of sense especially in the RISC-V context where EEs threw the instruction word layout in a blender to reduce their propagation delays—a decision I endorse despite my initial horror.


> I'm interested to hear what you used SWEET16 for.

Just generic coding that needed 16 bit calculations but wasn't speed-critical. Pointer-chasing code is particularly tiresome on 6502.

> cm.push and cm.popret seem like very appealing instructions from this point of view, yes! They look more expressive than ldm/stm. I imagine they would hurt interrupt latency?

They are designed to be easily interruptible / restartable with a variety of possible implementations (non-interruptible, complete restart, continue where you left off). The tradeoff is up to the designer of the individual CPU. A basic idea is to hold them in the decoder and emit normal instructions to the execution pipeline, while simplifying the original instruction towards a base case. A correct (interrupt-safe) sequence of normal instructions to generate is given in the specification.

Also, that doc didn't show the other instructions, which copy `a0` and `a1` to two arbitrary `sN` registers, or copy two arbitrary `sN` registers to `a0` and `a1`. These help a lot both in implementing and in calling functions with two or more arguments, if you're not using arithmetic to marshal the arguments anyway.

> How do they affect page fault handling in practice?

It is unlikely that anyone will ever make a machine with both Zcmp (and Zcmt) and page faults. Architects of high performance CPUs would have a fit, and besides which they use the same opcode space as the RVC instructions for floating point load/store, making them incompatible with standard Linux software.

> Perhaps you could describe the required activation record structure in a header word before the subroutine entry point that need not follow the usual ISA encoding at all, and if the return instruction can reliably find the same header word (on the stack, say, or in a callee-saved register like lr) there is no need for it to contain a redundant copy of the same information.

VAX, begone!

As Patterson showed, JSR subroutines outperform CALLS/G subroutines.


I see, thanks!

Yes, VAX CALLS/CALLG is pretty much what I'm talking about, but with arguments in registers.

I think the performance landscape has changed somewhat since Patterson; if your instruction decoder is shattering each instruction into a dependency graph of separately executable micro-ops anyway to support OoO, you don't have to slow down your whole pipeline to support elaborate instructions like cm.push or CALLG. As I see it, though, a couple of things haven't changed:

- Having arguments in registers is faster than having arguments in memory.

- Page fault handling is unfriendly to architectural instructions having multiple effects, because the page fault can't be handled at the micro-op level; it has to be handled at the architectural level. But maybe this is a smaller problem with OoO and especially speculative execution, I don't know.

If you can safely get by with a complete restart of the instruction, though, the page fault problem goes away.

I think that solves the problems Patterson identified?



