Looks complicated to reason about, but really interesting.
Freeforth implements a similar mechanism to eliminate swaps via register renaming, obv. the focus is different in that Freeforth tries to still be an optimised stack, rather than a VLIW processor.
I guess you could look at this paper as a vector processor, similar to the way APLs handle things with function composition. In that, you'd need the compiler to take care of things as managing it manually would be a nightmare.
I guess the best way forward would be a specialized (forth) word lib which invokes the compiler to manage the top N stack cells for the duration of the word.