Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Stamping Out Overflow Checks in Ruby (chrisseaton.com)
57 points by shaicoleman on Sept 26, 2021 | hide | past | favorite | 23 comments


It's an interesting article but I am not a fan of some of the self contradiction. On the one hand the article states that jo operations are no big deal, the CPU can pipeline it and execute it cheaply, but on the other hand, removing them results in code that runs twice as fast.

Another detail I think is important is that injecting the stamp is not the same as writing a proof and informing the compiler that some property is true. You're doing what other languages would call an unsafe operation and asserting that something is true. Providing a proof would be interesting and some languages do have the ability to express such proofs (a static type system is one way, among others).

That said, it's still interesting to see details about how to get a dynamic language like Ruby to extract some static properties from the source code. I did some work on this at Microsoft when they were working on IronPython, it's definitely challenging stuff. At the time the consensus was that the pay-off wasn't worth it, it ends up being a lot of work to extract very small guarantees and that one is almost certainly better off off-loading performance related code to C# (or C in the case of CPython). But that was 15 years ago, it's nice to see progress being made.


Yes the cost of the jo is not what people seem to conventionally think and teach. I don’t have an explanation yet. Maybe the scheduling is much better without it as it’s trying to run the overflow checks as fixed control flow otherwise?

I was also expecting to be writing about how removing overflow checks enables multiple arithmetic operations to be selected into single LEA instructions using addressing modes, as that’s the other common wisdom, but I couldn’t get the instruction selector to do that. Don’t know why yet either.

And the injection isn’t the proof - the mask was the proof. The injection is to see if we can do it without the actual mask instruction if we cared about that. You’re right it’s unsafe - I have an example of the failure mode of abusing it in the article. Turns out we didn’t need it in this case anyway as it’s the same performance.


> On the one hand the article states that jo operations are no big deal

It states that as the conventional wisdom, very explicitely, and before going any further already states that it will hinder further optimisations (because the arithmetics operations can’t be merged / fused).


Re the unexpected jo effect, it's covered but left a mystery:

> Didn’t we say that jo didn’t add much of an impact? Well that’s what I was always taught - maybe I should reconsider that and measure it a bit more.


I've really been enjoying the posts from Chris unpacking the TruffleRuby optimizations. I've been hearing about Graal and Truffle for years, but didn't understand exactly what made them so fast.

Paired with my current reading of Crafting Interpreters, these posts make me really excited for all the cool things you can do when implementing a language.


> Stamps are a kind of micro-types that exist just within the compiler. Like all types they represent information about values, but unlike types you’ve usually worked with, stamps can give us a variety of properties, such as whether an object can be null or not, or whether a value is positive or negative, or even which individual bits are set in an integer.

...so, they're just types - they define sets of values. Not quite sure why one would have to call it "a kind of micro-types". Especially when it's not even clear what a "micro-type" should be.


Well, at the very least, Ruby already has a type system so it's convenient to call these something different.

And while you're correct that the stamps appear to be an alternative type system, for most people these types will be "unlike types you’ve usually worked with", because most people aren't heavily into Haskell or ML or other languages where it's natural to construct complex type systems like this.


> And while you're correct that the stamps appear to be an alternative type system, for most people these types will be "unlike types you’ve usually worked with", because most people aren't heavily into Haskell or ML or other languages where it's natural to construct complex type systems like this.

That ultimately causes more confusion though. These really are nothing more or less than types and you can do this kind of thing in a plain old type system and it's actually a lot easier and more effective. Maybe if enough people hear about types we can eventually drag programming culture kicking and screaming into the 1980s.


Coming from traditional statically typed languages (Java, C++, Typescript, even Python or Ruby...), you might think of a type like "int" or "Float". He's calling stamps "micro-types" because they're more fine-grained than that. A stamp might tag a value as being between 0 and 255, for example, which is much more specific than "int".

It would be really cool to work with a language that allows the user such fine-grained types, which is one reason I keep eyeing Idris.


> A stamp might tag a value as being between 0 and 255, for example, which is much more specific than "int".

Yeah, that's a normal type. For example in Common Lisp, it's the type defined by the type specifier (INTEGER 0 255).


A simple range is one application you could use stamps for, but they can also represent that individual bits may or may not be set, or could even represent that individual bits are set with a given probability, which could inform scheduling decisions.


Those things can in principle be expressed with a predicate using the SATISFIES type specifier, but the problem with that is that you currently can't pick such types apart for optimization purposes in any existing implementation, only test membership for values. Presumably that's one of the reasons why Qi/Shen was created.


TypeScript can... sort of do that. They could definitely take it further if they wanted to

For example a type could be `0 | 1 | 2 | 3 | 4 | 5`, though there's no way right now to specify that as `0..5`


Ada can do that


They’re non-nominal - you can’t name them in the language being compiled. They work at a finer level than the nominal system the programmer has access to.

(I’m not a type theorist though.)


Hmm. I usually see "nominal" in this context as "matched on basis on type name", as opposed to "matched on basis of isomorphism" ("structural").


Right and you can’t match something by name if you can’t name it.


That's not really true -- in structural typing, you can name things but the names themselves are just a convenience; an alias for something too long to type without getting lost in it.


Well I don't know then. If you've got a better name for a type that you cannot express in the domain language but can be tracked by the compiler that'll be understandable in a public blog post then I'll use it?


Well...that's still just a type. I never found any reason to name it specially myself, even when I was thinking of similar things that are being described in the article (though not about Ruby, but rather about Lisp-y languages). Maybe I'm a bit influenced by the fact that there isn't such thing as "a type that you cannot express in the domain language" if you have SATISFIES (http://www.lispworks.com/documentation/HyperSpec/Body/t_sati...), but still - a compiler can certainly synthesize "ad-hoc", unnamed types and perform reasoning with them.


> Well...that's still just a type.

By this argument I might as well call everything a 'thing' since everything is still just a thing.

Different names for subclasses of things that already have names is useful.

I get this discussion with people who don't like the word 'transpiler' because they reason it's just another compiler. Yeah nobody's denying that - it's a subclass of compiler with a specific name.


To be fair, "lispy" thinking promoted the notion of closure property in me - just like a list of lists is still a list, a type composed of other types, be it by conjunction or by disjunction, is still a type, and its components are types as well. Uniform treatment seems more logical specifically for programming it into a machine since it eliminates special cases. So I'm tempted to disagree with the assertion that I may need special cases for this.


The hoops we jump through, on both ends, to get dynamic languages performing well enough

Great article though, really interesting look into how the sausage gets made that most of us don't get to see very often




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: