
> “short string optimization”: A short enough string can be stored “in place” [...] An optimization that’s impossible in Rust, by the way ;).

The author is not aware of https://docs.rs/compact_str/latest/compact_str/ or https://github.com/bodil/smartstring
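Nothing in the language prevents this. Here is a minimal sketch of an inline-or-heap string written as a plain enum. It is not how compact_str or smartstring actually lay out their bytes. The `SmallString` name and the 23-byte inline threshold are assumptions made for illustration.

```rust
// Hypothetical threshold: strings up to 23 bytes are stored inline.
const INLINE_CAP: usize = 23;

/// Minimal inline-or-heap string (a sketch, not compact_str's layout).
enum SmallString {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(String),
}

impl SmallString {
    fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            // Short enough: copy the bytes into the fixed inline buffer.
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallString::Inline { len: s.len() as u8, buf }
        } else {
            // Too long: fall back to an ordinary heap-allocated String.
            SmallString::Heap(s.to_owned())
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallString::Inline { .. })
    }

    /// Every read pays for this branch: "am I inline or on the heap?"
    fn as_str(&self) -> &str {
        match self {
            SmallString::Inline { len, buf } => {
                // The bytes were copied from a &str, so they are valid UTF-8.
                std::str::from_utf8(&buf[..*len as usize]).expect("valid UTF-8")
            }
            SmallString::Heap(s) => s.as_str(),
        }
    }
}

fn main() {
    let short = SmallString::new("hello");
    let long = SmallString::new("this one is longer than twenty-three bytes");
    println!("{:?} inline={}", short.as_str(), short.is_inline());
    println!("{:?} inline={}", long.as_str(), long.is_inline());
}
```

A naive enum like this needs its own tag, so it ends up larger than String itself. As far as we know, the real crates avoid that overhead by storing the discriminant in bit patterns that the heap representation never uses.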



This is false. The Polars API has used smartstring for a long time.

https://github.com/pola-rs/polars/blob/32a2325b55f9bce81d019...


That documentation talks about all the benefits and says it "can mostly be used as a drop in replacement for String", but what are the trade-offs? When can it not be used?


It looks like there's a bunch of "garbage collection" type activity where you may have some bytes which were once part of a string but aren't now used, and you're always paying for the overhead of this optimisation even if it's useless for your problem.

Suppose you work only with 500-4000 byte strings; maybe they're short reviews, each ending with a rating in star emoji: ***** is the best, * is the worst. [[HN ate my star emoji of course]]

So your reviews never fit in the "optimised" inline slot, and the stored prefix is just the opening words of a review, which in some review styles is the start of a seemingly unrelated anecdote: "My grandfather used to tell me...". It'll get to the review eventually, and you'll see why the two are connected, but the useful part is the suffix, and that's not part of the header of a "German string" data structure.

Or maybe you have a high turnover of somewhat related medium size strings, so then that garbage collection step costs quite a lot of overhead.
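The prefix problem can be made concrete with a rough sketch of the comparison fast path. It assumes a simplified layout made up of a u32 length, a 4-byte prefix, and a boxed copy of the full text. As we understand the cedardb post, the real German string instead packs everything into a 16-byte header and stores strings of up to 12 bytes inline. `GermanStr` and `eq_with_stats` are hypothetical names.

```rust
/// Hypothetical, simplified German-string header: the length and a 4-byte
/// prefix sit next to the data, so many comparisons never touch the heap.
struct GermanStr {
    len: u32,        // assumes strings shorter than 4 GiB
    prefix: [u8; 4], // first (up to) 4 bytes, zero-padded
    data: Box<str>,  // the real layout keeps short strings inline instead
}

impl GermanStr {
    fn new(s: &str) -> Self {
        let mut prefix = [0u8; 4];
        let n = s.len().min(4);
        prefix[..n].copy_from_slice(&s.as_bytes()[..n]);
        GermanStr { len: s.len() as u32, prefix, data: s.into() }
    }

    /// Returns (are_equal, had_to_read_full_text).
    fn eq_with_stats(&self, other: &GermanStr) -> (bool, bool) {
        if self.len != other.len || self.prefix != other.prefix {
            return (false, false); // decided from the header alone
        }
        (self.data == other.data, true) // header tie: compare the full text
    }
}

fn main() {
    let a = GermanStr::new("My grandfather said: 5/5");
    let b = GermanStr::new("My grandfather said: 1/5");
    let c = GermanStr::new("Great");
    let d = GermanStr::new("Awful");
    println!("{:?}", a.eq_with_stats(&b)); // same length and prefix: heap read
    println!("{:?}", c.eq_with_stats(&d)); // prefix differs: header only
}
```

Two reviews of equal length that both open with "My g" tie on the header. Every such comparison therefore falls through to the full text, which is the case described above.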


Any code built closely around String's power-of-2 reallocation pattern may have to be reworked. I don't think there's any case when it cannot be used as a String replacement at all, except maybe when interfacing with an API that expects a &mut String as an output parameter.
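For the output-parameter case, std's BufRead::read_line is one such API: it appends into a caller-supplied &mut String. The sketch below assumes you want to keep the lines in some other owned string type. Box<str> stands in for that type here, and `collect_lines` is a hypothetical helper. The result is that you read into a reused String and then convert each line.

```rust
use std::io::{BufRead, BufReader};

/// Reads lines through an API that insists on `&mut String`, then converts
/// each line into another owned string type (`Box<str>` as a stand-in).
fn collect_lines<R: BufRead>(mut reader: R) -> std::io::Result<Vec<Box<str>>> {
    let mut buf = String::new(); // reused output buffer required by read_line
    let mut out: Vec<Box<str>> = Vec::new();
    loop {
        buf.clear();
        let n = reader.read_line(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        // Conversion step: the replacement type can't be passed to read_line directly.
        out.push(buf.trim_end_matches('\n').into());
    }
    Ok(out)
}

fn main() -> std::io::Result<()> {
    let input = "first line\nsecond line\n";
    let lines = collect_lines(BufReader::new(input.as_bytes()))?;
    println!("{:?}", lines);
    Ok(())
}
```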


My uninformed guess is that at least it will cost you the branching because you need to check if it is inlined or not, and you pay that for every string. Branch prediction is likely very good for this case though.


Was this section removed? I'm not seeing it in the linked post.


It seems to be a quote from https://cedardb.com/blog/german_strings/ which is about this German Strings type (implemented in Polars)

But yeah, it's pretty ignorant to assume Rust can't do this since the best available examples (as with many things) are in Rust. CompactString is really nice. On a typical modern (64-bit) computer CompactString takes 24 bytes and holds up to 24 bytes of UTF-8 text inline, while also having a niche.
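The "niche" idea can be checked on std's own types. compact_str can't be pulled into a self-contained snippet, so this only shows the std side. String is three machine words, and because its buffer pointer is never null, Option<String> can encode None in an otherwise invalid bit pattern instead of adding a tag.

```rust
use std::mem::size_of;

fn main() {
    // String is three machine words (pointer, capacity, length; field order unspecified).
    let string = size_of::<String>();
    // Invalid bit patterns in String (e.g. a null pointer) can represent None,
    // so Option<String> needs no extra tag (true in practice, not a documented guarantee).
    let opt = size_of::<Option<String>>();
    // Box<str> is a fat pointer: data pointer plus length, two words.
    let boxed = size_of::<Box<str>>();
    println!("String: {string}, Option<String>: {opt}, Box<str>: {boxed}");
}
```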

I guess the confusion arises because C++ people tend to assume that anywhere Rust differs from the practice in the C++ community it's a mistake, even though that's often because C++ made the wrong choice? Rust's &str is "just" &[u8] plus a rule about the meaning of these bytes, and Rust's String is correspondingly "just" Vec<u8> plus a rule about the meaning of those bytes. C++ couldn't have done the former because it only belatedly got the fat pointer slice reference (as the troubled std::span) years after having a string data type.
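Here is a small sketch of the "bytes plus a rule" view, using only std. The conversions between byte containers and string types validate UTF-8 but leave the bytes where they are.

```rust
fn main() {
    let bytes: Vec<u8> = vec![b'h', 0xC3, 0xA9, b'!']; // "hé!" encoded as UTF-8

    // &[u8] -> &str: the same bytes, checked against the UTF-8 rule.
    let s: &str = std::str::from_utf8(&bytes).expect("valid UTF-8");
    assert_eq!(s, "hé!");

    // Vec<u8> -> String: also just a validation; the buffer is reused.
    let owned: String = String::from_utf8(bytes).expect("valid UTF-8");
    assert_eq!(owned.len(), 4); // len() counts bytes, not characters

    // Bytes that break the rule are rejected rather than reinterpreted.
    assert!(String::from_utf8(vec![0xFF, 0xFE]).is_err());

    // And the underlying bytes are always one call away.
    assert_eq!(owned.as_bytes(), &[b'h', 0xC3, 0xA9, b'!']);
    println!("{owned}");
}
```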

Rust didn't do this in the stdlib not because it's impossible, but because it's a trade-off and they wanted the provided stdlib type to be straightforward. If you need or even just want the trade-off, you can just cargo add compact_str.


I remember some C++ colleagues raving on about the standard library having anything and everything they might ever need. Makes sense in a world without sane package managers and package registries, but that mindset just doesn't carry over.


A decent standard library very much makes sense even with sane package managers and registries. Just look at JS: it's awful that you need to hunt for packages (or implement them yourself) for simple stuff. A stdlib is usually straightforward to use, of good quality, trustworthy, and has good and lasting support.


The "good and lasting support" part is the most important IMHO. Nothing is more annoying than having to switch to another library because the one you are currently using goes unmaintained. This can happen in the "stdlib" too (e.g. PHP deprecating the mcrypt library), but it happens much less often, and typically with much more time to prepare. Also important is that, when there is a "standard" stdlib package, other dependencies you might use will probably use that - having two dependencies that use different packages for doing the same thing is also annoying.


I attribute Go's success to a rich OOTB standard library. You can build so much before you reach for third party packages. Heck, for web services, the built-in + their templating gets you pretty far.


That's a bit silly, the C++ standard library is pretty small ("the intersection of your needs, not the union" is what they say), and it's often not even really good at the few things that it does. Whether that's core stuff like hashmaps and iostreams, or more niche stuff like std::regex.


That development is quite recent. The stdlib used to be very barebones. It still is compared to for example the Python one.


> I guess the confusion arises because C++ people tend to assume that anywhere Rust differs from the practice in the C++ community it's a mistake, even though that's often because C++ made the wrong choice?

Funny, I hear that a lot from the Rust folks.


Why is it impossible in Rust? Do you have any source for that?


It is not. It is not implemented in std::string::String, but (as pointed out elsewhere in this thread) there are other string implementations that have it.

It was explicitly decided against for the standard library, because not every optimization is universally good, and keeping String as a thin wrapper over Vec<u8> is a good default.
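The thin-wrapper design shows up in std's conversions: String::into_bytes and String::from_utf8 hand the same heap buffer back and forth without copying it. The following sketch checks that the pointer and capacity survive the round trip.

```rust
fn main() {
    let mut s = String::with_capacity(64);
    s.push_str("hello");
    let ptr = s.as_ptr();
    let cap = s.capacity();

    // String -> Vec<u8>: hands over the same allocation.
    let v: Vec<u8> = s.into_bytes();
    assert_eq!(v.as_ptr(), ptr);
    assert_eq!(v.capacity(), cap);

    // Vec<u8> -> String: validates UTF-8 and keeps the same allocation.
    let s2 = String::from_utf8(v).expect("valid UTF-8");
    assert_eq!(s2.as_ptr(), ptr);
    assert_eq!(s2, "hello");
    println!("{s2} (capacity {cap})");
}
```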


This is one of those things that I wish people would speak more carefully about. I've seen it in every programming language community I've participated in, so it's not a language-specific thing, but... one should not say "language X does not do a thing" when they mean "language X's standard library does not do a thing". That the "language" doesn't do a thing should be reserved for the cases where the language itself really does preclude some particular thing for some reason. Otherwise the relatively inexperienced programmers end up coming away with some really weird mental models of what programming languages can and cannot do, just like here. Of course Rust qua Rust can store strings like this, it's basically built for things like this. Any mental model of Rust that thinks Rust qua Rust can't do this is a weird mental model of Rust.


Yep, especially with a systems language. As you say, they're basically built for "I need to do something specific."


When are small strings bad? Parallelism?


The "small string optimization" makes strings harder to manipulate with `unsafe` code, and being easy to manipulate with `unsafe` code is in fact a priority for most std types, so that they can easily be understood, extended, rearranged, passed over FFI and later reconstituted, etc.

You can tear apart these types and reassemble them very easily. For many C++ std types, you cannot do this.
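Here is a sketch of the tear-apart-and-reassemble pattern for String, following the approach shown in std's String::from_raw_parts documentation. ManuallyDrop stops the original value from freeing the buffer while only the raw parts are held.

```rust
use std::mem::ManuallyDrop;

fn main() {
    let s = String::from("hello");
    // Take the String apart without running its destructor.
    let mut s = ManuallyDrop::new(s);
    let ptr: *mut u8 = s.as_mut_ptr();
    let len = s.len();
    let cap = s.capacity();

    // ... the raw parts could now cross an FFI boundary and come back ...

    // SAFETY: ptr/len/cap came from a String allocated by the global
    // allocator, and the ManuallyDrop wrapper is never used or dropped again.
    let rebuilt = unsafe { String::from_raw_parts(ptr, len, cap) };
    assert_eq!(rebuilt, "hello");
    println!("{rebuilt}");
}
```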


Rust APIs use string slices (&str) extensively. With its current design, converting from a String to a &str is a no-op; if String instead did small string optimization, converting to &str wouldn't be free. Furthermore, thanks to the borrow checker, Rust code tends to avoid copying strings around, so the benefit of SSO is reduced. C++ does more string copying and didn't have a standard string_view for a long time, so considering the tradeoffs both languages made reasonable decisions.
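The following sketch shows the borrowing style described above. A function taking &str accepts a borrowed String, and slicing hands out a view into the same buffer, so no bytes are copied. Under small string optimization, producing that &str would first require checking where the bytes live. `word_count` is a hypothetical helper.

```rust
/// Hypothetical helper: accepts any borrowed string, so callers never copy.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let owned = String::from("German strings in Rust");
    // &String coerces to &str; the &str is just String's pointer and length.
    assert_eq!(word_count(&owned), 4);

    // A sub-slice points into the same heap buffer: nothing is copied.
    let tail: &str = &owned[7..];
    assert_eq!(tail, "strings in Rust");
    assert_eq!(tail.as_ptr(), owned.as_ptr().wrapping_add(7));
    println!("{tail}");
}
```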


Just wanted to say that I only got back to this thread now, but I agree with my sibling commenters.

Here's a discussion about it from a few years back, with some links to the primary discussions: https://news.ycombinator.com/item?id=18372332


It adds a branch every time you access the string (to check whether it is small or not), and it can stop various optimisations. That said, g++'s standard library went the other way: libstdc++ replaced its copy-on-write std::string with a small-string-optimised one in GCC 5.


You'll need to branch on most operations to check "am I small?". This may cause issues with the branch predictor.



