Qt and C++ Trivial Relocation (Part 1)

aw1621107 · on May 8, 2024

Bit of a nitpick, but the short string optimization example seems a bit weird to me? I thought SSO normally consisted of a union of the buffer and the regular data/size/capacity pointers with some kind of in-band signaling, rather than the three pointers and the SSO buffer with the data pointer always pointing to the string data. The latter seems to somewhat defeat the purpose of SSO - the point is that for short strings you don't want to have to chase a pointer, so having the data pointer point to the inline buffer seems odd.

In fact, the std::is_trivially_relocatable paper explicitly lists std::string from libc++ and MSVC as being trivially relocatable [0], which seems to further weaken the example.

That's not to distract from the main point - that self-referential types aren't trivially relocatable - but I feel the example could be better.

[0]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p11...

quuxplusone · on May 8, 2024

Well, libstdc++'s std::string does SSO as depicted in the post. So you're right — 2 out of 3 library vendors manage to do SSO without losing trivial relocatability — but consider that 1 in 3 vendors does lose trivial relocatability thereby.

Now consider a type that is like std::string but doesn't do SSO — say, std::vector or std::unique_ptr or even std::shared_ptr. Those types are trivially relocatable on 3 out of 3 vendors, no problem. So I have no problem with the post's example.
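To make the self-reference concrete, here is a minimal sketch of a string whose data pointer aims at its own inline buffer. `SelfPointingString` and `bytewise_copy_aliases_source` are hypothetical names for illustration, only loosely modeled on the layout discussed above, and are not libstdc++'s actual code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical toy type: for short strings, ptr aims at the object's own buf.
struct SelfPointingString {
    char*       ptr;
    std::size_t size;
    char        buf[16];

    // Sketch only: assumes s fits in buf (at most 15 characters plus '\0').
    explicit SelfPointingString(const char* s) : ptr(buf), size(std::strlen(s)) {
        std::memcpy(buf, s, size + 1);
    }
};

// Copies the bytes of `src` into fresh storage (as a memcpy-based relocation
// would) and reports whether the copied pointer still aims into `src`.
inline bool bytewise_copy_aliases_source(const SelfPointingString& src) {
    alignas(SelfPointingString) unsigned char raw[sizeof(SelfPointingString)];
    std::memcpy(raw, &src, sizeof src);

    char* copied_ptr;
    std::memcpy(&copied_ptr, raw + offsetof(SelfPointingString, ptr), sizeof copied_ptr);
    return copied_ptr == src.buf;  // true: the "new" object points at the old one
}
```

After a bytewise move, the destination's pointer refers to the source's buffer, which dangles once the source goes away. That is why a type laid out this way needs a real move constructor to fix up the pointer.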

Looking forward to part 2!

aw1621107 · on May 8, 2024

Ah, so it does (broadly speaking, at least), and I stand corrected. It was negligent of me not to check.

Does libstdc++'s implementation offer advantages that libc++/MSVC's lack? The implementation still looks weird to me, but I've got to be missing something if people far smarter than me decided on that representation.

quuxplusone · on May 8, 2024

As far as I know, libstdc++'s representation has two advantages:

First, it simplifies the implementation of `s.data()`, because you hold a pointer that invariably points to the first character of the data. The pointer-less version needs to do a branch there. Compare libstdc++ [1] to libc++ [2].

[1]: https://github.com/gcc-mirror/gcc/blob/065dddc/libstdc++-v3/...

[2]: https://github.com/llvm/llvm-project/blob/1a96179/libcxx/inc...

Basically libstdc++ is paying an extra 8 bytes of storage, and losing trivial relocatability, in exchange for one fewer branch every time you access the string's characters. I imagine that the performance impact of that extra branch is tiny, and massively confounded in practice by unrelated factors that are clearly on libc++'s side (e.g. libc++'s SSO buffer is 7 bytes bigger, despite libc++'s string object itself being smaller). But it's there.
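A rough sketch of that trade-off, using two hypothetical toy types (`AlwaysPointer` and `TaggedUnion`). Assumptions: allocation and ownership are omitted, long strings simply borrow the caller's storage, and the tag is a separate `bool` rather than the in-band flag a real implementation would use, so the sizes here do not match either library.

```cpp
#include <cstddef>
#include <cstring>

// Pointer-always-valid layout: data() is a plain load, no branch.
struct AlwaysPointer {
    const char* ptr;   // aims at buf (short) or at external storage (long)
    char        buf[16];

    explicit AlwaysPointer(const char* s) {
        if (std::strlen(s) < sizeof buf) {
            std::strcpy(buf, s);
            ptr = buf;          // self-reference: this is what breaks memcpy moves
        } else {
            ptr = s;            // sketch: borrow instead of allocating
        }
    }
    const char* data() const { return ptr; }
};

// Union layout: no self-reference, but data() has to test the tag first.
struct TaggedUnion {
    bool is_long;
    union {
        const char* ptr;
        char        buf[16];
    };

    explicit TaggedUnion(const char* s)
        : is_long(std::strlen(s) >= sizeof buf), buf{} {
        if (is_long) ptr = s;   // assignment makes ptr the active member
        else std::strcpy(buf, s);
    }
    const char* data() const { return is_long ? ptr : buf; }
};
```

`AlwaysPointer::data()` never branches, while `TaggedUnion::data()` picks a union member on every call. In exchange, `TaggedUnion` holds no pointer into itself and stays safe to move bytewise.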

The second advantage is that libstdc++ already did it that way, and to change it would be an ABI break; so now they're stuck with it. I mean, obviously that's not an "advantage" in the intuitive sense; but it's functionally equivalent to an advantage, in that it's a very strong technical answer to the question "Why doesn't libstdc++ just switch to doing it libc++'s way?"

saagarjha · on May 8, 2024

Typically, yes. However, the implementation you’ve described is still sometimes an optimization, because all the data fits together in one place.

yazzku · on May 8, 2024

I believe std::vector traditionally was forced to copy or move-construct instead of realloc() because there was no concept of "trivially movable" or "trivially copyable". Has that changed in recent standards?

funkychicken · on May 8, 2024

It is slowly making its way through the standards committee. https://github.com/cplusplus/papers/issues/43

The author has a fork of clang and gcc with some pretty impressive speedups, so I’m hopeful! https://lists.isocpp.org/sg14/2024/04/1127.php

nly · on May 8, 2024

The C++ Allocator concept has no reallocate() function, which would also have to change in order to actually see a speedup:

https://en.cppreference.com/w/cpp/named_req/Allocator

But regardless, realloc() can't be used to implement the optimisation efficiently because of this line from the C man page:

> The realloc() function returns a pointer to the newly allocated memory, which is suitably aligned for any kind of variable *and may be different from ptr*

Basically, if the implementation can't extend the memory allocation in place, it can fall back to malloc(), memcpy() and free(). By the time C++ gets control back, the bytes have already been copied and the old block freed, so it's too late to call the appropriate C++ move constructors and destructors.

MaxBarraclough · on May 8, 2024

The std::is_trivially_copyable type-trait can be used to determine if a type is trivially copyable. It's been in the standard since C++11. I imagine an implementation of std::vector could make use of it to determine if it's safe to use realloc.

https://en.cppreference.com/w/cpp/types/is_trivially_copyabl...
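A minimal sketch of that idea, assuming C++17 and a buffer that was obtained from malloc(). `grow_trivial` is a hypothetical helper; no standard library is claimed to implement its vector this way.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

// Hypothetical helper: resize a malloc'd array of trivially copyable objects.
// realloc may move the block and copy the bytes itself, which is acceptable
// only because T is trivially copyable.
template <class T>
T* grow_trivial(T* data, std::size_t new_cap) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "realloc copies raw bytes; T must tolerate that");
    void* p = std::realloc(data, new_cap * sizeof(T));
    if (!p) throw std::bad_alloc();  // on failure the old block is still valid
    return static_cast<T*>(p);
}
```

Note that the trait is narrower than what the thread is discussing: std::unique_ptr, for example, is not trivially copyable, yet it can be relocated bytewise. That gap is what a separate "trivially relocatable" notion is meant to cover.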

pixelesque · on May 8, 2024

Were all of these optimisations always in Qt 4 from its introduction in 2005?

Given some of them seem to need std::move(), I assume they were added later once C++11 was common?

cjensen · on May 8, 2024

The optimization mentioned in this article is to literally use memcpy for certain types. As you mention, the code is from 2005, so the options are memcpy (when applicable) or a full copy constructor. The article discusses some old-school techniques that Qt uses to determine when memcpy may be used.
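As a Qt-free illustration of the general approach, here is a sketch of an opt-in trait that selects memcpy when a type's author vouches for it. All names (`is_memcpy_relocatable`, `relocate_n`, `Handle`) are hypothetical and this is not Qt's actual mechanism. It assumes C++17, and bytewise relocation of a non-trivially-copyable type is not formally sanctioned by the current standard even though it works in practice on mainstream compilers.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical trait: by default, trust only what the compiler can prove.
template <class T>
struct is_memcpy_relocatable : std::is_trivially_copyable<T> {};

// Hypothetical helper: relocate n objects from src into raw storage at dst.
// Afterwards the objects live at dst and src holds no live objects.
template <class T>
void relocate_n(T* src, std::size_t n, T* dst) {
    if constexpr (is_memcpy_relocatable<T>::value) {
        // Fast path: copy the bytes and run no constructors or destructors.
        std::memcpy(static_cast<void*>(dst), static_cast<const void*>(src),
                    n * sizeof(T));
    } else {
        // Slow path: move-construct each element, then destroy the source.
        for (std::size_t i = 0; i < n; ++i) {
            ::new (static_cast<void*>(dst + i)) T(std::move(src[i]));
            src[i].~T();
        }
    }
}

// Example type: owns heap memory but holds no pointer into itself,
// so its author opts in by specializing the trait.
struct Handle {
    int* p;
    explicit Handle(int v) : p(new int(v)) {}
    Handle(Handle&& o) noexcept : p(o.p) { o.p = nullptr; }
    Handle(const Handle&) = delete;
    ~Handle() { delete p; }
};
template <>
struct is_memcpy_relocatable<Handle> : std::true_type {};
```

The opt-in is a promise by the type's author. A self-referential type, such as a string whose pointer aims at its own inline buffer, must not specialize the trait.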