I see this as the perfect moment to get into consulting, either development or security. People were not sure what jobs AI would create: "GenAI babysitting" is one of them.
"Make one Ubuntu package 90% faster by rebuilding it and switching the memory allocator"
i wish i could slap people in the face over standard tcp/ip for clickbait. it was ONE package, and some of the gains didn't even come from recompilation.
i have to give it to him, i have preloaded jemalloc into one program to swap the malloc implementation and the results have been very pleasant. not in terms of performance (did not measure) but in stabilizing said application's memory usage. it actually fixed a problem that looked like a memory leak but probably wasn't the fault of the app itself (likely memory fragmentation with the standard malloc)
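for anyone who wants to try it, it's just an LD_PRELOAD away, no rebuild needed. the library path below is what debian/ubuntu's libjemalloc2 package installs, so adjust it for your distro:

    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./your-app

since malloc/free are resolved through the dynamic linker, the whole allocator gets swapped out from under the unmodified binary.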
I did some research into the glibc memory allocator. Turns out this is not memory fragmentation, but per-thread caches (arenas) that are never freed back to the kernel! A free() call does not actually return the memory to the OS except in rare circumstances. The more threads and CPU cores you have, the worse this problem becomes.
One easy solution is setting the "magic" environment variable MALLOC_ARENA_MAX=2, which limits the number of caches.
Another solution is having the application call malloc_trim() regularly, which purges the caches. But this requires application source changes.
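For reference, a minimal glibc-specific sketch of both knobs from inside the program. mallopt(M_ARENA_MAX, ...) is the in-process equivalent of the environment variable, and the 60-second trim interval is an arbitrary choice:

    /* build: gcc -pthread trim.c */
    #include <malloc.h>   /* glibc: mallopt, malloc_trim, M_ARENA_MAX */
    #include <pthread.h>
    #include <unistd.h>

    /* Background thread that periodically asks glibc to return
       cached free memory to the kernel. */
    static void *trim_loop(void *arg) {
        (void)arg;
        for (;;) {
            sleep(60);
            malloc_trim(0);   /* release free()d memory held in the arenas */
        }
        return NULL;
    }

    int main(void) {
        /* Equivalent of MALLOC_ARENA_MAX=2, but set in code. */
        mallopt(M_ARENA_MAX, 2);

        pthread_t t;
        pthread_create(&t, NULL, trim_loop, NULL);

        /* ... rest of the application ... */
        pause();
        return 0;
    }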
FWIW i hit this with icinga2. so now they actually preload jemalloc in the service file to mitigate the issue; this may very well be what you're talking about
True, I also believed it for a second. But it's also easy to blame Ubuntu for errors. IMHO they are doing quite a decent job of assembling their packages. In fact the packages are also compiled with stack fortification. On the other hand I'm glad they are not compiled with the possibly buggy -O3. It can be nice for something performance-critical, but I definitely don't want a whole system compiled with -O3.
To me it's obviously a scam, because there's no way such an improvement can be achieved across the board by following a single blog post. 90% faster is a micro-benchmark number.
This is neither a micro-benchmark nor a scam, but it is click-bait for not mentioning jq specifically.
Micro-benchmarks would be testing e.g. a single library function or syscall rather than the whole application. This is the whole application, just not one whose performance you might care that much about.
Other applications will of course see different results, but stuff like enabling LTO, tuning THP and picking a suitable allocator are good, universal recommendations.
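Concretely, that advice boils down to something like this. An autoconf-style project is assumed, and the THP path is the standard sysfs knob; whether madvise or always is the right policy depends on the workload:

    # link-time optimization
    ./configure CFLAGS="-O2 -flto" LDFLAGS="-flto"
    make

    # transparent huge pages: check and set the system-wide policy
    cat /sys/kernel/mm/transparent_hugepage/enabled
    echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled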
True that. It is still interesting that, if you have a narrow task, you might achieve a significant speed-up by rebuilding the relevant packages. But this is a very niche application.
true, i saw a thread on reddit recently where a guy hand-tuned compilation flags and did PGO for a video encoder app that he uses on a video encoding farm.
In his case, even a gain of ~20% was significant. It translated into enough extra throughput to encode a few thousand more video files per year.
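For anyone curious, the basic gcc PGO loop is roughly this; the encoder name and input file are made up, the flags are gcc's:

    # 1. build an instrumented binary
    gcc -O3 -march=native -fprofile-generate -o encoder src/*.c

    # 2. run it on a representative workload to collect profile data
    ./encoder sample-input.y4m

    # 3. rebuild, letting the compiler use the collected profiles
    gcc -O3 -march=native -fprofile-use -o encoder src/*.c

The quality of the result depends almost entirely on how representative that middle step is of the real encode jobs.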
I wonder how many prepackaged binary distributions are built with the safest options for the OS/hardware and therefore don't achieve the best possible performance.
I bet most of them, tbh.
Many years ago I started building Mozilla and my own linux kernels to my preferences, usually realizing modest performance gains.
The entire purpose of the Gentoo Linux distribution, for example, is the performance gain made possible by compiling everything from source with optimized flags.
the title is clickbait, but it's good to encourage app developers to rebuild, esp. when you are cpu-bound on a few common utilities, e.g. jq, grep, ffmpeg, ocrmypdf -- common unix utils are built for general use rather than for a specific application
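on debian/ubuntu, rebuilding one of those with tuned flags looks roughly like this (assumes deb-src entries are enabled and that the package's build honors dpkg-buildflags; jq is just the example):

    sudo apt-get build-dep jq
    apt-get source jq
    cd jq-*/
    DEB_CFLAGS_APPEND="-O3 -march=native" dpkg-buildpackage -b -uc
    sudo dpkg -i ../jq_*.deb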
Or, if I understand TFA correctly, don't release debug builds in your release packages.
Reminds me of back in the day, when I was messing around with blender's cmake config files quite a bit. I noticed the fedora package was using the wrong flag -- some sort of debug-only flag intended for developers instead of whatever they thought it was. I mentioned this to the package maintainer, it was confirmed by a package sub-maintainer (or whoever), and the maintainer absolutely refused to change it, because the spelling of the two flags was close enough that they could just say "go away, contributing blender dev, you have no idea what you're talking about." I wouldn't doubt the fedora package still has the same mistaken flag to this day, and all this occurred something like 15 years ago.
So, yeah, don't release debug builds if you're a distro package maintainer.
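For cmake-based projects like blender, that mostly comes down to the build type. A generic out-of-tree build is shown here, not fedora's actual spec file:

    # what a developer uses: -g, assertions, little or no optimization
    cmake -S . -B build-debug -DCMAKE_BUILD_TYPE=Debug

    # what a distro package should normally ship
    cmake -S . -B build-release -DCMAKE_BUILD_TYPE=Release
    cmake --build build-release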
Vector instruction sets like AVX-512 will not magically make common software faster. The number of applications that do regular operations on large blocks of data is pretty much limited to graphics, neural networks and bulk cryptography. Even audio processing doesn't benefit that much from vector operations, because a codec's variable-size packets do not allow for efficient vectorization (the main exception being multi-channel effects processing as used in a DAW).
Thanks for the correction. I hadn't considered bulk memory operations to be part of SIMD, but it makes sense -- they operate at a larger grain than word size, so they can do the same work with less micro-op overhead.
in case you don't know, some Gameboy games were required to include the Nintendo logo in the game data as part of the copy protection. allegedly that served as legal protection against bootlegs.
for some extra nostalgia, check out the game "one finger death punch 2" (and its prequel). i bet it's sort of an homage to those animations.