Fun libc comparison by the author of musl.

My takeaway is: glibc is bloated but fast. Quite an unexpected combination. Am I right?




It’s not shocking. More complex implementations using more sophisticated algorithms can be faster. That’s not always true, but it often is. For example, look at some of the string search algorithms used by things like ripgrep. They’re way more complex than just looping across the input and matching character by character, and they pay off.

Something like glibc has had decades to swap in complex, fast code for simple-looking functions.
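
To give a flavor of what "more complex but faster" can look like (a toy sketch, not ripgrep's or glibc's actual code): instead of comparing character by character at every offset, you can let a heavily vectorized memchr jump to candidate positions and only verify there.

    /* Toy substring search: use memchr (itself heavily optimized) to skip to
     * positions where the first byte of the needle occurs, then confirm the
     * rest with memcmp. Illustrative only. */
    #include <string.h>

    static const char *find(const char *hay, size_t hay_len,
                            const char *needle, size_t needle_len)
    {
        if (needle_len == 0)
            return hay;
        if (needle_len > hay_len)
            return NULL;

        const char *p = hay;
        const char *end = hay + hay_len - needle_len + 1;  /* one past last valid start */
        while (p < end && (p = memchr(p, needle[0], (size_t)(end - p)))) {
            if (memcmp(p, needle, needle_len) == 0)
                return p;
            p++;
        }
        return NULL;
    }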


In the case of glibc, I think what you said is orthogonal to its bloat. Yes, it has complex implementations, but since they're there for a good reason I'd hardly call them bloat.

Independently of that, glibc implements a lot of stuff that could be considered bloat:

- Extensive internationalization support

- Extensive backward compatibility

- Support for numerous architectures and platforms

- Comprehensive implementations of optional standards


Ok, fair points, although internationalization seems like a reasonable thing to include at first glance.

Is there a fork of glibc that strips ancient or bizarre platforms?


It's called glibc. Essentially all that "bloat" is conditionally compiled; if your target isn't an ancient or bizarre platform, it won't get included in the runtime.


That’s mostly true, but not quite. For instance, suppose you aim to support all of 32/64-bit and little/big-endian. You’ll likely end up factoring straightforward math operations out into standalone functions. Granted, those will probably get inlined, but it may mean your structure is more abstracted than it would be otherwise. Just supporting the options has implications.

That’s not the strongest example. I just meant it to be illustrative of the idea.
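
To make that a bit more concrete, here's the sort of factored-out helper I have in mind (a made-up toy, not something lifted from glibc):

    /* A helper pulled out purely so the same code behaves identically on
     * 32- and 64-bit, big- and little-endian targets: build the value from
     * bytes, so the result never depends on the host's byte order. */
    #include <stdint.h>

    static inline uint64_t read_be64(const unsigned char *p)
    {
        uint64_t v = 0;
        for (int i = 0; i < 8; i++)
            v = (v << 8) | p[i];
        return v;
    }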


The way glibc's source works (for something like math functions) is that essentially every function is implemented in its own file, and various config knobs can provide extra directories to compile and provide function definitions. This can make actually finding the implementation that's going to be used surprisingly hard: a naive search for the function name can turn up like 20 different definitions, and working out which one is actually in play takes some digging (especially since it's more than just the architecture name).

Math functions aren't going to be strongly impacted by diverse hardware support. In practice, you largely care about 32-bit and 64-bit IEEE 754 types, which means your macros to decompose floating-point types into their constituent sign/exponent/significand fields are already going to be pretty portable even across different endianness (just bitcast to a uint32_t/uint64_t, and all of the shift logic will remain the same). And there's not much reason to vary the implementation except to take advantage of hardware instructions that implement the math functions directly... which are generally better handled by the compiler anyway.
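
For what it's worth, the decomposition I mean is just something like this (simplified; not glibc's actual macros):

    /* Bit-cast a double to a uint64_t and peel off the fields with shifts
     * and masks; the same shift logic works regardless of endianness. */
    #include <stdint.h>
    #include <string.h>

    static void decompose(double x, int *sign, int *expo, uint64_t *frac)
    {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);        /* type-pun without UB */
        *sign = (int)(bits >> 63);
        *expo = (int)((bits >> 52) & 0x7FF);   /* biased exponent, 11 bits */
        *frac = bits & ((1ULL << 52) - 1);     /* 52-bit significand field */
    }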


People don't typically implement math functions by pulling bits out of a reinterpreted floating point number. If you rely on the compiler, you get whatever it decides for you, which might be something dumb like float80.


"Internationalization" is a very broad item that can include e.g. support for non-UTF-8 locales, which is something few Linux distros need today.


What problem are you trying to solve? glibc works just fine for most use cases. If you have some niche requirements, there are alternative libraries you can use (listed in the article). Forking glibc in the way you describe is literally pointless.


Nothing really. I was just curious; this isn’t something I know much about, but I’d like to learn more about it.



That's the generic implementation - it's not used on most popular architectures (I think the most popular architecture it's used on would be RISC-V or MIPS) because they all have architecture-specific implementations. The implementation running on the average (x86) computer is likely to be https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86... (if you have AVX512), https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86... (if you have AVX2 and not AVX512) or https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86... (if you have neither AVX2 nor AVX512 - rather rare these days)
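
glibc picks between those variants at load time via IFUNC resolvers; conceptually the selection boils down to something like this toy sketch (using GCC/Clang's x86-only __builtin_cpu_supports and made-up variant names, not glibc's actual machinery):

    /* Toy feature-based dispatch; the real selection happens in IFUNC
     * resolvers when the dynamic linker binds the symbol. */
    #include <stdio.h>

    static const char *pick_variant(void)
    {
        if (__builtin_cpu_supports("avx512f")) return "avx512";
        if (__builtin_cpu_supports("avx2"))    return "avx2";
        return "baseline";
    }

    int main(void)
    {
        printf("would dispatch to the %s implementation\n", pick_variant());
        return 0;
    }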


A lot of the “slowness” of MUSL is the default allocator. It can be swapped out.

For example, Chimera Linux uses MUSL with mimalloc and it is quite snappy.
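
(Not necessarily how Chimera wires it up, but the general idea of swapping the allocator when linking statically is roughly this: provide the malloc family yourself and forward to mimalloc, so your definitions win over libc's. This assumes mimalloc is installed and linked in; in practice mimalloc also ships its own ready-made override for exactly this.)

    /* Sketch only: forward the malloc family to mimalloc's mi_* API. */
    #include <stddef.h>
    #include <mimalloc.h>

    void *malloc(size_t n)            { return mi_malloc(n); }
    void *calloc(size_t n, size_t sz) { return mi_calloc(n, sz); }
    void *realloc(void *p, size_t n)  { return mi_realloc(p, n); }
    void  free(void *p)               { mi_free(p); }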


That's a great combo. I like LLVM libc in overlay mode with musl beneath and mimalloc. Performance is excellent.


Microbenchmarks tend to favour extreme unrolling and other "speed at any cost" tricks that often show up as negatives in macrobenchmarks.


Choice still matters IMHO. E.g. a very small but slow malloc/free may be preferable if your code only allocates infrequently. Also, linking musl statically avoids the whole glibc shared-library versioning mess, though admittedly that's mostly useful for command-line tools.


My takeaway is that it's not a meaningful chart? Just in the first row, musl looks bloated at 426k compared to dietlibc at 120k. Why were those colors chosen? It's arbitrary and up to the author of the chart.

The author of musl made a chart that focuses on the things they care about, benchmarked them, and found that for the things they prioritized, musl does better than other standard library implementations (at least judging by the count of green rows)? Neat.

I mean, I'm glad they made the library, that it's useful, and that it's meeting the goals they set out to achieve, but what would the same chart look like if it were created by the other library authors?


Not quite correct. glibc is slow if you need to be able to fork quickly.

However, it does have super-optimized string/memory functions: highly tuned assembly implementations that use SIMD, for dozens of different CPUs.
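
To give a flavor of the SIMD idea (vastly simplified compared to glibc's hand-written assembly, which adds alignment handling, large-copy paths, per-CPU tuning, and so on):

    /* Simplified SSE2-style copy: move 16 bytes per iteration, then finish
     * the tail byte by byte. Illustrative only, not glibc's implementation. */
    #include <emmintrin.h>
    #include <stddef.h>

    static void *memcpy_sse2_sketch(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n >= 16) {
            _mm_storeu_si128((__m128i *)d,
                             _mm_loadu_si128((const __m128i *)s));
            d += 16; s += 16; n -= 16;
        }
        while (n--)
            *d++ = *s++;
        return dst;
    }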



