Tracking down a segfault that suddenly started happening (downtowndougbrown.com)
105 points by zdw on Jan 18, 2021 | 25 comments


A possibly similar story:

I sometimes play video games on Steam under Linux. Recently I bought a new CPU (based on Zen3) and one of the games - Dirt Rally - started segfaulting.

After a quick fight with attaching strace to the running process under Steam (it was crashing very quickly, so a somewhat racy script was needed), it turned out it was crashing with

  SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_PKUERR, si_addr=0x7ff88440e7b8}
A quick session with the kernel sources revealed it's about Memory Protection Keys (that's what produces SEGV_PKUERR), which seemed like something the game couldn't be using at all, because it's quite a new feature. After another hour or two, I found the cause: the game was mapping a PROT_EXEC memory segment without the PROT_READ flag, and was trying to read something from it with a 'mov' assembly instruction.

  mprotect(0x7ff884400000, 139264, PROT_EXEC) = 0
And under older i386/x86-64 CPUs it implicitly means PROT_EXEC|PROT_READ, because there is no way to make something executable, but readable. Under newer CPUs, the Linux kernel uses Memory Protection Keys to make the memory region actually executable-only.

After creating a quick'n'dirty LD_PRELOAD'able lib, the problem went away. Though I do wonder about Linus Torvalds' mantra here, that the Linux kernel shouldn't break userspace :)

  $ cat game.c
  #include <sys/mman.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  /* Interpose mprotect(): when the game asks for an execute-only mapping,
     quietly add PROT_READ and forward to the real syscall. */
  int mprotect(void *addr, size_t len, int prot) {
      if (prot == PROT_EXEC) {
          prot |= PROT_READ;
      }
      return syscall(__NR_mprotect, addr, len, prot);
  }
  $ gcc game.c -shared -fPIC -o game.so
  $ LD_PRELOAD=./game.so steam


I really enjoy stories like this. Thanks for sharing. Must have been really satisfying to get it working!


Another one then :)

A few months ago I tried to run CSGO under Wayland. I recompiled libSDL.so, because the one shipped with CSGO doesn't support Wayland, and ran the game with it LD_PRELOAD'ed. The game crashed on start. After another debugging session with gdb/strace, I figured out that the CSGO binary was calling

  strstr()
with one of its arguments being a negative value passed in from some other function, and for some reason this happens under Wayland only. Now, by preloading two libraries and setting one environment variable, I was able to play CSGO under Wayland.

  $ cat apps/strstr.c
  #define _GNU_SOURCE
  #include <string.h>
  #include <dlfcn.h>
  #include <stdint.h>
  #include <inttypes.h>

  /* Interpose strstr(): if the haystack pointer lies at the very top of the
     address space (i.e. a negative value reinterpreted as a pointer), bail out
     instead of dereferencing it; otherwise forward to the real strstr. */
  char *strstr(const char *haystack, const char *needle) {
      if ((uintptr_t)haystack > (uintptr_t)0xFFFFFFFF00000000) {
          return NULL;
      }

      char *(*p)(const char *, const char *) = dlsym(RTLD_NEXT, "strstr");
      return p(haystack, needle);
  }

  $ SDL_VIDEODRIVER=wayland LD_PRELOAD=/home/<user>/apps/strstr.so:/home/<user>/Downloads/SDL-master/build/.libs/libSDL2-2.0.so.0.12.1 steam
But after playing with all those strace/gdb/LD_PRELOAD sessions, my trust factor in CSGO (the score which says how likely I am to cheat in the near future) went down from Green (good player) to Red (significantly bad, will start cheating any moment :) within a week. And that's for a 2012 account, with Prime enabled since 2016, a couple of hundred matchmaking games played, and many more casual/FFA games. So YMMV :)

I wrote to CSGOTeamFeedback@valvesoftware.com asking if they could verify whether my account really deserves this rating, because every second CSGO match is against blatant cheaters now, but since nothing has changed in the week since I wrote it, this probably means that LD_PRELOAD'ing your Steam is not a good idea :).


> but since nothing has changed in the week since I wrote it, this probably means that LD_PRELOAD'ing your Steam is not a good idea :).

Welcome to the "knows too much to be trustworthy" category of perceived-troublemaker.


And this story gives an extra argument in favor of dynamic libraries, as they make it easier to fix bugs in compiled applications (except for games which check whether someone has messed with LD_PRELOAD).


> because there is no way to make something executable, but readable.

Did you mean to say "there is no way to make something executable, but not readable" ?


Yes


The dynamic linker used by macOS (and derivatives) solves this problem using "two-level namespacing":

http://mirror.informatimago.com/next/developer.apple.com/rel...


One-armed man: “and that’s why you always namespace”


One of the trickiest segfaults I tracked down:

We had some old and stable code that was writing to and reading from shared memory. One day, out of the blue, it started deadlocking and segfaulting, even though there had been no recent changes to it.

It turned out to be an educational sequence of events:

- the library for daemonizing processes had a bug: it closed stdout but didn't reopen it to /dev/null.

- our shared memory then got stdout's file descriptor number (fd 1).

- independently, a shared library had some code that started writing to stdout... now our shared memory.

- so the logs clobbered the shared memory

It was hard to track down until I used xxd to look at the shared memory.
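
A minimal sketch of how that chain of events can play out (the daemonize step, file names, and sizes here are hypothetical, not our actual code): closing stdout frees fd 1, the next shm_open() grabs it, and anything that later "logs to stdout" scribbles over the mapping.

  /* fdreuse.c -- illustrative only; compile with: gcc fdreuse.c (older glibc may need -lrt) */
  #include <stdio.h>
  #include <fcntl.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void) {
      /* Buggy "daemonize": closes stdout but never reopens it to /dev/null. */
      close(STDOUT_FILENO);

      /* The next descriptor allocated is the lowest free one: fd 1, i.e. "stdout". */
      int shm_fd = shm_open("/example_shm", O_CREAT | O_RDWR, 0600);
      ftruncate(shm_fd, 4096);
      char *shm = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);

      /* A library that "logs to stdout" now writes straight into the shared
         memory object, clobbering whatever data structures live there. */
      write(STDOUT_FILENO, "log line\n", 9);

      fprintf(stderr, "shm now starts with: %.9s", shm);  /* prints "log line" */
      shm_unlink("/example_shm");
      return 0;
  }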


When two functions with the same name are declared 'extern "C"', for example, the linker will pick only one of them, and which one it picks is undefined. The choice could even change the next time you build. I'm not sure if compilers/linkers warn about this now, but they certainly didn't back when I first discovered it on a project.

And of course, the outcome ranges from nothing (if both copies happen to be identical), to slightly off (e.g. one dependency had a slightly older version of the function), to downright wrong (e.g. crash in this case).
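
A contrived illustration of the kind of collision being described (the library names, the socket_send signature, and the output shown are made up for the example; with glibc, run from the build directory, the first DT_NEEDED library that defines the symbol typically wins):

  $ cat liba.c
  #include <stdio.h>
  void socket_send(void) { puts("liba: socket_send"); }

  $ cat libb.c
  #include <stdio.h>
  /* an unrelated library that happens to export the same generic name */
  void socket_send(void) { puts("libb: socket_send"); }

  $ cat main.c
  void socket_send(void);
  int main(void) { socket_send(); return 0; }

  $ gcc -shared -fPIC liba.c -o liba.so
  $ gcc -shared -fPIC libb.c -o libb.so
  $ gcc main.c ./liba.so ./libb.so -o app && ./app
  liba: socket_send
  $ gcc main.c ./libb.so ./liba.so -o app && ./app
  libb: socket_send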


Maybe the lesson is that languages that don't explicitly support namespaces are pernicious.

Just compiling the original app and its static dependencies with a C++ compiler would fix this, because send_socket's linkage name would have the argument types mangled into it.

After that, the code could use modern C++ features, and get incrementally more maintainable.


Tl;dr: dynamic linking results in calling code you didn't mean to.

One of many reasons that reliable systems use static linking.


Is it really necessary to go as far as ditching dynamic linking and its benefits? Export only symbols intended to be called from outside, with a prefix, problem solved.

From TFA:

> This is a great opportunity to remind everyone: don’t use generic function names like this in your shared libraries, at least not in your exported symbols! You could easily run into a situation similar to this one. In my opinion, prefixes are definitely a good idea for your library’s exported symbols. In this case, both libusbmuxd and Samba were breaking that guideline.

> (...) dynamic libraries on Linux export all symbols by default unless you specify otherwise.

> libusbmuxd already fixed this on their end quite a while ago — they now only export functions intended to be public, which have a usbmuxd_ or libusbmuxd_ prefix
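
For what it's worth, GCC and Clang make the "only export what you intend" approach fairly mechanical. A minimal sketch (the library name, function names, and the nm output are made up for illustration):

  $ cat mylib.c
  /* Built with -fvisibility=hidden, so nothing is exported by default;
     intended public entry points are marked explicitly. */
  #define MYLIB_API __attribute__((visibility("default")))

  MYLIB_API int mylib_do_thing(int x) {
      return x + 1;
  }

  /* Non-static (usable from other files in the library), but still not
     exported thanks to -fvisibility=hidden, so it can't collide with a
     symbol of the same name in another library. */
  int helper(int x) {
      return x * 2;
  }

  $ gcc -shared -fPIC -fvisibility=hidden mylib.c -o libmylib.so
  $ nm -D --defined-only libmylib.so
  0000000000001119 T mylib_do_thing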


> Export only symbols intended to be called from outside, with a prefix, problem solved.

Any solution that depends on the goodwill of a third party is a 100% no-go.


A common argument around dynamic linking is that it makes security updates easier.

This seems like a weak argument, as an update could just as easily introduce new bugs with security impact. Static linking gives you a tested, known-good executable. Running untested combinations of libraries gives me some anxiety.


It's not really about static vs. dynamic linking IMO. The point with ld is that the distribution has one central point for linking. If static linking with Go, Rust, C, etc. worked in a similar fashion, each executable could be shipped with a "re-link" script and the distribution could again provide security patches efficiently.

But when linking happens in some poorly maintained script, inside a custom build system, written in two or more languages, running inside a Docker container, executed by some year-old CI integration, patching security issues becomes impossible.


The argument is made on a system level, not on a single binary level.

Given many dynamically linked binaries, it's relatively easy to patch say OpenSSL on a system.

If all binaries are statically linked, you need to relink all binaries and push new versions of each one.


I think it is a very hard problem to quantify. In my experience, once you have a security policy about updates, it is equally easy to update statically and dynamically linked software. The real problem is that most companies do not care. On the other hand, if most applications are running in Docker today, does it matter whether an app is statically or dynamically linked?


The article says he used static linking. A dependency of a dependency didn't.


I don't think static linking saves you here. This looks to be caused by an ODR ("one definition rule") violation. The linker has two symbols with the same identifier and has to choose one.


Static linking would error out with something like

  bar.c:(.text+0x0): multiple definition of `socket_send'
at link time. (edit: I mean statically linking everything, including third-party libraries)


It was statically linked.

The issue was caused by another dynamically linked library referencing a different library at run time, on a different Linux distro than it was built on.

It even says as much in the article.


Perhaps this should trigger a warning, or even an error, with an option to provide (or generate from a known-good configuration) an exclusion list for such symbols.

Yeah, there are probably a lot of ways this would still cause issues. In any larger project with non-trivial dependencies, the C ABI generates a lot of extra work, at least without decorated (i.e. mangled) function names...
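
As a stopgap, glibc's dynamic loader can at least make the surprise visible after the fact: LD_DEBUG shows which library every symbol actually bound to at run time (output trimmed and illustrative; app and socket_send are the hypothetical names from the sketch earlier in the thread):

  $ LD_DEBUG=bindings ./app 2>&1 | grep socket_send
  binding file ./app [0] to ./liba.so [0]: normal symbol `socket_send'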


> Tl;dr: dynamic linking results in calling code you didn't mean to.

Way to show you didn't read the linked article; in it, he even mentions that he used static linking.

So that alone won't always save you.



