Great article; it was immensely useful in my quest to understand Linux processes. It says "Linux x86 program", but it assumes a hosted C program whose entry point is provided by the GCC startfiles and libc. Everything after _start is specific to C. Note that _start is an arbitrary symbol: it is merely the linker's default name for the entry point and can be changed to anything.
The important Linux facts are:
1. _start is not a function
There is no return address on the stack, so it can't be returned from. The program must issue the exit system call itself to terminate.
2. Arguments and environment are on the stack
Argument count and vector can be simply popped off the stack into appropriate registers, in that order.
The environment vector is located right after the NULL terminator of the argument vector, i.e. at argv + argc + 1.
The auxiliary vector is located right after the NULL terminator of the environment vector. No count is provided for the environment, so code must loop through it looking for the NULL terminator in order to find the auxiliary vector.
The auxiliary vector is really interesting. I don't usually see software making direct use of it. It contains interesting information such as CPU identifier and capabilities, page size, the location of the Linux vDSO, some random bytes, program file name, user and group IDs, among other things.
This is the data the Linux kernel passes to programs. After organizing these parameters, the program is free to do whatever it wants. The libc entry point will naturally start setting up libc. In particular, it seems to spend a lot of time setting up the init and fini insanity that's probably better off forgotten.
The entry point code passes the stack pointer to a C function which gathers all kernel parameters and starts the program with no further setup. I made several example programs, including one which outputs all these variables.
I'll just add that yes, things are actually very simple from the ELF point of view. If you generate an ELF file by hand (or from a compiler you write), you can simply point the entry point at the first instruction, and the argc, argv, and environment pointers arrive as described above.
Virgil startup code is ~15 assembly instructions (even fewer for test binaries), and then it calls into the Virgil runtime source to get the heap set up and start allocating the first objects (array of strings for arguments).
I love low level posts like these. It's important we don't forget that C is just one alternative.
> I love low level posts like these. It's important we don't forget that C is just one alternative.
Yes!! It's good to know where Linux ends and all the other stuff begins. Existing documentation makes things really confusing because it assumes people want the C stuff. Sometimes it even tells readers they aren't supposed to touch these "internals". It takes a lot of work to unravel this mess and get to the essential stuff.
You keep running whatever ‘instructions’ appear in memory after your last program instruction. Most likely you get a segmentation fault. Worse, you might run some random data as instructions and crash. Worse still, that random data might happen to form real instructions that do something harmful.
The AT_RANDOM bytes are designed to provide randomness to the loader when the program is loaded. I looked at glibc and it's using them for a random stack check canary (which is needed very early during dynamic loading), and isn't actually using them for anything else.
glibc nulls out its internal pointer (_dl_random) after use so you can't easily get the pointer later, but of course it'd be a bad idea to try and use it.
Quite the suggestion that an anodyne joke about "real men" is what's stopping women from programming. In fact many women would no doubt find the suggestion patronizing.
If that's his personal moral code, I don't have a problem with him substituting "programmers" for "men", but just do it and spare me the lecture. I consciously avoid referring to groups of women or mixed-gender groups as "guys", but I don't stop and point out to everybody that I did it whenever I do.
Linux x86 Program Start Up - https://news.ycombinator.com/item?id=8739661 - Dec 2014 (30 comments)