Linux x86 program start up – How the heck do we get to main()? (2011) (dbp-consulting.com)
187 points by rwmj on Nov 4, 2021 | 21 comments


One past thread:

Linux x86 Program Start Up - https://news.ycombinator.com/item?id=8739661 - Dec 2014 (30 comments)


https://news.ycombinator.com/from?site=dbp-consulting.com

There are more than that lol. Gets reposted once a year almost.


All the other submissions have no comments.


Great article, it was immensely useful in my quest to understand Linux processes. It says "Linux x86 program" but it assumes it's a hosted C program whose entry point is provided by GCC startfiles and libc. Everything after _start is specific to C. Note that _start is an arbitrary symbol and merely the linker's default value of the entry point. It can be changed to anything.

The important Linux facts are:

1. _start is not a function

It can't be returned from, the exit system call must be issued before execution terminates.

2. Arguments and environment are on the stack

Argument count and vector can be simply popped off the stack into appropriate registers, in that order.

The environment vector is located after the NULL terminator of the argument vector, that is, argument count + 1 pointer-sized slots past the start of the argument vector.

The auxiliary vector is located after the NULL terminator of the environment vector. No count is provided for that, so code must loop through the environment looking for the sentinel in order to find it.
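The layout described above can be sketched in C. This is a minimal illustration, not code from the article or from liblinux: the names `struct aux_entry`, `struct start_params`, `parse_initial_stack`, and `aux_lookup` are hypothetical, and the sketch assumes every slot on the initial stack is one pointer-sized word, with the auxiliary vector stored as (type, value) pairs terminated by a zero type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types and names, for illustration only. */
struct aux_entry {
    uintptr_t type;   /* AT_* constant; 0 (AT_NULL) ends the vector */
    uintptr_t value;
};

struct start_params {
    long argc;
    char **argv;
    char **envp;
    struct aux_entry *auxv;
};

/* sp points at the slot holding argc, which is where the stack pointer
   is assumed to point when the kernel jumps to the entry point. */
struct start_params parse_initial_stack(char **sp)
{
    struct start_params p;

    p.argc = (long)(intptr_t)sp[0];
    p.argv = sp + 1;

    /* argv holds argc pointers plus a NULL terminator. */
    p.envp = p.argv + p.argc + 1;

    /* No count is given for the environment: scan for its NULL. */
    char **e = p.envp;
    while (*e != NULL)
        e++;

    /* The auxiliary vector starts right after that NULL. */
    p.auxv = (struct aux_entry *)(e + 1);
    return p;
}

/* Linear search for one entry; returns 0 when the type is absent. */
uintptr_t aux_lookup(const struct aux_entry *auxv, uintptr_t type)
{
    for (; auxv->type != 0; auxv++)
        if (auxv->type == type)
            return auxv->value;
    return 0;
}
```

In a freestanding program, a few instructions of assembly at the entry point would pass the initial stack pointer to `parse_initial_stack`. A hosted program never sees that pointer directly, so the function is easiest to try against a hand-built array that mimics the layout.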

The auxiliary vector is really interesting. I don't usually see software making direct use of it. It contains interesting information such as CPU identifier and capabilities, page size, the location of the Linux vDSO, some random bytes, program file name, user and group IDs, among other things.
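For hosted C programs there is a shortcut: glibc (2.16 and later) and musl expose the auxiliary vector through `getauxval` in `<sys/auxv.h>`. A minimal sketch that prints a few of the entries mentioned above; the function name `print_auxv_summary` is made up for this example, and which entries are present depends on the kernel and architecture.

```c
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval and the AT_* constants */
#include <unistd.h>     /* sysconf, getuid, for comparison */

/* Hypothetical helper: print a handful of auxiliary vector entries.
   getauxval returns 0 when an entry is absent. */
void print_auxv_summary(void)
{
    printf("AT_PAGESZ:       %lu (sysconf says %ld)\n",
           getauxval(AT_PAGESZ), sysconf(_SC_PAGESIZE));
    printf("AT_UID / AT_GID: %lu / %lu (getuid says %lu)\n",
           getauxval(AT_UID), getauxval(AT_GID),
           (unsigned long)getuid());
    printf("AT_HWCAP:        %#lx\n", getauxval(AT_HWCAP));
    printf("AT_SYSINFO_EHDR: %#lx (vDSO base)\n",
           getauxval(AT_SYSINFO_EHDR));

    /* These two entries hold pointers to NUL-terminated strings. */
    const char *execfn = (const char *)getauxval(AT_EXECFN);
    printf("AT_EXECFN:       %s\n", execfn ? execfn : "(absent)");

    const char *platform = (const char *)getauxval(AT_PLATFORM);
    printf("AT_PLATFORM:     %s\n", platform ? platform : "(absent)");
}
```

Calling `print_auxv_summary()` from `main` is enough to see the values; the page size and user ID should agree with what `sysconf` and `getuid` report.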

https://github.com/torvalds/linux/blob/master/include/uapi/l...

https://github.com/torvalds/linux/blob/master/arch/x86/inclu...

https://github.com/torvalds/linux/blob/master/Documentation/...

This is the data the Linux kernel passes to programs. After organizing these parameters, the program is free to do whatever it wants. The libc entry point will naturally start setting up libc. In particular, it seems to spend a lot of time setting up the init and fini insanity that's probably better off forgotten.

https://blogs.oracle.com/solaris/post/init-and-fini-processi...

It's not necessary. After this, you can just run your program directly. I used to develop a liblinux that illustrates all this with much simpler code:

https://github.com/matheusmoreira/liblinux/blob/master/start...

https://github.com/matheusmoreira/liblinux/blob/master/start...

The entry point code passes the stack pointer to a C function which gathers all kernel parameters and starts the program with no further setup. I made several example programs, including one which outputs all these variables.

https://github.com/matheusmoreira/liblinux/blob/master/examp...

I stopped developing this because I discovered that Linux itself has a better solution, which the kernel developers use for their own tools:

https://github.com/torvalds/linux/blob/master/tools/include/...

The entry point code for all supported architectures is present as inline assembly code!


Hey, good to see you and liblinux again.

I'll just add that yes, things are actually very simple from the ELF point of view. If you generate an ELF file by hand (or from a compiler you write), you can simply point the entry address at the first instruction, and the argc, argv, and environment pointers arrive as described above.

Virgil startup code is ~15 assembly instructions (even less for test binaries), and then it calls into the Virgil runtime source to get the heap setup and start allocating the first objects (array of strings for arguments).

I love low level posts like these. It's important we don't forget that C is just one alternative.


Good to see you and Virgil again!

> I love low level posts like these. It's important we don't forget that C is just one alternative.

Yes!! It's good to know where Linux ends and all the other stuff begins. Existing documentation makes things really confusing, it assumes people want the C stuff. Sometimes it even tells readers they aren't supposed to touch these "internals". It takes a lot of work to unravel this mess and get to the essential stuff.


> The environment vector is located after the NULL terminator of the environment vector, or argument count + 1.

For those who read that and don't know, it is the second occurrence of "environment" that should be "arguments".


Fixed it, thanks!


> It can't be returned from, the exit system call must be issued before execution terminates.

So what happens if exit isn't called?


1. If you add a "ret", you just jump to an invalid address.

2. If you add nothing, the CPU will continue to execute the bytes that follow.

In both cases it is quite certain you end with a segmentation fault (or, in case 2, an illegal instruction).


You keep running whatever ‘instructions’ appear in the data after your last program instruction. Most likely you get a segmentation fault. Worse, you might run random data as instructions and crash; worse still, that random data might happen to be real instructions that do something harmful.


Segmentation violation.

I've seen code with a hlt instruction after main and the exit system call. Not sure what their intentions are, it should be unreachable.
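A rough sketch of that pattern in C, as a guess at the intent rather than a statement of it: issue the exit system call directly, then place a trapping instruction after it as a backstop in case control ever continues. The names `raw_exit` and `child_exit_status` are hypothetical. The inline assembly covers only x86-64 and other targets fall back to libc's `syscall` wrapper; `__builtin_trap` (GCC and Clang) stands in for the hlt, which would also fault in user mode.

```c
#include <sys/syscall.h>  /* SYS_exit */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: end the calling thread with a raw exit system
   call, bypassing libc's exit() and its atexit/fini handlers. */
void raw_exit(int status)
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"((long)SYS_exit), "D"((long)status)
                      : "rcx", "r11", "memory");
    (void)ret;
#else
    syscall(SYS_exit, status);  /* fallback through the libc wrapper */
#endif
    /* Should be unreachable: trap instead of running stray bytes. */
    __builtin_trap();
}

/* Hypothetical helper: run raw_exit in a forked child and return the
   exit status the parent observes, or -1 on failure. */
int child_exit_status(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        raw_exit(status);

    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

The fork wrapper exists only so the effect can be observed from a normal hosted program; a freestanding entry point would call something like `raw_exit` as its last step.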


I wonder how you get those 16 random bytes. I had no idea so much info was in the auxiliary vector: UID, program name, etc. TIL.


The AT_RANDOM bytes are designed to provide randomness to the loader when the program is loaded. I looked at glibc and it's using this for a random stack check canary (which is needed very early during dynamic loading), and doesn't actually use it for anything else.

glibc nulls out its internal pointer (_dl_random) after use so you can't easily get the pointer later, but of course it'd be a bad idea to try and use it.
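For inspection only, the bytes can still be reached from a hosted program through the auxiliary vector entry itself. A minimal sketch, assuming `getauxval` from `<sys/auxv.h>` is available (glibc 2.16 and later, musl); the helper name `copy_at_random` is hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval, AT_RANDOM */

/* Hypothetical helper: copy the 16 AT_RANDOM bytes into out.
   Returns 0 on success, -1 if the entry is absent.
   The libc may already have derived its stack canary from these
   bytes, so treat them as read-only curiosities, not as key material. */
int copy_at_random(unsigned char out[16])
{
    const unsigned char *p = (const unsigned char *)getauxval(AT_RANDOM);
    if (p == NULL)
        return -1;
    memcpy(out, p, 16);
    return 0;
}
```

The bytes are fixed for the lifetime of the process, so repeated calls return the same 16 values; a program that needs fresh randomness should ask the kernel for it (for example with `getrandom`) instead.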


Accessing their website via HTTPS gives a warning with my browser:

> dbp-consulting.com uses an invalid security certificate.

> The certificate is not trusted because it is self-signed.


> part of the problem is that there is a prevalent unconscious gender bias in STEM that makes it unwelcoming for women

Just three paragraphs in.


Would refer to the guidelines [1] for this site:

> Be kind. Don't be snarky.

> Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

> Eschew flamebait. Avoid unrelated controversies and generic tangents.

[1] https://news.ycombinator.com/newsguidelines.html#comments


"Eschew flamebait. Avoid unrelated controversies and generic tangents."

Unfortunately, the articles HN links to don't have to follow those guidelines.


oh no! anything but trying to make STEM more welcoming to the women!! How can we possibly read any of this!?


Quite the suggestion that an anodyne joke about "real men" is what's stopping women from programming. In fact many women would no doubt find the suggestion patronizing.


If that's his personal moral code, I don't have a problem with him substituting "programmers" for "men", but just do it and spare me the lecture. I consciously avoid referring to groups of women or mixed-gender groups as "guys", but I don't stop and point out to everybody that I did it whenever I do.



