Main is usually a function. So then when is it not? (2015)

gumby · on Aug 10, 2023

Well, if you really want to confound things you could write your own __start which is typically the entry point called by the kernel, which in turn calls `main()`.

But actually that's just by convention: C (and C++) programs drop `__start` into the binary and then tell the linker to mark the binary (mostly an ELF file these days, thank goodness) to indicate that the entry point is the symbol `__start`. There can be different versions depending on how many arguments to `main()` are requested, or it can simply pass all three and let the function body ignore the unwanted ones.

If you really want to confuse the TA, write the following program:

    #include <stdio.h>

    void foo();

    int main() {
      printf ("I'm in main\n");
      foo();
      printf ("I'm still in main\n");
    }

    void foo() {
      printf ("I'm in foo!\n");
    }

Then tell the linker to mark `foo` as your entry point. You could do this from the command line or even a custom linker script.

Of course depending on what your OS needs from `__start` you may have to do some extra setup, sorry. But this might work on Linux.

I've actually been thinking of modifying gcc's `__start` to pass the arguments and env vars as a span (well, modifying the function that `__start` calls, which in turn calls `main()`).

noduerme · on Aug 10, 2023

This is fun. On a broader note though, entry point main-type functions have always bothered me. Maybe because I grew up as a simple script kiddie with BASIC and Bash and PHP. I like code that starts executing from the first line and then runs whatever functions it wants to run. I realize that's just an abstraction of main() but it's a pleasant one for me. There's something more constricting about there being one function to bootstrap everything than there is about one file.

simias · on Aug 10, 2023

> There's something more constricting about there being one function to bootstrap everything than there is about one file.

As someone who basically started coding with C I feel the other way around, unsurprisingly. Even in scripting languages, when I write something non-trivial I tend to encapsulate everything in functions and then have a `main()` call at the bottom of the file. I think it's one of the things Python got right with the `if __name__ == "__main__":` idiom, which lets you import any script in a REPL to test individual methods, for instance.

(Although the actual syntax of `if __name__ == "__main__":` is IMO utter trash and it amuses me that a language so obsessed with getting rid of symbols and looking like pseudocode decided that such cumbersome, obtuse and ugly looking boilerplate was just fine.)
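For readers who haven't seen the idiom, here is a minimal sketch of it. The function names `total` and `main` are made up for illustration; the only point is that importing the file defines the functions without running anything, while running the file directly calls `main()`.

```python
def total(numbers):
    """A small helper that can be imported and tested on its own."""
    return sum(numbers)


def main():
    # Only reached when the file is executed as a script.
    print(total([1, 2, 3]))


if __name__ == "__main__":
    main()
```

Importing this file from a REPL gives access to `total` for testing; nothing is printed unless the file is run as a script.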

Uehreka · on Aug 10, 2023

> (Although the actual syntax of `if __name__ == "__main__":` is IMO utter trash and it amuses me that a language so obsessed with getting rid of symbols and looking like pseudocode decided that such cumbersome, obtuse and ugly looking boilerplate was just fine.)

This reminds me of when I tried learning Ruby. The tutorial I read started by talking about how elegant and beautiful Ruby could be, and how easy it is to read and understand Ruby at first glance, even if you’re new to it. Then they explained that the toString method in Ruby is called `.to_s()` and that lambda arguments were surrounded by pipe characters and I was like “wat”.

cpfohl · on Aug 10, 2023

> it amuses me that a language so obsessed with getting rid of symbols and looking like pseudocode decided that such cumbersome, obtuse and ugly looking boilerplate was just fine.

100 times this! I have always wondered about that.

adifgoi0nio · on Aug 10, 2023

Python is quick and dirty and always has been. The language designers have regularly papered over design flaws using hacks and magic syntax.

Which is acceptable. Python is a valuable tool I use on a daily basis. But I find it annoying when Python programmers are snooty about "Pythonic" code.

iaw · on Aug 10, 2023

The thesis I've heard (and agree with) is that Python is a good everything language. For specialization there is always a better tool (although maybe more complex to get started).

ekidd · on Aug 10, 2023

> There's something more constricting about there being one function to bootstrap everything than there is about one file.

As a compiler author, there are a bunch of nasty surprises to this approach. If you execute a file line-by-line, then functions only exist once you "reach" them.

If you write:

    def a(): b()
    a()
    def b(): ...

...then a() needs to crash when first called, because b() hasn't been declared yet. So your functions need to be invoked via some kind of table, and can't easily use jumps to hard-coded offsets. And b() can't be inlined.

There are dozens of these problems that come up when generating efficient code. And the easiest way to fix them all is to make your entire program "exist" from the beginning, so it can be compiled and optimized as a whole. Which is how you wind up with main().
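Python happens to have exactly these semantics, which makes the crash easy to demonstrate. The sketch below feeds a three-line script to `exec` and reports the error raised when `a()` is called before the `def b()` line has run. The helper name `run_top_to_bottom` is hypothetical.

```python
def run_top_to_bottom():
    """Execute a tiny script statement by statement, as an interpreter would."""
    namespace = {}
    script = (
        "def a(): return b()\n"
        "result = a()\n"                      # b does not exist yet
        "def b(): return 'hello from b'\n"    # never reached
    )
    try:
        exec(script, namespace)
    except NameError as err:
        return str(err)
    return namespace["result"]
```

Because `b` is looked up by name at call time, the call in `a()` cannot be turned into a direct jump or inlined ahead of time.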

noduerme · on Aug 10, 2023

Hey just on a completely off-topic thing, I like your style.

But I typed the name of your website into the URL bar, and even stupid Firefox just hung trying to load or trying to redirect from what it thinks should be the default https version. Had to specify http:// to load it. The browsers are making it almost impossible these days to just go to a plain website. Half the time they loop themselves into a frenzy and cache the failed address with predictive type to the https version somewhere a normal user can't clear it, even when clearing history. It's a bloody mess for anyone who still needs to build http services.

tw37074462 · on Aug 10, 2023

> even stupid Firefox just hung trying to load or trying to redirect from what it thinks should be the default https version

Cannot reproduce.

The only problem(s) I'm seeing are related to operator error; nothing to do with browsers "loop[ing] themselves into a frenzy". The server at www.randomhacks.net works, but the server at randomhacks.net, on the other hand (and which is not the same as www.randomhacks.net), is for whatever reason failing to respond to the HTTP request. Reasonable guess: it's not a Web server.

At best, the service operator has the server (arguably) misconfigured, and the client operator is expecting the browser to do something it shouldn't while blaming failure on the browser after assuming it's doing something that it isn't actually doing.

ekidd · on Aug 10, 2023

> The server at www.randomhacks.net works, but the server at randomhacks.net, on the other hand (and which is not the same as www.randomhacks.net), is for whatever reason failing to respond to the HTTP request. Reasonable guess: it's not a Web server.

Yes, the entire setup is ancient. The www site is a static site using CloudFlare, which (IIRC) needed a CNAME at the time, and CNAMEs did not work nicely with "bare" domains. The bare domain pointed to an actual server which did stuff, including serving a redirect on port 443. But that server is gone now.

I could fix this but it's way down the list after several major home projects. Besides, I come from an era where it was assumed that the www host was a dedicated machine. ;-)

noduerme · on Aug 10, 2023

You're right, my bad, the www subdomain just works, the plain domain hangs not as a result of a redirect. I thought I was looking at one of those 443>80 crack-ups once I got it to resolve and it was plain http, but I didn't notice I'd re-added the www.

krisoft · on Aug 10, 2023

> ...then a() needs to crash when first called, because b() hasn't been declared yet.

Why would it “need to” crash? If it is easier to implement to not crash and the developer's intention is clear, why would you define your new programming language such that it “needs to crash” in this situation?

It is as if you go to your garden to pick tomatoes, but you trip over a rake you intentionally put in your way and then as you lay on the ground hurting you conclude that it is impossible to pick tomatoes. Picking the tomatoes in this case is making a compiled programming language which has a main file instead of a main function. And the rake you trip over is the intentional decision to make the program crash in this situation. Just don’t put the rake there (choose to not crash in this situation) and then you can pick the tomatoes (have a compiled language with a main file instead of a main function).

There might be more complicated reasons why it is necessary to have a main function, but this example does not really demonstrate it for me.

ekidd · on Aug 10, 2023

> Why would it “need to” crash? If it is easier to implement to not crash and the developer's intention is clear, why would you define your new programming language such that it “needs to crash” in this situation?

Well, let me come up with another example:

    def a(x=SIZE): ...
    a()
    let SIZE = 16*1024

When we call a(), we need the value of SIZE to provide a default value for x. But SIZE isn't computed yet. We could try to "hoist" SIZE, but normally that just means we have:

    let SIZE = undefined
    def a(x=SIZE): ...
    a()
    SIZE = 16*1024

And sure, you could invent a rule to "fix" this case, too. (It depends on how you implement default arguments efficiently.) But next week, you'll encounter another headache, and another. I've literally been through this a couple of times working on LISP compilers. "Executional" semantics are common in custom Lisp dialects, and it's a huge amount of work to get them right.
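Python is one language that has picked a rule here: default values are evaluated once, when the `def` statement runs. A rough sketch of what follows from that rule, using made-up helper names and `exec` to keep each script isolated (the pseudocode above is not Python, so `let` is dropped):

```python
def default_before_constant():
    """Define a(x=SIZE) before SIZE exists; the def statement itself fails."""
    namespace = {}
    script = (
        "def a(x=SIZE): return x\n"
        "SIZE = 16 * 1024\n"
    )
    try:
        exec(script, namespace)
    except NameError as err:
        return f"NameError: {err}"
    return "no error"


def default_after_constant():
    """Define SIZE first; the default is captured when def runs, not per call."""
    namespace = {}
    script = (
        "SIZE = 16 * 1024\n"
        "def a(x=SIZE): return x\n"
        "SIZE = 0\n"          # later rebinding does not affect the default
        "result = a()\n"
    )
    exec(script, namespace)
    return namespace["result"]
```

Under this rule the first script fails at the definition rather than at the call, and the second keeps returning the value `SIZE` had when `a` was defined. Other rules are possible; each one has its own surprises.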

The price you end up paying is lower program performance and higher compiler complexity. Oh, and importantly, you normally wind up with slower load times. A compiled program is "ready to run", and can be loaded efficiently using mmap() and maybe some linking. But a program where you "execute" the top level needs to run all those top-level definitions on each load. So then you're like, "I know! I'll write a heap dumper/undumper", aka "unexec". Which will fix this problem but cause 5 more.

And so it goes. This is one of those ideas that seems clever but leads to bitter regrets, at least in high performance languages.

krisoft · on Aug 10, 2023

Thank you for the example.

> And sure, you could invent a rule to "fix" this case, too.

I would just do the simple thing. If X is not already defined when it is used in the function signature I would just give a compile time error. Easy to implement for the compiler author, easy for the developer to understand and rectify.

> But next week, you'll encounter another headache, and another.

I believe you. I guess it is one of those things one has to see for themselves to really appreciate.

ufo · on Aug 10, 2023

One corner case is that b() might use variables that are initialized further down in the file. Some languages, such as Javascript, lift function definitions to the top so you can call functions defined below you, but it's harder to do the same for variables. In a compiled language it should be possible to detect this at compile time, but it's fiddly and more complicated than having a main function.

noduerme · on Aug 10, 2023

From Mozilla's docs:

>> JavaScript Hoisting refers to the process whereby the interpreter appears to move the declaration of functions, variables, classes, or imports to the top of their scope, prior to execution of the code.

>> Hoisting is not a term normatively defined in the ECMAScript specification. The spec does define a group of declarations as HoistableDeclaration, but this only includes function, function*, async function, and async function* declarations. Hoisting is often considered a feature of var declarations as well, although in a different way. In colloquial terms, any of the following behaviors may be regarded as hoisting:

>> Being able to use a variable's value in its scope before the line it is declared. ("Value hoisting")

>> Being able to reference a variable in its scope before the line it is declared, without throwing a ReferenceError, but the value is always undefined. ("Declaration hoisting")

>> The declaration of the variable causes behavior changes in its scope before the line in which it is declared.

>> The side effects of a declaration are produced before evaluating the rest of the code that contains it.

So basically all the shit we take for granted when not writing C/C++. What is curious to me is whether the main reason for not attempting this in a C++ compiler in 2023, or else moving the language spec in this (unofficial) direction, is to get maximal performance optimization, or if it's mostly a cultural thing at this point. It does have the benefit of turning away hordes of JavaScript kiddies from the gates, but the relative value of the performance edge from any optimization is getting weaker as compute becomes cheaper.

ufo · on Aug 10, 2023

The genie got somewhat out of the bottle with constructors: C++ will call constructors to initialize static variables. In theory we could use this mechanism to allow top-level statements. However, these static initializers are so full of footguns that perhaps it's best we don't :P

moron4hire · on Aug 10, 2023

Top-level execution doesn't require linear execution. As an example, JavaScript has top-level execution where function definition order doesn't matter. This is achieved by "hoisting" all function declarations within a block of execution to the beginning of the block before then exercising any statements.

noduerme · on Aug 10, 2023

Right. Adding virtual classes in weird orders does not make code more readable, and it's just a hint for the compiler anyway.

noduerme · on Aug 10, 2023

That certainly makes sense, as far as main() goes, but having main() in C++ for example doesn't solve the problem of needing to define functions prior to calls to them. It just helps you put all your stuff before main(), but even then it only "exists" if it shows up in the right order in every previous header file.

Line-based stuff like BASIC's GOTO/GOSUB was kind of a fun workaround to the idea of even having functions at all, and I'd happily live in that place still... but... as someone who manifestly does not write compilers, is it still an extravagant demand in this day and age to ask a compiler to check and include/throw for all the functions out-of-order before running a code file from the top?

[edit] What I mean is, this was fundamental to ES3/4 bytecode compilers for VMs like Java or Flash, and it would be absurd to ask e.g. Javascript coders [edit: said Python, don't work with Python, there are cases in PHP where it's necessary] to order their functions in the order they're invoked with virtuals to place them. This is like providing toilets on a cruise ship. It's barely even a "service", like whereas garbage collection or something is an actual service. Restructuring the order of function definitions as you build out dependencies has got to be one of the worst wastes of coder time I can think of. Great if you really can find an optimization by doing that, but the way people code these days that basically never happens. We have to assume that everything we write gets compiled anyway, or why else is it in the code! If there are no dead branches, what's the purpose of not "existing"/let's say pre-virtualizing all the branches, regardless of which order they're written in? (Serious question).

Kim_Bruning · on Aug 10, 2023

Heh, I first ran into this issue with the built in assembler in BBC BASIC II.

You couldn't naively make a forward reference to a label. This is because the label wasn't defined yet at the time the assembler encountered your instruction.

That's when you learned about multi-pass assembly.
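A toy two-pass assembler shows the idea. The instruction names and the one-slot-per-instruction addressing are assumptions made for this sketch; this is not BBC BASIC's assembler syntax.

```python
def assemble(lines):
    """Resolve labels in two passes so forward references work."""
    # Pass 1: walk the source once, recording the address of every label.
    labels = {}
    address = 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = address
        else:
            address += 1  # assumption: every instruction occupies one slot

    # Pass 2: emit instructions, replacing label names with addresses.
    program = []
    for line in lines:
        if line.endswith(":"):
            continue
        op, *args = line.split()
        program.append((op, *[labels.get(arg, arg) for arg in args]))
    return program
```

With a single pass, `JMP end` in `["JMP end", "NOP", "end:", "RTS"]` would be reached before `end` had an address; with two passes it resolves to slot 2.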

wongarsu · on Aug 10, 2023

Javascript solves this with function hoisting. In simplified terms you do two compiler passes, once for global symbols and once for linear execution.

After all nothing stops you from writing in your language definition that defining a function anywhere in the file is the same as defining it in the beginning, and then you can treat it as if the entire file was in a main function, except for functions and the declaration of global variables.

em-bee · on Aug 10, 2023

how about before anything is executed, add a parse step that reads the whole file, looks for function definitions, and then goes back to the start to execute?
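One way to sketch this in Python uses the standard `ast` module: parse the whole file, move top-level `def` statements to the front, then execute. The function name `run_with_hoisting` is hypothetical, and the sketch only recognises plain top-level `def` statements (not conditional definitions, and not lambdas assigned to names).

```python
import ast


def run_with_hoisting(source):
    """Run `source` with top-level function definitions moved to the front."""
    tree = ast.parse(source)

    # First pass: split the module body into definitions and everything else.
    functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
    statements = [node for node in tree.body if not isinstance(node, ast.FunctionDef)]

    # Second pass: execute the definitions first, then the remaining statements in order.
    tree.body = functions + statements
    namespace = {}
    exec(compile(tree, "<hoisted>", "exec"), namespace)
    return namespace
```

A script such as `"result = a()\ndef a(): return b()\ndef b(): return 42\n"` fails when executed as written but runs under this scheme, leaving `result` set to 42.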

chriswarbo · on Aug 10, 2023

What counts as a function definition?

  def a(): b()
  a()
  import random
  with open("max", "r") as f:
    m = int(f.read())
    b = (lambda: "foo") if int(input("Pick a number: ")) % m > random.randint(0, m) else None

em-bee · on Aug 10, 2023

fair point, but this is why you need multiple parse steps. first parse you get the overall structure and note where forward references are made to things that are not known yet. in the second step you fill in those references as found.

it becomes problematic if a function definition is conditional. i suppose then that the compilation should fail because i guess conditional definitions would not be covered.

cryptonector · on Aug 10, 2023

Doesn't have to be that way. You can just do two passes.

fstokesman · on Aug 10, 2023

> There's something more constricting about there being one function to bootstrap everything than there is about one file.

The trickiest thing is that main() is not even the bootstrap function. The actual entry point of a program is usually provided by libc, and is generally called `_start` (though it can be anything).

OJFord · on Aug 10, 2023

Surely it's less constricting? You can put that one function anywhere.

(I grew up on python, then C etc. at university, but as sibling says in Python I'd still `if __name__ == "__main__"`.)

mypetocean · on Aug 10, 2023

I've become a big fan of using something like a `main()` function to act as the one conventional place where pure functions are piped together at the top of a module.

You get this very nice linear execution birds' eye view which tends to be readable. Combine that with hoisting and you start with this big picture at the top of a file, and then can dig deeper into the smaller functions as needed, written in lexical order lower in the file.

Here is a very trivial JS example from a kata:

```js
function main (numbers) {
  return Array
    .from(numbers)
    .sort(byGreatest)
    .slice(0, 2)
    .reduce(toSum)
}
```

This is much more regularly written in languages with a pipeline operator, like Elixir, because you can pipe to arbitrary functions and operators (instead of being restricted to a method chain).

(JS/TS will get there eventually, if the TC39 committee can ever finally commit to the proposal.)

noduerme · on Aug 10, 2023

I like the method chain and find it very readable (and a good guarantor of typed output in the absence of strict types, if you use something like Typescript to lint it). But I don't see how the choice between that or a pipe operator has any impact on whether function main() should be your program's entry point. Unless your program is going to be an endless loop, e.g. a Nodejs server or else a game, I don't know why you'd want to saddle yourself with a main() every time it runs.

muxator · on Aug 10, 2023

Complete final program for the laziest of us (after incorporating @10000truths's advice):

  const int main[] __attribute__ ((section(".text"))) = {
      -443987883, 440, 113408, -1922629632,
      4149, 899584, 84869120, 15544,
      266023168, 1818576901, 1461743468, 1684828783,
      -1017312735
  };

Compilation (gcc 13):

  $ gcc -Wall main.c -o main
  main.c:1:11: warning: ‘main’ is usually a function [-Wmain]
      1 | const int main[] __attribute__ ((section(".text"))) = {
        |           ^~~~
  /tmp/ccsWmdiD.s: Assembler messages:
  /tmp/ccsWmdiD.s:4: Warning: ignoring changed section attributes for .text

Execution:

  $ ./main 
  Hello World!
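To see what those integers contain, one can pack them back into bytes. The sketch below assumes 32-bit little-endian `int`s, as on x86-64 Linux; the names `WORDS`, `to_bytes` and `MACHINE_CODE` are made up for illustration.

```python
import struct

# The 13 integers from the array above.
WORDS = [
    -443987883, 440, 113408, -1922629632,
    4149, 899584, 84869120, 15544,
    266023168, 1818576901, 1461743468, 1684828783,
    -1017312735,
]


def to_bytes(words):
    """Pack signed 32-bit integers into little-endian bytes."""
    return struct.pack(f"<{len(words)}i", *words)


MACHINE_CODE = to_bytes(WORDS)
```

Searching `MACHINE_CODE` for `b"Hello World!\n"` finds the greeting near the end of the array, stored right after the instructions that print it.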

CueXXIII · on Aug 10, 2023

Doesn't work, it prints nothing (on gcc (Debian 13.2.0-1) 13.2.0).

Although I found that gcc seems to be configured with -pie per default, so this compilation works:

  $ gcc -Wall -fno-PIE -no-pie main.c -o main
  mainf.c:1:11: warning: ‘main’ is usually a function [-Wmain]
      1 | const int main[] __attribute__((section(".text"))) = {
        |           ^~~~
  /tmp/ccq0adwj.s: Assembler messages:
  /tmp/ccq0adwj.s:4: Warning: ignoring changed section attributes for .text

10000truths · on Aug 10, 2023

This won't work anymore, as compilers will now place const arrays in the .rodata section, which is non-executable. Luckily, there's an easy fix - just qualify the declaration of the array with:

__attribute__((section(".text")))

Vespasian · on Aug 10, 2023

In a way that's unfortunate because the "google keyword" .text gives away what is happening here.

I bet there are some people who wouldn't even think about the fungible nature of data types (especially if they come from a "modern" language).

smokel · on Aug 10, 2023

Back when compiler warnings possibly cost extra processing time to generate, it was possible to make gcc compile the craziest things.

After much experimentation, it turned out that the smallest program that would compile and run was only 5 bytes long:

main;

extraduder_ire · on Aug 10, 2023

I still get a segmentation fault out of that when I run it. I think there are flags you can use to get the linker to not complain about missing main if you give gcc an empty file.

The craziest thing I got gcc (the AVR version, specifically) to compile for real purposes was a preprocessor macro that spit out tens of thousands of asm blocks with memory barriers and nop (do nothing) instructions with "PORTB = 1;" in the middle and "PORTB = 0" at the end. I needed it to bitbang out a clock signal to read an RFID tag on poorly documented hardware. (fun fact: the clock the cpu uses on an arduino uno is considerably worse and less accurate than the one that's on the board for the USB chip.)

SomeoneFromCA · on Aug 10, 2023

Must be the internal RC clock? it sucks.

extraduder_ire · on Aug 16, 2023

Nope, it's an external one at 16 MHz. I think the internal one on the chip can only get up to 8 MHz, and is accurate to about ±5%.

It's enough to work for most things. Only annoying thing is, if you use the ICSP header to set the clock to external, you need a working clock to even use ICSP. (the 328pb falls back to the internal clock in this case)

leblancfg · on Aug 10, 2023

Welp that's a nerd snipe if I ever had one. Will report back once my bruteforce hack finishes crunching away.

smokel · on Aug 10, 2023

Please do, I'm eager to find out how long it takes to compile and run 4,294,967,296 small C files on modern hardware ;)

leblancfg · on Aug 11, 2023

I was able to feed stdin to `gcc` instead of all that IO, which considerably sped it up and spread over 5 cores.

But ultimately I gave up after the third time my computer crashed. I was at length 4 by that point, next step would have been to throttle the CPU usage.

LoganDark · on Aug 10, 2023

> make gcc compile the craziest things

In languages like Haskell you can just make 2+2=5: https://codegolf.stackexchange.com/a/28794

GrumpySloth · on Aug 10, 2023

That’s not that crazy. It just shadows the global + function. And thanks to lexical scope it has a rather limited impact.

LoganDark · on Aug 11, 2023

It's magic to people like me who don't know Haskell...

picadores · on Aug 10, 2023

We built an embedded "Operating System" during CS courses that was basically just a continuous recursion, where the stack was eliminated and reset repeatedly. Main was just some assembly manipulating the instruction pointer to get the whole thing rolling. Good times, good crimes.

ackfoobar · on Aug 10, 2023

This reminds me of James Iry saying "all your code and data are in one giant mutable array indexed by pointers, good luck". Half true, but insightful.

Kab1r · on Aug 10, 2023

(2015)

eschneider · on Aug 10, 2023

Looks like he's independently re-invented PEEK and POKE from BASIC.

bee_rider · on Aug 10, 2023

5 points off for coding style.

5 points off for assuming too much about the system.

orra · on Aug 10, 2023

The article says `lea` helps calculate the array relative address on AMD64.

Why does the article say the problem would be tricky on 32-bit? `lea` is an old instruction. Thanks.

comex · on Aug 10, 2023

It’s not `lea` itself that’s new in AMD64 but rather RIP-relative addressing; this can be used as a memory operand for any instruction but it’s especially useful with `lea`. If you’re wondering where the reference to RIP is, the assembly in the post uses %eip instead of the usual %rip. [Edit: And doing so is wrong, contrary to what I wrote before editing; see sibling comment.]

astrange · on Aug 10, 2023

x86-32 doesn't have PC-relative addressing (`%eip`).

Btw, his code is wrong, it assumes pointers fit into 32 bits.

noam_k · on Aug 10, 2023

This reminds me of a great riddle: what is the shortest C code that compiles, but segfaults?

With the right flags: `main;` (implicitly a zeroed int).

eimrine · on Aug 10, 2023

main;

is a valid C program but main is not a function here.