When I did interviews @ Google (I only do hiring committee work now, thank god :...

aunty_helen · on April 16, 2022

I find questions like this lack an on-ramp. Either you know or you don't. If you don't it gives you 0 indication of the skills.

I once flunked a FAANG interview because the interviewer mispronounced "ARP protocol" (or used the correct pronunciation, which I was unfamiliar with).

I had no idea what was being asked and was racking my brain for something I didn't think I had ever used, even though I had used it to take down every computer in my high school library 12 years before.

jra_samba · on April 16, 2022

No, that's not true. If you know anything about C and writing secure code (which is what I was probing for) you know about using malloc(). You know because you have to know something about the internals of memory management.

Imagine you just read a 4 byte value off the network, and it's part of a protocol that specifies how many more bytes there are to read. You might (in error, ahem... :-) pass that value to malloc(). So knowing what might happen if an attacker puts an unexpected value in there is something you need to think about.

If you don't know or can't guess because you don't know how malloc() works, then you're not the person I'm looking for.

Silhouette · on April 16, 2022

That's one way to look at it. Another might be that if you write code that passes unsanitised input to anything and can get that code through testing and review then maybe you're not the kind of organisation that a candidate who knows about security wants to work for.

In the end this is still a language lawyer question. It's a technicality that should never be relevant. If it is you've already gone wrong several times. In other languages there might be an argument that it probably does something reasonable and any developer experienced with that language should be able to make an educated guess about what that would be even if they don't know. But you asked about C, a language infamous for having undefined behaviour in many such situations, so I don't think even that is a particularly compelling argument here.

jra_samba · on April 16, 2022

Many protocols read values off the network to specify how much is left in the packet (it's how packet boundaries are usually encoded, specifically in SMB1/2/3). So yes, no matter how paranoid you are you're eventually going to have to pass that value to something in your code :-).

Silhouette · on April 16, 2022

> Many protocols read values off the network to specify how much is left in the packet (it's how packet boundaries are usually encoded, specifically in SMB1/2/3).

Sure. So do many other protocols and file formats. But if you're using those values for memory allocation without checking them first then getting a 0 might be the least of your worries. Unchecked large values might be a DoS attack waiting to happen.

If you work with C code where security is a factor then surely you already know this so it still seems odd to put so much emphasis on your original question. You do you I guess. :-)

jra_samba · on April 16, 2022

It's just a warmup. Tells me how the candidate thinks about such things. In production code of course the max size is limited to avoid DoS. My bug in Samba was missing the behavior of the zero case.
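A minimal sketch of the two checks discussed in this exchange (an upper bound against DoS, plus explicit handling of the zero case) applied to a length field before it reaches malloc(). The names `decode_be32`, `alloc_body` and `MAX_BODY_LEN` are hypothetical, and so are the assumptions that the field is big-endian, that 16 MiB is a sensible cap, and that a zero length should be rejected rather than treated as an empty body; this is not Samba's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical upper bound on one message body; pick one that suits the protocol. */
#define MAX_BODY_LEN (16u * 1024u * 1024u)

/* Decode a 4-byte big-endian length field as read off the wire (assumed byte order). */
uint32_t decode_be32(const unsigned char hdr[4])
{
    return ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
           ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
}

/*
 * Allocate a buffer for the body announced by hdr.
 * Returns NULL if the length is zero, exceeds MAX_BODY_LEN, or malloc() fails;
 * on success *len_out holds the validated length.
 */
unsigned char *alloc_body(const unsigned char hdr[4], size_t *len_out)
{
    uint32_t len = decode_be32(hdr);

    if (len == 0 || len > MAX_BODY_LEN) {
        return NULL;  /* rejected before the value ever reaches malloc() */
    }
    *len_out = len;
    return malloc(len);
}
```

Because zero is filtered out first, a NULL from `alloc_body` never depends on what the local malloc(0) happens to return.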

yongjik · on April 16, 2022

Unless you are specifically hiring for low-level network performance tuning (which is not how 99% of Googlers are hired), this still seems like a trivia question that's only marginally related to a person's C(++) competency. My impression is that Google discouraged asking such questions.

Source: worked at Google.

jra_samba · on April 16, 2022

I don't code in C++ anymore. Did a long time ago. I'm looking for careful programmers around security and API design. This is nothing to do with performance tuning. It's to do with security.

ibeckermayer · on April 16, 2022

I'm not a C expert and I'm stumped but curious. What does malloc(0) return and why is that important?

jra_samba · on April 16, 2022

See the replies below. Someone just submitted a really comprehensive answer !

aunty_helen · on April 16, 2022

>If you don't know or can't guess because you don't know how malloc() works, then you're not the person I'm looking for.

Yea. It would be that way.

jra_samba · on April 16, 2022

Well yes. I'm looking for competent C coders. Competent C coders know how malloc works. It goes with the territory.

llimllib · on April 16, 2022

I'm not anything other than a C tourist, and I see that the man page says it returns either NULL or a "unique pointer value that can later be successfully passed to free()."

I'm kind of at a loss about why it can return either of those two things, somebody want to take a shot at explaining it?

lelanthran · on April 16, 2022

> I'm not anything other than a C tourist, and I see that the man page says it returns either NULL or a "unique pointer value that can later be successfully passed to free()."

>

> I'm kind of at a loss about why it can return either of those two things, somebody want to take a shot at explaining it?

Any return from malloc, whether it succeeds or not, is a valid argument to `free()`. Hence, it can return NULL because `free(NULL)` is legal, and anything other than NULL has to be a unique pointer, because if it returns a duplicate then calling `free()` on the duplicate will crash.

jra_samba · on April 16, 2022

Both NULL and a unique pointer value can be safely passed to free() :-).

Answers to this question taught me about the candidate's taste and understanding of good API design :-).

Both NULL and "unique pointer" are correct answers, but all modern implementations choose to return only one of these. My follow-up question is "why ?" :-).

anamax · on April 16, 2022

When malloc returns NULL, it's saying that there was some error.

However, IIRC the only malloc error is ENOMEM. It's unclear why malloc(0) would run into that. (If malloc(0) did run into an ENOMEM, then NULL would be required, but the result of malloc(0) need not tell you anything about subsequent calls to *alloc functions. However, there is a possible malloc guarantee to consider.)

There's some interaction with realloc() which may favor NULL or a non-null pointer to 0 bytes, but that's too much work to figure out now.

Suppose that malloc(0) returns a not-NULL value. Is malloc(0) == malloc(0) guaranteed to be false? (I think that it is, which is how ENOMEM can happen.)

So, the "right" answer is probably malloc(0) returns a unique pointer because then the error check is simpler - a NULL return value is always a true error, you don't have to look at size.
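The "simpler error check" point can be sketched in a few lines, assuming nothing beyond standard C. `alloc_failed` and `alloc_nonzero` are hypothetical helper names: the first shows the check portable code needs when malloc(0) may legitimately return NULL, the second sidesteps the question by never requesting zero bytes.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Portable failure test: a NULL result only counts as an error when
 * a non-zero size was requested.
 */
int alloc_failed(const void *ptr, size_t size)
{
    return ptr == NULL && size != 0;
}

/*
 * Hypothetical wrapper: never asks malloc() for zero bytes, so a NULL
 * return always means the allocation failed, whatever the implementation
 * does for malloc(0).
 */
void *alloc_nonzero(size_t size)
{
    return malloc(size != 0 ? size : 1);
}
```

With the wrapper, callers can test for NULL alone, which is the property the comment above attributes to implementations that return a unique pointer.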

jra_samba · on April 16, 2022

That's right ! Great answer.

malloc() is a very old API. A more modern version would probably look like:

    err_code malloc(size_t size, void **returned_ptr);

The current malloc() overloads its return value: NULL means no memory available or some internal malloc failure, and the standard also allows a NULL return if size == 0.

So if you get NULL back from malloc, did it really mean no memory/malloc fail, or zero size passed in ?

glibc and the other modern implementations distinguish the two by allocating an internal malloc heap header with an internal bookkeeping size of zero, and returning a pointer to the byte after that header.

The only valid things you can do with the returned pointer are to test it for != NULL, or pass it to realloc() or free(). You can never dereference it.

Returning a valid pointer to NO DATA is what all modern implementations do when a size==0 is requested.
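As a rough illustration of that list of valid operations, the sketch below only compares the malloc(0) result against NULL, grows it with realloc(), and frees it. `use_zero_size_block` is a hypothetical name, and the target size of 16 bytes is arbitrary.

```c
#include <stdlib.h>

/*
 * Exercise the operations that are valid on a malloc(0) result:
 * compare against NULL, grow with realloc(), release with free().
 * Returns 0 on success, -1 if growing the block failed.
 */
int use_zero_size_block(void)
{
    char *p = malloc(0);          /* may be NULL or a unique pointer */

    /* p must not be dereferenced here: it points to zero usable bytes. */

    char *grown = realloc(p, 16); /* fine whether p is NULL or not */
    if (grown == NULL) {
        free(p);                  /* p is still owned by us; free(NULL) is a no-op */
        return -1;
    }
    grown[0] = 'x';               /* now there are 16 usable bytes */
    free(grown);
    return 0;
}
```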

In an interview situation, discussions around all these points are very productive, telling me how the candidate thinks about API design and how to fix a bad old one, whether they know anything about malloc internals (which is essential as overwriting internal malloc header info is a common security attack), and how they deal with errors returned from code.

Remember, it was only my warmup question :-). Things get really interesting after that :-) :-).
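The out-parameter signature suggested in this comment could be sketched as a thin wrapper over the existing malloc(). Everything here is an assumption made for illustration: the `alloc_err` codes, the name `checked_alloc`, and the choice to treat a zero-size request as a success that stores NULL. Other designs (for example rejecting zero as an invalid argument) would be equally defensible.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical error codes; not part of any standard API. */
typedef enum {
    ALLOC_OK = 0,
    ALLOC_ERR_INVAL,   /* bad argument (NULL out-pointer) */
    ALLOC_ERR_NOMEM    /* underlying malloc() failed */
} alloc_err;

/*
 * The status comes back as the return value and the pointer through an
 * out-parameter, so "no memory" and "zero size" can no longer be confused.
 */
alloc_err checked_alloc(size_t size, void **returned_ptr)
{
    if (returned_ptr == NULL) {
        return ALLOC_ERR_INVAL;
    }
    *returned_ptr = NULL;
    if (size == 0) {
        return ALLOC_OK;          /* nothing to allocate, not an error */
    }
    *returned_ptr = malloc(size);
    return *returned_ptr != NULL ? ALLOC_OK : ALLOC_ERR_NOMEM;
}
```

A caller then branches on the returned code and never has to infer the cause from the pointer value.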

anamax · on April 16, 2022

Except that I left out a "should".

While it would be good if malloc(0) always returned a pointer, you can't rely on malloc(0) returning NULL just for errors. There's not even a guarantee that malloc(0) does the same thing every time.

Note that "returning a pointer to the byte after the internal malloc header" means that malloc(0) == malloc(0), breaking the unique pointer guarantee unless malloc(0) actually causes an allocation.

However, allocating in malloc(0) means that while(1) malloc(0); can cause a segfault, which would be a surprising thing.

jra_samba · on April 16, 2022

No, it doesn't.

malloc(0) != malloc(0) because the internal malloc header is different for each allocation.

Asking for a malloc of size n means that internally the malloc library allocates n + h, where h is the internal malloc header size. So there is always an allocation being done for at least a size of h, just with an internal bookkeeping "size" field set to zero.

And yes, while(1) malloc(0); will eventually run out of memory, counter-intuitively :-).
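The uniqueness claim is easy to probe on a given C library. The sketch below uses a hypothetical helper, `zero_allocs_are_distinct`, that holds two malloc(0) results at the same time and compares them. On an implementation that behaves as described here it should return 1; on one that returns NULL for zero-size requests it returns 0.

```c
#include <stdlib.h>

/*
 * Returns 1 if two live malloc(0) results are distinct non-NULL pointers,
 * 0 otherwise (for example when the implementation returns NULL).
 */
int zero_allocs_are_distinct(void)
{
    void *a = malloc(0);
    void *b = malloc(0);   /* a is still live, so b must not alias it */
    int distinct = (a != NULL && b != NULL && a != b);

    free(a);               /* free() accepts NULL, so no extra checks needed */
    free(b);
    return distinct;
}
```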

jra_samba · on April 16, 2022

Candidate answers to this also tell me if they understand anything about malloc implementations, which is a very useful skill to have as a C coder.

ja27 · on April 16, 2022

I did this. I had a folder with a few printed sheets of code hiding some slightly tricky C/C++ bugs from our codebase. I would ask the candidate to see if they could identify the issue. I don't think anyone ever actually got them but at least I got an idea of their thought process and experience. So mixed results.

teddyh · on April 16, 2022

According to GNU libc, malloc(0) can return either NULL or a valid unique pointer value which can later be passed to free(). You seem to imply that one or the other is definitely true.

jra_samba · on April 16, 2022

Yes. Write test code for glibc. You'll find it always returns one of the two. There is a reason for that.
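One possible shape for such test code, using only standard C: call malloc(0) repeatedly and count the NULL results. `count_null_zero_allocs` is a hypothetical name, and the sketch makes no claim about which answer any particular library gives.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Call malloc(0) `rounds` times and count how many calls returned NULL.
 * 0 means "always a pointer", `rounds` means "always NULL"; anything in
 * between would mean the library mixes the two behaviours.
 */
size_t count_null_zero_allocs(size_t rounds)
{
    size_t nulls = 0;

    for (size_t i = 0; i < rounds; i++) {
        void *p = malloc(0);
        if (p == NULL) {
            nulls++;
        }
        free(p);  /* valid for both NULL and non-NULL results */
    }
    return nulls;
}
```

Printing the result of `count_null_zero_allocs(1000)` on the target platform is enough to see which of the two permitted behaviours it picked.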

teddyh · on April 16, 2022

Firstly, you seemed to imply that malloc(0) always, for all implementations, returns one or the other, and one of the two answers was wrong and the other was right.

Secondly, why should a C application developer know enough about the implementation details of malloc() to answer such an esoteric question? Malloc can not, by definition, be implemented in C, so it seems a bit out of scope.

jra_samba · on April 16, 2022

Of course malloc is implemented in C. Look at the glibc source code.

teddyh · on April 16, 2022

Malloc cannot, IIRC, be implemented in standard C only. It needs non-standard system calls. Specifically, how does malloc() itself allocate new memory?

anyfoo · on April 16, 2022

I mean, this is just being pedantic, but you could in theory declare a giant uninitialized non-local array

    #include <stdint.h>
    uint8_t my_memory[4ULL*1024*1024*1024]; /* ULL keeps the size from overflowing int */

and some additional globals for housekeeping[1], and use that as your backing memory for your malloc implementation written entirely in "standard C"[2]. And that usually does not even waste any memory, since my_memory will be in the zero-filled BSS section and the actual backing pages only allocated on demand by the OS once you write on them for the first time.

Of course in reality there's probably a few "non-standard"[2] system calls in the path, mostly mmap.

[1] Not necessarily, you can place them at special places in your my_memory array by convention, e.g. at the start of the array.

[2] I assume that we're excluding standards on other layers like POSIX, otherwise it's trivial.
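A toy version of that idea, as a sketch only: a bump allocator over a static array, with the housekeeping kept in one extra global. The names `arena`, `arena_used` and `arena_alloc` are hypothetical, the arena is shrunk to 1 MiB to keep the example small, there is no free(), and the sketch glosses over the strict-aliasing questions a pedantic reading of the standard would raise.

```c
#include <stddef.h>
#include <stdint.h>

/* Backing store; 1 MiB is an arbitrary size chosen for this sketch. */
#define ARENA_SIZE ((size_t)1024 * 1024)

static _Alignas(max_align_t) uint8_t arena[ARENA_SIZE];
static size_t arena_used = 0;   /* housekeeping: bytes handed out so far */

/*
 * Toy bump allocator: hands out consecutive, suitably aligned slices of
 * `arena` and never reuses them. Returns NULL when the arena is exhausted.
 */
void *arena_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);

    if (size == 0) {
        size = 1;               /* give zero-size requests a distinct address */
    }
    if (size > ARENA_SIZE) {
        return NULL;            /* also rules out overflow in the rounding below */
    }
    /* Round up so the next allocation stays aligned. */
    size_t rounded = (size + align - 1) / align * align;
    if (rounded > ARENA_SIZE - arena_used) {
        return NULL;
    }
    void *p = &arena[arena_used];
    arena_used += rounded;
    return p;
}
```

Since the array has static storage duration it is zero-initialised, which is what lets it live in BSS as the comment describes.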

jra_samba · on April 16, 2022

It uses the mmap, sbrk or brk system calls. All perfectly callable from C.

jra_samba · on April 16, 2022

mmap, brk, sbrk are standard POSIX system calls.

teddyh · on April 16, 2022

IIUC, brk and sbrk have been removed from POSIX, and using mmap() just to allocate memory is a pretty weird way to use mmap(), and probably not what you would want for an implementation of malloc().

jra_samba · on April 16, 2022

Not true. mmap is commonly used in malloc implementations. Look at this man page for jemalloc.

http://jemalloc.net/jemalloc.3.html

"Traditionally, allocators have used sbrk(2) to obtain memory, which is suboptimal for several reasons, including race conditions, increased fragmentation, and artificial limitations on maximum usable memory. If sbrk(2) is supported by the operating system, this allocator uses both mmap(2) and sbrk(2), in that order of preference; otherwise only mmap(2) is used."

Also, Google's tcmalloc uses mmap to get system memory:

https://github.com/google/tcmalloc/blob/master/docs/design.m...

"An allocation for k pages is satisfied by looking in the kth free list. If that free list is empty, we look in the next free list, and so forth. Eventually, we look in the last free list if necessary. If that fails, we fetch memory from the system mmap."

In fact I'd be surprised to see a modern malloc implementation that doesn't use mmap under the hood.
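For readers who have not seen mmap used this way, here is a minimal sketch of obtaining plain memory from the kernel with an anonymous private mapping. It assumes a system that provides `MAP_ANONYMOUS` (Linux and the BSDs do; some spell it `MAP_ANON`), and `os_alloc` / `os_free` are hypothetical names, not functions from jemalloc or tcmalloc.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Ask the kernel for `len` bytes of zero-filled, private memory.
 * Returns NULL on failure (mmap() itself reports failure as MAP_FAILED).
 */
void *os_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return (p == MAP_FAILED) ? NULL : p;
}

/* Return a region obtained from os_alloc() to the kernel. Returns 0 on success. */
int os_free(void *p, size_t len)
{
    return munmap(p, len);
}
```

A real allocator would request large regions this way and carve them into smaller blocks itself, rather than calling mmap once per allocation.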

oconnor663 · on April 16, 2022

Huh. My first answer was "I don't know", and then I googled it and apparently it's implementation defined. So...my first answer was correct? :)

jra_samba · on April 16, 2022

True, it's implementation defined. But almost all implementations choose one of the two options. There is a reason for this.

FYI. I introduced a (bad) security bug into Samba a long time ago by expecting the wrong return (i.e. the one the implementations never return :-).