
When I did interviews @ Google (I only do hiring committee work now, thank god :-) I usually asked questions around bugs I added to an existing codebase, to see if the candidate can avoid the pitfalls I ran into by making bad assumptions.

As I'm a pure C coder, my starter question was usually something like:

a). What does malloc(0) return ? b). Why does it do that ?



I find questions like this lack an on-ramp. Either you know or you don't. If you don't it gives you 0 indication of the skills.

I once flunked a FAANG interview because the interviewer mispronounced "ARP protocol" (or used a correct pronunciation I was unfamiliar with).

I had no idea what was being asked and was racking my brain for something I didn't think I had ever used, even though I had used ARP to down every computer in my high school's library 12 years before.


No, that's not true. If you know anything about C and writing secure code (which is what I was probing for), you know about using malloc(). You know because you have to know something about the internals of memory management.

Imagine you just read a 4 byte value off the network, and it's part of a protocol that specifies how many more bytes there are to read. You might (in error, ahem... :-) pass that value to malloc(). So knowing what might happen if an attacker puts an unexpected value in there is something you need to think about.
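To make the scenario concrete, here is a minimal sketch of reading such a length field and checking it before allocating. The big-endian layout, the 1 MiB cap `MAX_PAYLOAD_LEN`, and the names `read_be32` and `alloc_payload` are assumptions for illustration, not taken from Samba or any real protocol.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical upper bound for this sketch; a real protocol defines its own. */
#define MAX_PAYLOAD_LEN (1u << 20)

/* Decode a 4-byte big-endian length field as read off the wire (assumed layout). */
uint32_t read_be32(const unsigned char hdr[4])
{
    return ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
           ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
}

/*
 * Allocate a buffer for the payload announced in hdr.
 * Returns 0 on success, -1 if the length is rejected or malloc() fails.
 * A zero length succeeds with *out == NULL, so malloc(0) is never called
 * and its implementation-defined result never has to be interpreted.
 */
int alloc_payload(const unsigned char hdr[4], unsigned char **out, uint32_t *len)
{
    uint32_t n = read_be32(hdr);

    *out = NULL;
    *len = 0;
    if (n > MAX_PAYLOAD_LEN)
        return -1;              /* oversized request: refuse before allocating */
    if (n == 0)
        return 0;               /* nothing to read, nothing to allocate */

    *out = malloc(n);
    if (*out == NULL)
        return -1;              /* genuine allocation failure */
    *len = n;
    return 0;
}
```

Handling the zero case explicitly is only one possible design; the point is that the caller decides what zero means instead of leaving it to whatever malloc(0) happens to return.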

If you don't know or can't guess because you don't know how malloc() works, then you're not the person I'm looking for.


That's one way to look at it. Another might be that if you write code that passes unsanitised input to anything and can get that code through testing and review then maybe you're not the kind of organisation that a candidate who knows about security wants to work for.

In the end this is still a language lawyer question. It's a technicality that should never be relevant. If it is you've already gone wrong several times. In other languages there might be an argument that it probably does something reasonable and any developer experienced with that language should be able to make an educated guess about what that would be even if they don't know. But you asked about C, a language infamous for having undefined behaviour in many such situations, so I don't think even that is a particularly compelling argument here.


Many protocols read values off the network to specify how much is left in the packet (it's how packet boundaries are usually encoded, specifically in SMB1/2/3). So yes, no matter how paranoid you are, you're eventually going to have to pass that value to something in your code :-).


> Many protocols read values off the network to specify how much is left in the packet (it's how packet boundaries are usually encoded, specifically in SMB1/2/3).

Sure. So do many other protocols and file formats. But if you're using those values for memory allocation without checking them first then getting a 0 might be the least of your worries. Unchecked large values might be a DoS attack waiting to happen.

If you work with C code where security is a factor then surely you already know this so it still seems odd to put so much emphasis on your original question. You do you I guess. :-)


It's just a warmup. Tells me how the candidate thinks about such things. In production code of course the max size is limited to avoid DoS. My bug in Samba was missing the behavior of the zero case.


Unless you are specifically hiring for low-level network performance tuning (which is not how 99% of Googlers are hired), this still seems like a trivia question that's only marginally related to a person's C(++) competency. My impression is that Google discouraged asking such questions.

Source: worked at Google.


I don't code in C++ anymore. Did a long time ago. I'm looking for programmers who are careful about security and API design. This has nothing to do with performance tuning. It's to do with security.


I'm not a C expert and I'm stumped but curious. What does malloc(0) return and why is that important?


See the replies below. Someone just submitted a really comprehensive answer !


>If you don't know or can't guess because you don't know how malloc() works, then you're not the person I'm looking for.

Yea. It would be that way.


Well yes. I'm looking for competent C coders. Competent C coders know how malloc works. It goes with the territory.


I'm not anything other than a C tourist, and I see that the man page says it returns either NULL or a "unique pointer value that can later be successfully passed to free()."

I'm kind of at a loss about why it can return either of those two things, somebody want to take a shot at explaining it?


> I'm not anything other than a C tourist, and I see that the man page says it returns either NULL or a "unique pointer value that can later be successfully passed to free()."

>

> I'm kind of at a loss about why it can return either of those two things, somebody want to take a shot at explaining it?

Any return from malloc, whether it succeeds or not, is a valid argument to `free()`. Hence, it can return NULL because `free(NULL)` is legal, and anything other than NULL has to be a unique pointer, because if it returned a duplicate then calling `free()` on the duplicate would be a double free, which is undefined behaviour and typically crashes.


Both NULL and a unique pointer value can be safely passed to free() :-).

Answers to this question taught me about the candidate's taste and understanding of good API design :-).

Both NULL and "unique pointer" are correct answers, but all modern implementations choose to return only one of these. My follow-up question is "why ?" :-).


When malloc returns NULL, it's saying that there was some error.

However, IIRC the only malloc error is ENOMEM. It's unclear why malloc(0) would run into that. (If malloc(0) did run into an ENOMEM, then NULL would be required, but the result of malloc(0) need not tell you anything about subsequent calls to *alloc functions. However, there is a possible malloc guarantee to consider.)

There's some interaction with realloc() which may favor NULL or a non-null pointer to 0 bytes, but that's too much work to figure out now.

Suppose that malloc(0) returns a not-NULL value. Is malloc(0) == malloc(0) guaranteed to be false? (I think that it is, which is how ENOMEM can happen.)

So, the "right" answer is probably malloc(0) returns a unique pointer because then the error check is simpler - a NULL return value is always a true error, you don't have to look at size.


That's right ! Great answer.

malloc() is a very old API. A more modern version would probably look like:

err_code malloc(size_t size, void **returned_ptr);

The current malloc() overloads its return value: NULL means no memory available (or some other internal malloc failure), but the standard also allows a NULL return when size==0.

So if you get NULL back from malloc, did it really mean no memory/malloc fail, or zero size passed in ?
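As a rough sketch of that idea, the error-code style can be layered on top of the existing malloc(). The name `checked_alloc` and the choice of `int` with `ENOMEM` as the error type are my assumptions here; the prototype above leaves `err_code` unspecified.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: reports failure through the return value instead of
 * overloading the pointer. Returns 0 on success and ENOMEM on failure.
 */
int checked_alloc(size_t size, void **returned_ptr)
{
    void *p = malloc(size);

    if (p == NULL && size != 0) {
        *returned_ptr = NULL;
        return ENOMEM;          /* NULL for a non-zero size is a real failure */
    }
    /*
     * For size == 0 the pointer may be NULL or a unique non-NULL value,
     * depending on the implementation. Either way no bytes are usable and
     * the result can be passed to free(), so it is reported as success.
     */
    *returned_ptr = p;
    return 0;
}
```

With this shape the caller never has to look at the size to decide whether a NULL pointer was an error.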

glibc and all modern implementations distinguish the two by allocating an internal malloc heap header whose bookkeeping size field is zero, and returning a pointer to the byte after that header.

The only valid things you can do with the returned pointer are to test it for != NULL, or pass it to realloc() or free(). You can never dereference it.

Returning a valid pointer to NO DATA is what all modern implementations do when a size==0 is requested.
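A short sketch of those permitted operations: the zero-size result is never dereferenced, only handed to realloc() to become a real buffer. The helper name `grow_and_copy` is made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Turn a zero-size allocation (or NULL) into a buffer holding a copy of src.
 * Returns the new block, or NULL on failure, in which case zero_block is
 * untouched and still owned by the caller.
 */
char *grow_and_copy(void *zero_block, const char *src)
{
    size_t n = strlen(src) + 1;
    char *p = realloc(zero_block, n);   /* valid for NULL or a malloc(0) result */

    if (p == NULL)
        return NULL;
    memcpy(p, src, n);                  /* only now is there memory to write to */
    return p;
}
```

Called as `grow_and_copy(malloc(0), "abc")`, this should behave the same whether the implementation returned NULL or a unique pointer for the zero-size request.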

In an interview situation, discussions around all these points are very productive, telling me how the candidate thinks about API design and how to fix a bad old one, whether they know anything about malloc internals (which is essential as overwriting internal malloc header info is a common security attack), and how they deal with errors returned from code.

Remember, it was only my warmup question :-). Things get really interesting after that :-) :-).


Except that I left out a "should".

While it would be good if malloc(0) always returned a pointer, you can't rely on malloc(0) returning NULL just for errors. There's not even a guarantee that malloc(0) does the same thing every time.

Note that "returning a pointer to the byte after the internal malloc header" means that malloc(0) == malloc(0), breaking the unique pointer guarantee unless malloc(0) actually causes an allocation.

However, allocating in malloc(0) means that while(1) malloc(0); can cause a segfault, which would be a surprising thing.


No, it doesn't.

malloc(0) != malloc(0) because the internal malloc header is different for each allocation.

Asking for a malloc of size n means that internally the malloc library allocates n + h, where h is the internal malloc header size. So there is always an allocation being done for at least a size of h, just with an internal bookkeeping "size" field set to zero.

And yes, while(1) malloc(0); will eventually run out of memory, counter-intuitively :-).
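This is easy to check on whatever implementation you have at hand. The sketch below keeps two zero-size allocations live at the same time and compares them; `zero_allocs_distinct` is a hypothetical helper name, and the -1 case covers implementations that return NULL instead.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Returns 1 if two live zero-size allocations have distinct addresses,
 * 0 if they compare equal, and -1 if the implementation returned NULL.
 */
int zero_allocs_distinct(void)
{
    void *a = malloc(0);
    void *b = malloc(0);
    int result;

    if (a == NULL || b == NULL)
        result = -1;            /* permitted by the standard for size 0 */
    else
        result = (a != b);      /* both blocks are still live here */

    free(a);                    /* free(NULL) is fine, so no extra checks */
    free(b);
    return result;
}
```

On glibc I would expect this to return 1; a result of 0 would mean the implementation hands out the same address for two live allocations.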


Candidate answers to this also tell me if they understand anything about malloc implementations, which is a very useful skill to have as a C coder.


I did this. I had a folder with a few printed sheets of code hiding some slightly tricky C/C++ bugs from our codebase. I would ask the candidate to see if they could identify the issue. I don't think anyone ever actually got them but at least I got an idea of their thought process and experience. So mixed results.


According to GNU libc, malloc(0) can return either NULL or a valid unique pointer value which can later be passed to free(). You seem to imply that one or the other is definitely true.


Yes. Write test code for glibc. You'll find it always returns one of the two. There is a reason for that.


Firstly, you seemed to imply that malloc(0) always, for all implementations, returns one or the other, and that one of the two answers was wrong and the other was right.

Secondly, why should a C application developer know enough about the implementation details of malloc() to answer such an esoteric question? Malloc can not, by definition, be implemented in C, so it seems a bit out of scope.


Of course malloc is implemented in C. Look at the glibc source code.


Malloc cannot, IIRC, be implemented in standard C only. It needs non-standard system calls. Specifically, how does malloc() itself allocate new memory?


I mean, this is just being pedantic, but you could in theory declare a giant uninitialized non-local array

    uint8_t my_memory[4ULL*1024*1024*1024]; /* ULL: the product overflows int */
and some additional globals for housekeeping[1], and use that as your backing memory for your malloc implementation written entirely in "standard C"[2]. And that usually does not even waste any memory, since my_memory will be in the zero-filled BSS section and the actual backing pages only allocated on demand by the OS once you write to them for the first time.

Of course in reality there's probably a few "non-standard"[2] system calls in the path, mostly mmap.

[1] Not necessarily, you can place them at special places in your my_memory array by convention, e.g. at the start of the array.

[2] I assume that we're excluding standards on other layers like POSIX, otherwise it's trivial.
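For illustration, a toy version of that idea fits in a few lines of C11. This is a bump allocator only: it never frees or reuses memory, the pool is 1 MiB rather than 4 GiB, and the names `pool`, `pool_used` and `bump_alloc` are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE (1024u * 1024u)   /* 1 MiB, a multiple of the alignment */

/* Backing memory lives in BSS; pool_used is the housekeeping global. */
static _Alignas(max_align_t) uint8_t pool[POOL_SIZE];
static size_t pool_used;            /* always a multiple of the alignment */

/* Hand out the next aligned chunk of the pool, or NULL when it is exhausted. */
void *bump_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    size_t rounded;
    void *p;

    if (size == 0)
        size = 1;                   /* zero-size requests still get a unique pointer */
    if (size > POOL_SIZE - pool_used)
        return NULL;                /* checked first, so the rounding cannot overflow */

    rounded = (size + align - 1) & ~(align - 1);
    p = &pool[pool_used];
    pool_used += rounded;
    return p;
}
```

Treating a zero-size request as one byte is a design choice made here to mirror the "unique pointer" behaviour discussed elsewhere in the thread; it also means repeated zero-size requests eventually exhaust the pool.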


It uses the mmap, sbrk or brk system calls. All perfectly callable from C.


mmap, brk, sbrk are standard POSIX system calls.


IIUC, brk and sbrk have been removed from POSIX, and using mmap() just to allocate memory is a pretty weird way to use mmap(), and probably not what you would want for an implementation of malloc().


Not true. mmap is commonly used in malloc implementations. Look at this man page for jemalloc.

http://jemalloc.net/jemalloc.3.html

"Traditionally, allocators have used sbrk(2) to obtain memory, which is suboptimal for several reasons, including race conditions, increased fragmentation, and artificial limitations on maximum usable memory. If sbrk(2) is supported by the operating system, this allocator uses both mmap(2) and sbrk(2), in that order of preference; otherwise only mmap(2) is used."

Also, Google's tcmalloc uses mmap to get system memory:

https://github.com/google/tcmalloc/blob/master/docs/design.m...

"An allocation for k pages is satisfied by looking in the kth free list. If that free list is empty, we look in the next free list, and so forth. Eventually, we look in the last free list if necessary. If that fails, we fetch memory from the system mmap."

In fact I'd be surprised to see a modern malloc implementation that doesn't use mmap under the hood.
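For anyone who hasn't seen it, this is roughly what asking the OS for memory via mmap looks like. It is a minimal sketch assuming a Linux or BSD style system where `MAP_ANONYMOUS` is available; `os_alloc` and `os_free` are hypothetical names, and real allocators like jemalloc or tcmalloc do far more on top of this.

```c
#define _DEFAULT_SOURCE             /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Ask the kernel for len bytes of zero-filled, private, anonymous memory. */
void *os_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;   /* mmap signals failure with MAP_FAILED, not NULL */
}

/* Return a region obtained from os_alloc(); len must match the original request. */
int os_free(void *p, size_t len)
{
    return munmap(p, len);
}
```

A malloc implementation would typically grab large regions this way and carve them into smaller blocks itself, rather than calling mmap per allocation.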


Huh. My first answer was "I don't know", and then I googled it and apparently it's implementation defined. So...my first answer was correct? :)


True, it's implementation defined. But almost all implementations choose one of the two options. There is a reason for this.

FYI. I introduced a (bad) security bug into Samba a long time ago by expecting the wrong return (i.e. the one the implementations never return :-).



