
Yes, but it's more subtle. The argument that there's no performance difference between malloc/calloc is only valid if you have an MMU and an OS that will zero pages. Lots of embedded systems don't have such setups (or even an MMU at all); therefore, there is a difference between malloc and calloc.


But desktop processors also don't zero pages, do they?

And by zeroing pages, you prevent lazy memory allocation policies which do not allocate the pages until they're read or written for the first time. This can have a significant impact on memory usage, data locality etc.

So I'm quite skeptical that there is no difference between calloc and malloc. Is there more evidence of this somewhere?


How does it prevent such allocation? Can you not just mark the first read from an area to always return 0, and then allocate as soon as it happens? That also has the benefit of not having to zero anything if you're only writing to it.


Internally, an OS like Linux maps every page in a requested piece of memory to a read-only page of zeros. When you attempt to read from this page, it works fine. When you attempt to write to it, the MMU causes a page-fault and the OS inserts a page where the zero-page is, and then continues your program execution. Thus, it doesn't actually have to allocate any memory for you if you never write to the new pages.

But the OS/MMU doesn't distinguish between a regular write, and a write of zero. Thus, if you manually zero every page you get (And thus write zeros to the read-only zero-page), it'll page-fault back to the OS and the OS will have to allocate a new page so that the write succeeds - Even though if you didn't do the zeroing of memory you would have gotten the same effect of having a bunch of zeros in memory, but without having to allocate any new pages for your process.


Isn't that just saying that calloc is compatible with lazy allocation?


Kinda. Reading my comment a second time, I'm not exactly happy with my description; while it's 'right', it's a very simplistic description that ignores some of the finer points.

Since malloc/calloc are generally used for smaller allocations, the chances you can actually avoid allocating some pages you ask for are pretty slim, since a bunch of objects get placed into the same page (and thus writing to any of them will trigger a new page being allocated). There's also no guarantee there aren't headers for malloc to make use of, or similar, surrounding your piece of memory, which makes the point moot - just using malloc triggers writes to at least the first page. So while calloc/malloc are kinda compatible with lazy allocation, you really shouldn't rely on it being a thing, and it probably won't matter.

It's worth understanding, but the chances it actually comes into play aren't huge. If your program does lots of small malloc's and free's, then it basically won't matter because you won't be asking the kernel for more memory, just reusing what you already have.

If you care about taking advantage of lazy-allocation for one reason or another, the bottom line is probably that you shouldn't be using malloc and calloc for that then. Just use mmap directly and you'll have a better time - more control, you have a nice page-aligned address to start with, and you can be sure the memory is untouched. malloc and calloc are good for general allocations, but using mmap takes out the guesswork when you have something big and specific you need to allocate.


Desktop OSs (not processors afaik) zero pages. To not do so would leak information between privilege contexts.

E.g. imagine /etc/passwd is read and later on the page it occupied is put back in the free pool. Another process comes along and asks for memory. It gets that page and can now read /etc/passwd .


Bit lost here, not really a C programmer, by "zeroing pages" are we talking about zeroing out memory? Because I wrote a little C program that just reads memory until it segfaults, usually I can get about 3gb of stuff out, random little bits of files. I've noticed it works much better on OS X than Linux.


...what? Uh, I don't know how memory allocation works in detail in OSX, so you could be right, but it'd surprise me. Can you show me your program?


here it is: https://gist.github.com/raincoats/7b2631106c74f759eb5d

A few things though:

- Last time I ran this, in the memory dump, every second character was the letter "f". Like, in the dump, instead of saying "Words" it would say "Wfofrfdfsf". It changes each time, probably to do with my char onechar variable. (Actually now that I think about it, it should probably only be onechar[1] instead of onechar[2]. It was called onechar because I was trying to printf 1 char at a time)

- Before I changed "p" to &memcpy, it was "p = &p", and it would only read my environ before segfaulting.

- This program, on linux, dies almost immediately. It only spits out a few bytes.


First of all, even a simple program like that will have standard libraries mapped into its address space. Your program is reading from a region of the process's address space that holds the binary code for the memcpy() function.

On Linux you can dump a process's address space mapping from the file "/proc/<pid>/maps". To see an example, just run: cat /proc/self/maps. That cat command will dump the address space mapping for that instance of the cat command. If you run it multiple times it will show different address ranges because of a security feature (ASLR, address space layout randomization).

Also, you shouldn't use fprintf or printf to print 1 char. printf will try to look for format characters in the "string" you are passing it. It would be better to use fputc(*p, stdout); then you don't even need to call memcpy() or the "onechar" array.

If you set "p" to a block of allocated memory, then your program should segfault shortly after reading past the end of the memory. How many bytes you can read past the end of the allocated block depends on how the memory allocator manages pools of memory and where its own bookkeeping information is stored.


The equivalent on OSX to catting /proc/<pid>/maps is running vmmap -w <pid>


Your program has several bugs, but you already know about that :).

I can't get it to print 3g of stuff on OSX. It prints about 300m, which makes a lot more sense.

http://pastebin.com/uBHEW7X8


Your program doesn't allocate memory, so this is unrelated to memory allocation. You're instead taking the address of a function in memory (memcpy) and then printing out the function's code. This isn't a leak in any way, since your program has to be able to reach memcpy and call it.

So, no, you aren't printing out bits of files. You're printing out functions that your program has the ability to call.


Well, they are files. Runtime libraries.

He's starting somewhere in /usr/lib/system/libsystem_platform.dylib (where memcpy lives) and printing the rest of the executable code his program has mapped in.

It doesn't surprise me that some of this contains public keys and certificates. They are probably used by some OSX networking library.


Yeah, I'm thinking you're right: if I set the environment variable DYLD_SHARED_REGION=avoid, it only prints out about 300kb of stuff.


3gb of them? Really? That would be the largest runtime library I have ever heard of, forget any other libc in existence.


I'm getting ~300m out. I suspect the author made a power-of-10 mistake in reading the output of ls.


I can grep for -----BEGIN and find a few public keys and certificates, I doubt they're part of the program (but they might be, as I said I don't really know) http://i.imgur.com/IXgkWNu.png


Even on a modern non-embedded system, you aren't going to benefit from the MMU when calling calloc(). Your pages are only going to be zeroed when you first get them from mmap() or brk()/sbrk().

If you're sitting in a loop calling malloc(), filling it with data, processing it, and calling free(), changing malloc() to calloc() is just wasting cycles.


In fact, brk and mmap aren't even really guaranteed to return zero filled memory. It just happens that kernels often zero everything to avoid the headache of tracking which pages contain data you aren't supposed to see.

I guess one can rely on such behavior when writing a platform-specific libc. But in portable code? Yuck.


Kernels have to provide cleared memory if they don't want to leak data between processes.


1. There is nothing inherently wrong in "leaking" a page which used to cache some world-readable file or belonged to a process you'd be able to ptrace anyway.

2. One could imagine this being configurable for embedded or HPC systems which don't care about security, though I'm not aware of people actually doing that.

3. The memory could equally well be overwritten with 0xdeadbeef and you have no control over that.

4. I once heard that Windows may sometimes fault-in nonzero pages if it runs out of pre-zeroed ones, but I couldn't locate this in MS docs so I'm not quite sure.


MMUs don't zero pages. MMUs don't handle data at all. MMUs handle addresses only, translating from the virtual (software) view to the hardware (RAM) view.


I used to use C a lot, but haven't in a while, so question: isn't there a sub-allocator involved also that could affect the condition of the memory allocated without an explicit zeroing ? Is the memory always allocated directly from the OS ? That seems like it could be slow.

Thanks.


Of course there is, the parent is clueless and spreads misinformation.

A heap allocation can easily contain data that was previously freed in the same process, and hence calloc has to perform some extra work to wipe it.


> hence calloc has to perform some extra work to wipe them.

IIRC FreeBSD will zero freed pages in the background when it has idle time, and calloc can draw from that pool of zeroed pages "for free".


Ehh, that doesn't apply to what he's talking about though. That only works when the pages are actually freed. Since actually asking the kernel for more memory is expensive, most malloc/calloc implementations will hold on to some pages on the assumption that you're going to allocate more. Since this memory is reused without ever being given back to the kernel, the kernel won't zero it for us.


> Ehh, that doesn't apply to what he's talking about though.

Of course it does.

> That only works when the pages are actually freed. Since actually asking the kernel for more memory is expensive, most malloc/calloc implementations will hold-on to some pages since it assumes you're going to allocate more. Since it reuses this memory without ever giving it back to the kernel, the kernel won't zero it for us.

It's what the standard system malloc does, possibly in cooperation with the kernel but not necessarily so (why would it need the kernel involved after all?)

Just as OpenBSD's malloc framework can do way more than trivially request memory from the kernel: http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man5/.... In fact, since OpenBSD 5.6 it'll junk small allocations on freeing (fill them with 0xdf) by default. And as usual that uncovered broken code…

> When we started junking memory by default after free, the postgres ruby gem stopped working.


Would you mind providing some more information on this? I googled it but I couldn't find any documentation on what you describing.

To be clear though, the parent commenter (and I) are talking about the pages that malloc keeps around in the process's memory to be reused (and thus are never released back to the OS). Are you saying FreeBSD/OpenBSD/others have a system to tell the kernel when a user process has pages it plans to zero, and then a system for the kernel to notify the process when/if it does? That would be pretty interesting to see, but I've never heard of that being a thing.


> Would you mind providing some more information on this? I googled it but I couldn't find any documentation on what you describing.

I was talking about https://www.freebsd.org/doc/en/articles/vm-design/prefault-o... but after actually re-checking the docs rather than going from bitrotted memory pre-zeroing only happens for the initial zero-fill fault, so the original calloc is "free" (if there are zeroed pages in the queue) but freeing then re-calloc'ing will need to zero the chunk (unless the allocator decides to go see if the kernel has pre-zeroed space instead I guess). My bad.

> To be clear though, the parent commenter (and I) are talking about the pages that malloc keeps around in the processes memory to be reused (And thus are never released back to the OS).

Yeah so was I, but I was misremembering stuff.

> Are you saying FreeBSD/OpenBSD/others have a system to tell the kernel when a user process has pages it plans to zero, and then a system for the kernel to notify the process when/if it does?

Turns out no.



