
Some of this seems like horrendously bad advice:

- Using fixed width integers in for loops seems like a fabulous way to reduce the portability of code

- the statements in "C allows static initialization of stack-allocated arrays" are _not_ equivalent, one is a bitwise zeroing while the other is arithmetic initialization. On some machines changing these statements blindly will cause a different bit pattern to end up in memory (because there are no requirements on the machine's representations for e.g. integers or NULL or ..). There are sound reasons why the bitwise approach could be preferred, for example, because a project has debug wrappers for memset that clearly demarcate uninitialized data

- the statements in "C99 allows variable length array initializsers" aren't even slightly equivalent. His suggestion uses automatic storage (and subsequently a stack overflow triggered by user input -- aka. a security bug)

- "There is no performance penalty for getting zero'd memory" this is bullshit. calloc() might optimize for the case where it is allocating from a page the OS has just supplied but I doubt any implementation ever bothered to do this, since it relies on the host OS to always zero new pages

- "If a function accepts arbitrary input data and a length to process, don't restrict the type of the parameter." the former version is in every way more self-documenting and consistent with the apparent function of the procedure than his use of void. It also runs counter to a rule from slightly more conservative times: avoid void at all costs, since it automatically silences all casting warnings.

Modern C provides a bunch of new things that make typing safer, but none of those techniques are mentioned here. For example, word-sized structs combined with struct literals can eliminate whole classes of historical bugs.
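
One possible reading of that (the type names here are made up): a single-word struct costs nothing over a bare integer on common ABIs, but the compiler keeps the values apart:

    #include <stdio.h>

    /* Distinct word-sized wrapper types: same machine representation as a
       bare unsigned, but not implicitly convertible into one another. */
    typedef struct { unsigned v; } user_id;
    typedef struct { unsigned v; } group_id;

    static void print_user(user_id u) { printf("user %u\n", u.v); }

    int main(void) {
        user_id u = (user_id){ .v = 42 };    /* struct literal */
        group_id g = (group_id){ .v = 7 };
        print_user(u);
        /* print_user(g);   -- compile error, unlike with two raw unsigneds */
        (void)g;
        return 0;
    }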

On the fixed width integers thing, the sizes of 'int', 'long' and 'long long' are designed to vary according to the machine in use. On some fancy Intel box perhaps there is no cost to using a 64-bit type all the time, but on a microcontroller you've just caused the compiler to inject a software arithmetic implementation into your binary (and your code is running 100x slower too). He doesn't even mention types like int_fastN_t designed for this case, despite explicitly indicating "don't write C if you can avoid it", which in 2016 pretty commonly means you're targeting such a device.
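
For reference, the 'fast' types he skips look like this in use; how wide uint_fast16_t actually ends up (16, 32 or 64 bits) is the implementation's choice, which is the whole point for a loop counter:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void) {
        uint32_t total = 0;     /* exact width where the data demands it */

        /* "Fastest type at least 16 bits wide": the compiler picks
           something close to the native word, so this loop is cheap on
           an AVR and on a 64-bit desktop alike. */
        for (uint_fast16_t i = 0; i < 1000; i++)
            total += i;

        printf("%" PRIu32 " (sizeof(uint_fast16_t) = %zu)\n",
               total, sizeof(uint_fast16_t));
        return 0;
    }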



Advice presented along the lines of "there are no reasons to do anything but that which I recommend" is usually somewhat flawed... just because there are so many reasons to do different things in different situations.

Regarding calloc (which indeed might be slower than malloc contrary to TFA) - a variable that should have been initialized to something but instead keeps zero bits written by calloc is not necessarily a great situation. If you use malloc, at least Valgrind will show you where you use the uninitialized variable. With calloc it won't, though it might still be a bug - a consistently behaving bug, but a bug nonetheless. Someone preferring consistently behaving bugs to inconsistently behaving bugs in production code might use a function my_malloc (not my_calloc...) which in a release build calls calloc, but in debug builds it calls malloc (and perhaps detects whether it runs under Valgrind and if not, initializes the buffer not with zeros, but with deterministic pseudo-random garbage.)
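
A rough sketch of that wrapper, assuming the usual NDEBUG release/debug split (the Valgrind detection and the pseudo-random fill are left out here):

    #include <stdlib.h>

    void *my_malloc(size_t size) {
    #ifdef NDEBUG
        /* Release build: consistent, zero-filled memory. */
        return calloc(1, size);
    #else
        /* Debug build: leave the memory uninitialized so Valgrind can
           flag any read of it.  (The variant described above would also
           detect Valgrind at runtime and, when it is absent, fill the
           buffer with deterministic pseudo-random garbage instead.) */
        return malloc(size);
    #endif
    }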


I follow a similar approach. Valgrind is a must. I just use malloc()+memset(,0,) in release, as a way to work around the Linux OOM killer.

Under memory starvation, malloc+memset "usually" results in a malloc() failure, instead of the process being silently killed by the OOM killer.


Another thing is that both calloc and malloc may return a non-NULL pointer for zero-size arguments (that can be very bad). my_alloc should check for zero-size arguments and return NULL in this case. (The problem is that if your size computation overflowed, you will get a very small non-NULL memory block -- which you are very likely to overwrite.)

Calloc has another benefit: if you calculate the size as n * sizeof(kuku), it will detect integer overflow (this can be very important for security-sensitive stuff that deals with untrusted input). If you did the computation beforehand and passed calloc(size, 1), then this multiplication and check is just another few cycles wasted.

I wonder if they could have done a standard library allocation function like calloc that does the multiplication with overflow detection, does not zero out memory and that returns NULL on zero arguments - but that would be too much to ask.
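
It is at least easy to write such a function yourself; a minimal sketch (the name is invented):

    #include <stdint.h>
    #include <stdlib.h>

    /* calloc's overflow check without the zeroing: NULL on overflow
       and on zero-sized requests. */
    void *malloc_array(size_t nmemb, size_t size) {
        if (nmemb == 0 || size == 0)
            return NULL;
        if (nmemb > SIZE_MAX / size)        /* nmemb * size would overflow */
            return NULL;
        return malloc(nmemb * size);
    }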


Actually OpenBSD has something like this: mallocarray [1] and reallocarray [2] (OMG realloc). Here [3] they describe the use of these functions in LibreSSL, the OpenBSD port of the http server, etc.

[1] http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man9/...

[2] http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/...

[3] http://www.openbsd.org/papers/httpd-asiabsdcon2015.pdf
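
For anyone who hasn't used them, they slot in where a malloc/realloc of n*size would go (the struct and function names below are made up; on glibc the prototype is behind _DEFAULT_SOURCE):

    #define _DEFAULT_SOURCE         /* for reallocarray() on glibc */
    #include <stdlib.h>

    struct item { int key; int value; };

    /* The nmemb * size multiplication is overflow-checked inside the
       allocator; on failure NULL comes back and the old block is left
       untouched. */
    struct item *resize_items(struct item *items, size_t new_count) {
        return reallocarray(items, new_count, sizeof(struct item));
    }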


>> - Using fixed width integers in for loops seems like a fabulous way to reduce the portability of code

Can you explain this? I'm not seeing it.

You should generally know the range of numbers you're seeking to store in an integer variable; if you don't, then you're asking for trouble.


Yeah, in fact using non-fixed-width integers reduces portability!

I ran into a bug where I was running some Arduino code (16-bit) on an mBed (32-bit). The use of `int` rather than `uint16_t` caused a timeout to become an infinite loop.
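
I don't know what the actual code looked like, but a sketch of the general pattern (device_not_ready() is invented here) shows how the width of int silently changes the bound:

    #include <stdbool.h>
    #include <stdint.h>

    extern bool device_not_ready(void);     /* hypothetical poll */

    void wait_with_int(void) {
        /* Relies on the counter wrapping: with 16-bit int this gives up
           after ~32k polls, with 32-bit int it polls ~2 billion times --
           effectively forever (and signed overflow is UB anyway). */
        int tries = 0;
        while (device_not_ready() && ++tries > 0)
            ;
    }

    void wait_with_uint16(void) {
        /* Same bound on every platform, and unsigned wrap-around is
           well defined. */
        uint16_t tries = 0;
        while (device_not_ready() && ++tries != 0)
            ;
    }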

I'm pretty sure it would be impossible to have portability issues due to using fixed-width types - I mean their entire point is to ensure portability.


Do you have any reason for supporting uint16_t instead of uint_least16_t?

Their point works against portability, as they may not exist at all.


For one, if you write uint_least16_t, you're implicitly warranting that the code does the right thing if it isn't 16 bits, which means you have to think about it. And C's integer conversion rules and overflow restrictions are typically quite a lot to think about already... Not the strongest argument, but I think there is a case for applying YAGNI.

Also, it's less typing. :)


> Their point works against portability, as they may not exist at all.

The good part being that you're warned at compile time, not when your 2^16 elements loop becomes a 2^32 elements loop.


Both problems are avoided if you use [u]int_leastN_t

Or if you do not make assumptions about the size of int, something that only bad programmers do.


> Both problems are avoided if you use [u]int_leastN_t

No you'll have the exact problem I cited in that case.

> Or if you do not make assumptions about the size of int, something that only bad programmers do.

Requesting a specific data size is the exact opposite of making assumptions.


Well, the assumption you have typically been making, when your unsigned loop counter is causing problems because it is larger than expected, is that the counter will exhibit a particular behavior when it over- or underflows. IIRC, this is unspecified in the standard and you can't write portable code making such assumptions.

I can see how the type changing to a smaller one might cause problems, but I don't see how IshKebab's example could happen without exploiting implementation specific overflow behavior.


> I can see how the type changing to a smaller one might cause problems, but I don't see how IshKebab's example could happen without exploiting implementation specific overflow behavior.

INT_MAX (or INT_LEASTN_MAX for annathebannana's suggestion) doesn't require exploiting overflow behaviour, but going from 2^15 iterations to 2^31 or 2^63 iterations may be problematic.


Problematic, yes, but not the difference between a terminating loop and an endless loop, which is what he said was the result of the size change. If that actually is the case, I'd guess the problem was that the compiler for the latter platform was removing a zero-comparison loop condition, based on the assumption that a value that only ever increments can only ever be zero once, using the unspecified nature of overflows to its advantage, while the former did not.

And using INT_MAX or similar for what sounds like a timing loop is a whole other can of bad practice. Then the problem isn't that you used the wrong type, it's that you used the wrong value.


> Problematic, yes, but not the difference between a terminating loop and an endless loop

If the loop does significant work and was calibrated for an expectation of 65k iterations, stepping to 2 billion (let alone a few quintillion) is for all intents and purposes endless.

> And using INT_MAX or similar for what sounds like a timing loop is a whole other can of bad practice. Then the problem isn't that you used the wrong type, it's that you used the wrong value.

No objection there, doing that is making invalid assumptions; my point is that moving to an exact-size integer type does fix it.


> If the loop does significant work and was calibrated for an expectation of 65k iterations, stepping to 2 billion (let alone a few quintillion) is for all intents and purposes endless.

And if it's not, it's not; the point being that "endless" in this case would be meaningless outside its literal meaning unless we know more about the specific case.

> No objection there, doing that is making invalid assumptions; my point is that moving to an exact-size integer type does fix it.

No, using an exact value fixes it. Any unsigned integer type is just fine for any integer value from 0 to 65535. If you change the type to a larger integer type without changing the supposed iteration count, the code would not have this problem, and if you changed the value to something higher than 65535 without adjusting the size of the type, you would have a different problem. Thus, the problem described here does not pertain to the type of the variable used.


The only issue I can think of is that arithmetic on a uint16_t will be signed on a 32-bit target and unsigned on a 16-bit target. I think the only possible way to get undefined behavior would be to left-shift by enough to overflow the signed integer on the 32-bit target, which should be rare (and might not be well defined on the 16-bit target; I don't have the spec in front of me right now).
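
A concrete instance of that shift case, assuming the usual 16-bit-int and 32-bit-int targets:

    #include <stdint.h>

    uint32_t widen_bad(uint16_t hi) {
        /* 'hi' is promoted before the shift.  With 32-bit int it becomes
           a signed int, so hi << 16 can shift into the sign bit (UB for
           hi > 0x7FFF); with 16-bit int it becomes unsigned int, but then
           the shift count equals the type's width, which is UB outright. */
        return hi << 16;
    }

    uint32_t widen_ok(uint16_t hi) {
        /* Convert to a 32-bit unsigned type before shifting. */
        return (uint32_t)hi << 16;
    }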


You "know the range", but not necessarily in terms of concrete numbers. Often, you iterate over variable size data structures, for example. All you know is that it won't be larger than the C implementation it's being compiled with allows--which usually roughly means that it won't be larger than the amount of memory the processor it's running on can address. But how much that actually is? Well, depends on the processor, and there also is no upper limit, as you always can invent a processor that can address more memory than any processor before. That's why you should usually use appropriate abstract types (such as size_t in this case) that the compiler will make sure are large enough to be able to iterate over any data structure you will encounter on that target.


Oh sure, for array index stuff like that. For general iteration against another number, say, I would use a fixed (or fast) size appropriate to my data.


Yes, and the range of array indexes should be 'from 0 to the maximum index of the array'. This is system dependent.


[u]intN_t may not exist. If you are working with arrays, always use size_t; if not, use [u]int_leastN_t

in 99% of the cases, except if you are working with network protocols, [u]intN_t is a bad choice


>> [u]intN_t may not exist.

It will, that's guaranteed by the standard. If it's not a standards compliant compiler/runtime/platform, you've got bigger problems.

>> If you are working with arrays, always use size_t; if not, use [u]int_leastN_t

Sure, on array indices, size_t is appropriate.

I can't see an advantage in using 'least' as standards dictate that 8/16/32/64 bit types are available.

>> in 99% of the cases, except if you are working with network protocols, [u]intN_t is a bad choice

No, it's the best choice in most cases because it makes the code a little more explicit and easier to understand, and it makes developers think about the range of the data they're using.


The sized types are optional, since they are more restrictive than the ordinary types - see section 7.20.1.1 in the standard (http://port70.net/~nsz/c/c11/n1570.html#7.20.1.1). To summarize: intN_t is exactly N bits, with zero padding bits, and 1 2's complement sign bit; uintN_t is exactly N bits, with zero padding bits; the implementation must provide such typedefs for all such types it supports.

So, if a system is 1's complement, you won't get the intN_t types at all. If the system doesn't support a particular bit width, you won't get the types for that width.
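
The practical consequence: the limit macros are only defined for the types the implementation actually provides, so you can feature-test for the exact-width type and fall back, e.g.:

    #include <stdint.h>

    /* UINT16_MAX exists only if uint16_t itself does (C99 7.18.2.1),
       so it doubles as a feature test. */
    #if defined(UINT16_MAX)
    typedef uint16_t elem_t;            /* exactly 16 bits */
    #else
    typedef uint_least16_t elem_t;      /* at least 16 bits, always there */
    #endif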


> So, if a system is 1's complement

As someone who actually programmed a 1's complement system with end-around carry, I'm going to call you out on this.

What system still exists that can run C99 and is one's complement?


I am playing the language lawyer game - there is nothing to call out. Is what I say justified by the standard? Yes, or no? There are no other issues involved ;)

I have no idea if C99 runs on any 1's complement systems. I don't know what end-around carry is. I don't care what it is. I don't care about any of this 1's complement nonsense. But the standard says it exists, and must be taken into account, and, therefore, by the rules of the game, I am obliged to assume these things.


What if your array is never going to be larger than, say, UINT8_MAX, and you need to pack thousands of these "sizes" into an array of structs? Using an 8-byte size_t in that case will inflate the cache usage significantly.


If you are sure that this is going to be the case, then use uint_least8_t.
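
i.e. something along these lines (struct and field names invented), where the per-entry footprint is what matters once you have thousands of them in an array:

    #include <stdint.h>
    #include <stdio.h>

    struct entry_big   { size_t        len; };   /* 8 bytes on LP64 */
    struct entry_small { uint_least8_t len; };   /* usually 1 byte  */

    int main(void) {
        printf("%zu vs %zu bytes per entry\n",
               sizeof(struct entry_big), sizeof(struct entry_small));
        return 0;
    }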


Assuming running time is considered a feature of your project, oversized int types on smaller hardware will adversely affect that

You can still range-check the natural integer types; just use INT_MAX and suchlike


>> Assuming running time is considered a feature of your project, oversized int types on smaller hardware will adversely affect that

Either you need the range, in which case you're stuck with a large type, or you don't and can use a smaller one.

I'm not sure where 'oversized' comes into it.


Is this parody?

You've written a list of coding practices which assure job security for folks like me, who have to undo the gross portability problems, bugs, and security vulnerabilities these result in.

  for (int i = 0; i < 100000; ++i)  // most optimal int size used
Is bullshit, always.


You're wrong about the arrays: the standard guarantees that zeroing the bytes of an object of integer type will give a 0 value to the integer. Pointers are a different matter, though.


In the second-to-last paragraph, did you hint at wrapping a char*, used as a string, into a struct to get type safety?


Yep, you don't even need struct literals for it. Quite a few int types are wrapped this way in the Linux kernel
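
A sketch of the char* case (the names and the escaping example are invented): two types that are both a char* underneath, which the compiler refuses to mix up:

    #include <stdio.h>

    typedef struct { const char *s; } raw_str;      /* untrusted text  */
    typedef struct { const char *s; } escaped_str;  /* safe to output  */

    static void emit(escaped_str t) { printf("%s", t.s); }

    static escaped_str escape(raw_str t) {
        /* real escaping omitted; illustration only */
        return (escaped_str){ t.s };
    }

    int main(void) {
        raw_str input = { "<hello>" };
        emit(escape(input));
        /* emit(input);   -- compile-time error, unlike with bare char* */
        return 0;
    }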


The initialisers bit seems downright dangerously wrong. This code:

    uint32_t array[10] = {0};
Does not initialise every element to 0 in the way it would seem to. To see the difference, contrast what you get when you a) remove the initialiser and b) replace the initialiser with {1}.


>Does not initialise every element to 0 in the way it would seem to.

Yes it does, since at least C99. As for the others:

a) Removing the initializer will leave the values undefined.

b) Using an initializer of { 1 } will initialize the first element to 1 and the rest to 0.


"the way it would seem to"

{} initialises all elements to 0.

{0} explicitly initialises the first element to 0 and implicitly initialises the rest to 0.

The latter form just introduces confusion.


When you replace the initialiser with 1, it should only initialise the first element to 1. Objects initialized in this way have unmentioned elements set to 0 or NULL (recursing into aggregates, initialising the first named union member). (See C11 standard, 6.7.9.21. 6.7.9.10 gives the rules for selecting the default value. Aside from the extra syntax, I don't think the rules differ substantially from C89...)

C++ lets you do "uint32_t array[10]={}", which is something C should allow as well, really. But it doesn't.
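
For anyone who wants to check the rule being cited, a quick demonstration:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void) {
        uint32_t a[10] = {0};   /* every element 0 */
        uint32_t b[10] = {1};   /* b[0] == 1, b[1..9] == 0 */

        for (int i = 0; i < 10; i++)
            printf("a[%d]=%" PRIu32 "  b[%d]=%" PRIu32 "\n", i, a[i], i, b[i]);
        return 0;
    }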


You may find it distasteful but that has been idiomatic C for more than a decade.


I've not seen anyone use that construct anywhere.


I was going to remark on this. I've personally run into this writing some C that was built on multiple platforms, one of which had c99 and one of which did not (c89 instead).

I think the real issue here is the inconsistency between C standards on details like this. If you can always assume that your code is built with c99 or later, then use all of its features, but in many cases that's not a realistic assumption.


I think that this rule applies to C89 too. I don't have an official copy of the standard, but I found a draft here: http://port70.net/~nsz/c/c89/c89-draft.html#3.5.7

“If there are fewer initializers in a list than there are members of an aggregate, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.”


It does if your target system properly zeros bss because, by the rules, array[10] will be placed in bss and bss is zeroed. If you replace the initializer with {1}, you force the compiler to fall under a different rule that forces it to explicitly initialize everything to 1.

The problem I've had (and you probably are referring to) is embedded systems that may not properly zero bss. I've also had problems in embedded systems trying to place "array[10] = {0};" into a specific non-bss section (that was really annoying).

In my experience, TFA is good practice generally, but will have problems in corner cases that you run into in deeply embedded systems.


This is incorrect, `int array[10] = {1}` initializes all but the first element with 0.



