Sure, that's not true for 16 bit targets. But are you really going to port a 5Mb program to 16 bits? It's not worth worrying about. Your code is highly unlikely to be portable to 16 bits anyway.
The problem is with `long`, which is 32 bits on some machines and 64 bits on others. This is just madness. Fortunately, `long long` is always 64 bits, so it makes sense to just abandon `long`.
So there it is:
char - 8 bits
short - 16 bits
int - 32 bits
long long - 64 bits
Done!
(Sheesh, all the endless hours wasted on the size of an `int` in C.)
Yet another issue is that `char` is signed on some platforms but unsigned on others. It is signed on x86 but unsigned on RISC-V. On ARM it could be either (ARM standard is unsigned, Apple does signed).
I therefore use typedefs called `byte` and `ubyte` wherever the data is 8-bit but not character data.
I also use the aliases `ushort`, `uint` and `ulong` to cut down on typing.
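Roughly, a sketch of what those aliases can look like (the exact definitions vary by project; the sign of `byte` and the width behind `ulong` are assumptions here):

```c
/* Sketch only; actual project definitions may differ. */
typedef signed char    byte;    /* 8-bit data with an explicit sign (signedness assumed here) */
typedef unsigned char  ubyte;   /* 8-bit data, explicitly unsigned */
typedef unsigned short ushort;
typedef unsigned int   uint;
typedef unsigned long  ulong;   /* or unsigned long long, depending on the project's convention */
```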
On the other hand, the types in <stdint.h> are often recognised by syntax colouring in editors where user-defined types aren't.
Then you're better off using custom types - that way people will immediately know your type is non-default - as opposed to hiding your customization away in a makefile, pranking people who expect built-ins to behave a certain way.
The people who understand that it can be either, depending on a compiler switch, are exactly the people who use an explicit sign (typically via a typedef) to ensure their code always works.
The people who say that char is de facto signed and everyone should just deal with it, are the people who end up writing broken code.
Yes, the optional sign on char is also madness. C had a chance in 1989 to make it unsigned, and muffed it. (When C89 decided between value-preserving and sign-preserving semantics, they could have also said char was unsigned, and saved generations of programmers from grief.)
D's `char` type is unsigned. Done. No more problems.
Oh I don't even know where to start with this. Given that C is the lingua franca of embedded development, and each processor and compiler has different opinions of what an int is, I would never claim that an int is 32 bits.
It's just so much less error prone to define a uint32_t. That's guaranteed to be the same width everywhere.
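For instance, a hardware register map is exactly the kind of place where the width has to be right. A hedged sketch, with made-up register names and a made-up address rather than any real part:

```c
#include <stdint.h>

/* Hypothetical memory-mapped peripheral: the names, layout and base
   address are invented for illustration, not taken from a real part. */
typedef struct {
    volatile uint32_t CTRL;    /* control register               */
    volatile uint32_t STATUS;  /* status register                */
    volatile uint16_t DIV;     /* clock divider, exactly 16 bits */
    uint16_t          pad;     /* keep the layout 32-bit aligned */
} timer_regs;

#define TIMER0 ((volatile timer_regs *)0x40001000u)  /* made-up base address */
```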
Yeah, almost the only time I'm writing C anymore is embedded, where I want to reason about type widths (while taking on as light a cognitive load as is possible). I have enough code that gets compiled to an 8, 16, or 32 bit target depending on context that having the bit width right on the tin is valuable. And it doesn't even cost me "hours and hours".
Also: Embedded is almost the only time you really, truly need to care about how many bits a type is, and only when you're interacting with actual hardware.
For almost every other routine task in programming, I would argue that it really doesn't matter if your int is 32 bits wide or 64 bits wide. Why go through the trouble of insisting on int32_t or int64_t? It probably doesn't matter for the things you are counting.
Some programmers will say "Well, we should use int64_t here because int32_t might overflow!" OK, so why weren't you checking for overflow if it was an expected case? int64_t might overflow too, are you checking after every operation? Probably not. "OK, let's use uint64_t then, now we get 2x as many numbers!" Now you have other overflow (and subtraction) problems to handle.
Nowadays, I just use int and move on with my life. It's one of those lessons from experience: "When I was younger, I used int and char because I didn't know any better. When I was older, I created this complex, elaborate type system because I knew better. Now that I'm wise, I just use int and char."
> It's one of those lessons from experience: "When I was younger, I used int and char because I didn't know any better. When I was older, I created this complex, elaborate type system because I knew better. Now that I'm wise, I just use int and char."
Right on, dude. I've gone full circle on that, too.
I also spent years wandering the desert being enamored with the power of the C preprocessor. Eventually, I just ripped it out as much as possible, replacing it with ordinary C code. C is actually a decent language if you eschew the damned preprocessor.
> Other models are very rare. For example, ILP64 (8/8/8: int, long, and pointer are 64-bit) only appeared in some early 64-bit Unix systems (e.g. UNICOS on Cray).
My computer being one of those (rare?) architectures. Though I think it is not entirely dependent on the processor and the OS choice also affects this.
Umm, sorry, I remembered that wrong. Turns out int isn't 64 bits on my machine. I should double-check before posting next time. (I mistook long for int, and long isn't 64 bits on some systems.) I can't delete it now.
Yep. DSPs always have weird architectures, but in most cases, one isn't compiling the same code for multiple DSP architectures. As an example, the C2000 line has a 16-bit `char`; there is no support for "bytes".
Exactly this (plus floating point types and unsigned qualifier) and done. It’s standard C, there is no need to invent yet another unnecessary “type” system for standard C native types. I do like bool though.
I quit using "long" because sometimes a long is 32 bits and sometimes 64, and I can never remember which compiler does which. But "int" is 32 bits and "long long" is always 64 bits, so that's what I stick with.
The type that is 32 bits in C is int32_t, and the 64 bit one is int64_t; if you really want those specific widths, you can just use those types.
The type long is the smallest ranking basic type that is at least 32 bits wide. Since int is only required to go to 32767, you use long if you need a signed type with more range than that. That made a lot of sense on platforms where int really did go up to just 32767, and long provided the 32 bit one.
Now long, while at least 32 bits, is not required to be wider than 32; if you need a signed type that goes beyond 2147483647, then long long is it.
Those are the portability rules. Unfortunately, those rules will sometimes lead you to choose types that are wider than necessary, like long when int would have worked.
Where that matters, it's best to make your code tunable with your own typedefs. I don't mean typedefs like i32 but abstract ones, like ISO C's time_t or clock_t, or POSIX's pid_t. You can adjust your types without editing numerous lines of code.
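A minimal sketch of what I mean, with invented names (sample_count_t and the SMALL_TARGET knob are illustrative only):

```c
/* An "abstract" typedef in the spirit of time_t or pid_t: the name says
   what the value is for, and the width is picked in one place.
   SMALL_TARGET and sample_count_t are made-up names for illustration. */
#if defined(SMALL_TARGET)
typedef long      sample_count_t;   /* at least 32 bits; keeps small builds lean */
#else
typedef long long sample_count_t;   /* at least 64 bits where the extra width is cheap */
#endif
```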
Choosing integer sizes in C is pretty easy. The standard guarantees certain minimum ranges.
1. Consider the char and short types only if saving storage is important. Do not declare "char number_of_wheels" for a car, just because no car has anywhere near 127 wheels, unless it is really important to get it down to one byte.
2. Prefer signed types to unsigned types, when saving storage is not important. Unsigned types bend the rules of arithmetic around zero, and mixtures of signed and unsigned arithmetic add complexity and pitfalls. Do use unsigned for bitmasks and bitfields.
3. Two's complement is ubiquitous: feel free to assume that signed char gives you -128, and short gives you -32768, etc. ISO C now requires two's complement.
4. Use the lowest ranking type whose range is adequate, in light of the above rules: rule out the chars and shorts, and unsigned types, unless saving space or working with bits.
For instance, for a value that ranges from 0 to 65535, we would choose int. If it were important to save storage, then unsigned short.
The ISO C minimum required ranges are:
char 0..255, if unsigned; -128..127 if signed; therefore, portably: 0..127
signed char -128..127
unsigned char 0..255
short -32768..32767
unsigned short 0..65535
int -32768..32767
unsigned int 0..65535
long -2147483648..2147483647
unsigned long 0..4294967295
long long -9223372036854775808..9223372036854775807
unsigned long long 0..18446744073709551615
If you're working with bitfields, and saving storage isn't important, start with unsigned int, and pick the type that holds all the bits required. For arrays of bitfields, prefer unsigned int; it's likely to be fast on a given target. It's good to leave that configurable in the program. E.g. a good "bignum" library can easily be tuned to have "limbs" of different sizes: 16, 32 or 64 bit, and mostly hides that at the API level.
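As a sketch of that kind of tuning (limb_t and the LIMB_BITS knob are illustrative names, not from any particular library):

```c
#include <stdint.h>

/* Tunable "limb" width for a bignum-style library; the width is chosen
   per target in one place, and the rest of the code just uses limb_t. */
#ifndef LIMB_BITS
#define LIMB_BITS 32
#endif

#if LIMB_BITS == 64
typedef uint64_t limb_t;
#elif LIMB_BITS == 32
typedef uint32_t limb_t;
#else
typedef uint16_t limb_t;
#endif
```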
If you're working with a numeric quantity, remove the unsigned types, shorts and chars, unless you need to save storage (and don't need negative values). Then pick the lowest ranking one that fits.
E.g. if saving storage, and don't need negative values, search in this order: char, signed char, unsigned char, short, unsigned short, int, unsigned int, long, unsigned long, long long, unsigned long long.
If saving storage, and negatives are required: signed char, short, int, long, long long.
If not saving storage: int, long, long long.
If the quantity is positive, and doesn't fit into long long, but does fit into unsigned long long, that's what it may have to be.
Yes, it does bend rules. Say that a, b and c are small integers (we don't worry about addition overflow). Given an inequality like:
a < b + c
we can safely perform this derivation (add -b to both sides):
a - b < c
This is not true if a, b and c are unsigned. Or even if just one of them is, depending on which one.
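A tiny demonstration, with deliberately small values so nothing overflows:

```c
#include <stdio.h>

int main(void)
{
    unsigned a = 1, b = 3, c = 2;   /* small values, no overflow anywhere */

    printf("%d\n", a < b + c);   /* 1 < 5: prints 1 (true) */
    printf("%d\n", a - b < c);   /* 1 - 3 wraps to a huge unsigned value,
                                    which is not < 2: prints 0 (false) */
    return 0;
}
```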
What I mean by "bend the rules of arithmetic" is that if we decrement from zero, we suddenly get a large value.
This is rarely what you want, except in specific circumstances, when you opt into it.
Unsigned tricks with circular buffer indices will not do the right thing unless the circular buffer is power-of-two sized.
Using masking on a power-of-two-sized index will work with signed, due to the way two's complement works. For instance, say we have a [0] to [15] circular buffer. The mask is 15 / 0xF. A negative index like -2 masks to the correct value 14: -2 & 15 == 14. So if we happen to be decrementing we can do this: index = (index - 1) & MASK even if index is int.
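A minimal sketch of that trick, assuming two's complement:

```c
#include <stdio.h>

#define BUF_SIZE 16              /* power of two */
#define MASK     (BUF_SIZE - 1)  /* 0xF */

int main(void)
{
    int index = 0;

    index = (index - 1) & MASK;  /* decrement past zero: -1 & 15 == 15 */
    printf("%d\n", index);       /* prints 15 */

    printf("%d\n", -2 & MASK);   /* the -2 example above: prints 14 */
    return 0;
}
```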
> What I mean by "bend the rules of arithmetic" is that if we decrement from zero, we suddenly get a large value.
Yes, completely consistent with the rules of modular arithmetic. A programmer ought to be able to extend their math horizons beyond preschool. Which is ironic, because I can explain this concept to my 6-year-old on a clock face and it’s easy for them to grasp.
> Unsigned tricks with circular buffer indices will not do the right thing unless the circular buffer is power-of-two sized.
How will they “not do the right thing”? With a power of 2 you avoid expensive modulo operations, but nothing breaks if you choose to use a non-power-of-2 size.
> two's complement
Two’s complement is not even mandated in C. You are invoking implementation defined behavior here. Meanwhile I can just increment or decrement the unsigned value without even masking the retained value and know the result is well defined.
Like I get 2s complement is the overwhelming case, but why be difficult, why not just use the well defined existing mechanism?
And there’s no tricks here, literally just using the fucking type as it was designed and specified, why clutter things with extra masking.
In the N3096 working draft it is written: "The sign representation defined in this document is called two’s complement. Previous revisions of this document additionally allowed other sign representations."
Non-two's complement machines are museum relics, and are no longer going to be supported by ISO C.
> why clutter things with extra masking.
Because even if the circular buffer is a power of two, its size doesn't necessarily line up with the range of a given unsigned type.
If the buffer doesn't have a width of 256, 65536, or 4294967296, then you're out of luck; you can't just use uint8_t, uint16_t or uint32_t as the circular buffer index without masking to the actual power-of-two size.
(Note that uint16_t and uint8_t promote to int (on the overwhelming majority of platforms where their range fits into that type), so you don't get away from reasoning about signed arithmetic for those.)
> If the buffer doesn't have a width of 256, 65536, or 4294967296, then you're out of luck
Why so much hyperbole?
You’re not out of luck. You can atomically increment/add the unsigned value no matter the buffer size. You don’t worry about overflow like you would with a signed type. You can mask after.
And you continue to avoid answering the simple question: what is the advantage of the signed type. I’ve already outlined the one with unsigned, especially with atomics.
The main advantage is not foisting unsigned on the user of the API.
(You can do that while using unsigned internally, but then you have to convert back and forth.)
The most important decision is what is the index type at the API level of the circular buffer, not what is inside it. But it's nicer if you can just use the API one inside.
The sizeof operator yielding the type size_t which is unsigned has done a lot of harm. Particularly the way it spread throughout the C library. Why do we have size_t being unsigned? Because on small systems, where we have 16 bit sizes, signed means limiting to 32767 bytes, which is a problem. In all other ways, it's a downer. Whenever you mention sizeof, you have unsigned arithmetic creeping into the calculation.
The author of the above blog article has the right idea to want a sizeof operator that yields ptrdiff_t instead of size_t. (Unfortunately, the execution is bungled; he redefined a language keyword as a macro, and on top of that didn't wrap the macro expansion in parentheses, even.)
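A less intrusive way to get the same effect is something like this sketch: leave sizeof alone and wrap it in a properly parenthesized macro of its own (SSIZEOF is an invented name):

```c
#include <stddef.h>

/* Signed-size helper: same value as sizeof, but with ptrdiff_t type,
   so it doesn't drag unsigned arithmetic into every expression that
   mentions a size. */
#define SSIZEOF(x) ((ptrdiff_t)sizeof(x))
```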
> Yes completely consistent with rules of modular arithmetic.
In modular arithmetic, there is no such thing as <. (To put it precisely, ℤ_𝑛 is not an ordered ring.) Or are you teaching your 6-year-old that 9:00 today is later than 7:00 tomorrow?
Unsigned arithmetic is useful for wrapping clocks, like interrupt tick counters and whatnot. There is always some current value, "now". There is a range of it defined as the future. Everything outside of that range is considered past. Timers are never set farther into the future beyond the range, and are expired in a timely way so that unexpired timers never recede sufficiently far into the past that they appear to flip to the future. One way of doing it is to just cut the range in half: take the difference between two times t1 - t0 and cast it to the same-sized signed type. If the difference is positive, then t1 is in the future relative to t0. If negative, t1 is in the past relative to t0.
This is one of those niche uses of unsigned.
You probably want to hide it behind an API, where the domain is opaque and abstract and you have function such as a time_before(t1, t0) predicate.
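A hedged sketch of such a predicate for a 32-bit tick counter, assuming two's complement conversion behaviour (the name time_before follows the comment above; everything else is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if t1 is in the past relative to t0: the wrapped unsigned
   difference, reinterpreted as signed, is negative. This cuts the
   32-bit range in half, exactly as described above. */
static bool time_before(uint32_t t1, uint32_t t0)
{
    return (int32_t)(t1 - t0) < 0;
}
```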
True, but this is not valid if they are signed, either. Take
a = INT_MIN
b = 1
c = 2
Then
a < b + c
is true. But
a - b < c
invokes undefined behavior.
Edit: missed
> Say that a, b and c are small integers (we don't worry about addition overflow)
Ah, well, that makes this example vacuously true; however, I'm not sure what the utility of that restriction is. We've only moved the goalposts from "bend[ing] the rules of arithmetic around zero" to bending the rules of arithmetic outside of "small integers".
> however I'm not sure what the utility in that restriction is.
We have moved the goalposts much farther apart.
If we are using a 32 bit integer type, all we need is that a, b and c fit into 31 bits. Then there is no way that b + c or a - b overflow. For a single addition or subtraction, we just need one bit of headroom.
I.e. the values do not actually have to be that small.
There are all kinds of situations in which programs work with small integers, where the calculations could bork if an unsigned creeps in.
A cliff near zero is qualitatively different from clipping at two extremes. An electronic device that clips everything below zero volts will distort even the faintest waveform. One that clips near the power rails has clean headroom.
If b = 0x7fffffff and c = 0x7fffffff, b and c both fit in 31 bits, and b + c overflows to -2 in signed int32 twos-complement math (I think).
If b = 0x40000000 and c = 0x40000000, b and c both fit in 31 bits, and b + c overflows to -2147483648 in signed int32 twos-complement math (I think).
Maybe the definition of "32 bit integer type" you're using is meant to encompass only 32 bits as all unsigned (but then there are a - b terms that would overflow if b > a).
They don't fit into a 31 bit two's complement (i.e. signed) representation, in terms of representing their interpretation as the familiar 32 bit INT_MAX.
31 bit two's complement goes from -0x40000000 to 0x3FFFFFFF. There is a 0x7FFFFFFF bit pattern, which represents -0x00000001. It has a sign bit which is 1. (So, adding that to itself does go to -2, but under that interpretation there is no overflow.)
Any pair of values in that range can be added or subtracted in 32 bit two's complement.
Including the most negative value: -0x40000000 + -0x40000000 = -0x80000000.
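Concretely, a quick sketch of the boundary cases (assuming int32_t is a plain 32-bit int):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Any two values in the 31-bit two's complement range
       [-0x40000000, 0x3FFFFFFF] add and subtract without overflowing
       a 32-bit int. The extremes: */
    int32_t lo = -0x40000000, hi = 0x3FFFFFFF;

    printf("%d\n", lo + lo);  /* -2147483648, exactly INT32_MIN */
    printf("%d\n", hi + hi);  /*  2147483646 */
    printf("%d\n", lo - hi);  /* -2147483647 */
    printf("%d\n", hi - lo);  /*  2147483647, exactly INT32_MAX */
    return 0;
}
```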
so... what I'm seeing is that C got it wrong relative to the way things actually work and get used.
the fact that you have to have tribal knowledge about all of this is why C shouldn't stick around for the long term, and why we should phase such languages out in favor of ones with stronger, more correct defaults.
would a new programmer use "long long"? would they notice immediately that things didn't work if they didn't use it?
Rust got it correct by labeling the bits with the type directly
Rust's integer types are poorly abstracted. The use of specifically sized types for quantities that are not related to hardware is comically ridiculous.
In the C world, only the goofballs do things like use char or int8_t for the number of children in a family, or wheels on a car.
yet that is what Rust code looks like. Almost every Rust code sample I've ever seen sets off my bozon detector just for this reason.