Sure, that's not true for 16 bit targets. But are you really going to port a 5Mb program to 16 bits? It's not worth worrying about. Your code is highly unlikely to be portable to 16 bits anyway.
The problem is with `long`, which is 32 bits on some machines and 64 bits on others. This is just madness. Fortunately, `long long` is always 64 bits, so it makes sense to just abandon `long`.
So there it is:
char - 8 bits
short - 16 bits
int - 32 bits
long long - 64 bits
Done!
(Sheesh, all the endless hours wasted on the size of an `int` in C.)
Yet another issue is that `char` is signed on some platforms but unsigned on others. It is signed on x86 but unsigned on RISC-V. On ARM it could be either (ARM standard is unsigned, Apple does signed).
I therefore use typedefs called `byte` and `ubyte` wherever the data is 8-bit but not character data.
I also use the aliases `ushort`, `uint` and `ulong` to cut down on typing.
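Roughly, a sketch of what those aliases can look like (the exact definitions vary by project; the sign of `byte` and the width behind `ulong` are assumptions here):

```c
/* Sketch only; actual project definitions may differ. */
typedef signed char    byte;    /* 8-bit data with an explicit sign (signedness assumed here) */
typedef unsigned char  ubyte;   /* 8-bit data, explicitly unsigned */
typedef unsigned short ushort;
typedef unsigned int   uint;
typedef unsigned long  ulong;   /* or unsigned long long, depending on the project's convention */
```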
On the other hand, the types in <stdint.h> are often recognised by syntax colouring in editors where user-defined types aren't.
Then you're better off using custom types - that way people will immediately know your type is non-default - as opposed to hiding your customization away in a makefile, pranking people who expect built-ins to behave a certain way.
The people who understand that it can be either, depending on a compiler switch, are exactly the people who use an explicit sign (typically via a typedef) to ensure their code always works.
The people who say that char is de facto signed and everyone should just deal with it, are the people who end up writing broken code.
Yes, the optional sign on char is also madness. C had a chance in 1989 to make it unsigned, and muffed it. (When C89 decided between value-preserving and sign-preserving semantics, they could have also said char was unsigned, and saved generations of programmers from grief.)
D's `char` type is unsigned. Done. No more problems.
Oh I don't even know where to start with this. Given that C is the lingua franca of embedded development, and each processor and compiler has different opinions of what an int is, I would never claim that an int is 32 bits.
It's just so much less error prone to define a uint32_t. That's guaranteed to be the same width everywhere.
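For instance, a hardware register map is exactly the kind of place where the width has to be right. A hedged sketch, with made-up register names and a made-up address rather than any real part:

```c
#include <stdint.h>

/* Hypothetical memory-mapped peripheral: the names, layout and base
   address are invented for illustration, not taken from a real part. */
typedef struct {
    volatile uint32_t CTRL;    /* control register               */
    volatile uint32_t STATUS;  /* status register                */
    volatile uint16_t DIV;     /* clock divider, exactly 16 bits */
    uint16_t          pad;     /* keep the layout 32-bit aligned */
} timer_regs;

#define TIMER0 ((volatile timer_regs *)0x40001000u)  /* made-up base address */
```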
Yeah, almost the only time I'm writing C anymore is embedded, where I want to reason about type widths (while taking on as light a cognitive load as is possible). I have enough code that gets compiled to an 8, 16, or 32 bit target depending on context that having the bit width right on the tin is valuable. And it doesn't even cost me "hours and hours".
Also: Embedded is almost the only time you really, truly need to care about how many bits a type is, and only when you're interacting with actual hardware.
For almost every other routine task in programming, I would argue that it really doesn't matter if your int is 32 bits wide or 64 bits wide. Why go through the trouble of insisting on int32_t or int64_t? It probably doesn't matter for the things you are counting.
Some programmers will say "Well, we should use int64_t here because int32_t might overflow!" OK, so why weren't you checking for overflow if it was an expected case? int64_t might overflow too, are you checking after every operation? Probably not. "OK, let's use uint64_t then, now we get 2x as many numbers!" Now you have other overflow (and subtraction) problems to handle.
Nowadays, I just use int and move on with my life. It's one of those lessons from experience: "When I was younger, I used int and char because I didn't know any better. When I was older, I created this complex, elaborate type system because I knew better. Now that I'm wise, I just use int and char."
> It's one of those lessons from experience: "When I was younger, I used int and char because I didn't know any better. When I was older, I created this complex, elaborate type system because I knew better. Now that I'm wise, I just use int and char."
Right on, dude. I've gone full circle on that, too.
I also spent years wandering the desert being enamored with the power of the C preprocessor. Eventually, I just ripped it out as much as possible, replacing it with ordinary C code. C is actually a decent language if you eschew the damned preprocessor.
> Other models are very rare. For example, ILP64 (8/8/8: int, long, and pointer are 64-bit) only appeared in some early 64-bit Unix systems (e.g. UNICOS on Cray).
My computer being one of those (rare?) architectures. Though I think it is not entirely dependent on the processor and the OS choice also affects this.
Umm, sorry, I remembered that wrong. Turns out int isn't 64 bits on my machine. I should double-check before posting next time. (I mistook long for int, and long isn't 64 bits on some systems.) I can't delete it now.
Yep. DSPs always have weird architectures, but in most cases, one isn't compiling the same code for multiple DSP architectures. As an example, the C2000 line has a 16-bit `char`; there is no support for "bytes".
Exactly this (plus floating point types and unsigned qualifier) and done. It’s standard C, there is no need to invent yet another unnecessary “type” system for standard C native types. I do like bool though.
I quit using "long" because sometimes a long is 32 bits and sometimes 64, and I can never remember which compiler does which. But "int" is 32 bits and "long long" is always 64 bits, so that's what I stick with.
The type that is 32 bits in C is int32_t, and the 64 bit one is int64_t; if you really want those specific widths, you can just use those types.
The type long is the smallest ranking basic type that is at least 32 bits wide. Since int is only required to go to 32767, you use long if you need a signed type with more range than that. That made a lot of sense on platforms where int really did go up to just 32767, and long provided the 32 bit one.
Now long, while at least 32 bits, is not required to be wider than 32; if you need a signed type that goes beyond 2147483647, then long long is it.
Those are the portability rules. Unfortunately, those rules will sometimes lead you to choose types that are wider than necessary, like long when int would have worked.
Where that matters, it's best to make your code tunable with your own typedefs. I don't mean typedefs like i32 but abstract ones, like ISO C's time_t or clock_t, or POSIX's pid_t. You can adjust your types without editing numerous lines of code.
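A minimal sketch of what I mean, with invented names (sample_count_t and the SMALL_TARGET knob are illustrative only):

```c
/* An "abstract" typedef in the spirit of time_t or pid_t: the name says
   what the value is for, and the width is picked in one place.
   SMALL_TARGET and sample_count_t are made-up names for illustration. */
#if defined(SMALL_TARGET)
typedef long      sample_count_t;   /* at least 32 bits; keeps small builds lean */
#else
typedef long long sample_count_t;   /* at least 64 bits where the extra width is cheap */
#endif
```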
Choosing integer sizes in C is pretty easy. The standard guarantees certain minimum ranges.
1. Consider the char and short types only if saving storage is important. Do not declare "char number_of_wheels" for a car, just because no car has anywhere near 127 wheels, unless it is really important to get it down to one byte.
2. Prefer signed types to unsigned types, when saving storage is not important. Unsigned types bend the rules of arithmetic around zero, and mixtures of signed and unsigned arithmetic add complexity and pitfalls. Do use unsigned for bitmasks and bitfields.
3. Two's complement is ubiquitous: feel free to assume that signed char gives you -128, and short gives you -32768, etc. ISO C now requires two's complement.
4. Use the lowest ranking type whose range is adequate, in light of the above rules: rule out the chars and shorts, and unsigned types, unless saving space or working with bits.
For instance, for a value that ranges from 0 to 65535, we would choose int. If it were important to save storage, then unsigned short.
The ISO C minimum required ranges are:
char 0..255, if unsigned; -128..127 if signed; therefore, portably: 0..127
signed char -128..127
unsigned char 0..255
short -32768..32767
unsigned short 0..65535
int -32768..32767
unsigned int 0..65535
long -2147483648..2147483647
unsigned long 0..4294967295
long long -9223372036854775808..9223372036854775807
unsigned long long 0..18446744073709551615
If you're working with bitfields, and saving storage isn't important, start with unsigned int, and pick the type that holds all the bits required. For arrays of bitfields, prefer unsigned int; it's likely to be fast on a given target. It's good to leave that configurable in the program. E.g. a good "bignum" library can easily be tuned to have "limbs" of different sizes: 16, 32 or 64 bit, and mostly hides that at the API level.
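As a sketch of that kind of tuning (limb_t and the LIMB_BITS knob are illustrative names, not from any particular library):

```c
#include <stdint.h>

/* Tunable "limb" width for a bignum-style library; the width is chosen
   per target in one place, and the rest of the code just uses limb_t. */
#ifndef LIMB_BITS
#define LIMB_BITS 32
#endif

#if LIMB_BITS == 64
typedef uint64_t limb_t;
#elif LIMB_BITS == 32
typedef uint32_t limb_t;
#else
typedef uint16_t limb_t;
#endif
```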
If you're working with a numeric quantity, remove the unsigned types, shorts and chars, unless you need to save storage (and don't need negative values). Then pick the lowest ranking one that fits.
E.g. if saving storage, and don't need negative values, search in this order: char, signed char, unsigned char, short, unsigned short, int, unsigned int, long, unsigned long, long long, unsigned long long.
If saving storage, and negatives are required: signed char, short, int, long, long long.
If not saving storage: int, long, long long.
If the quantity is positive, and doesn't fit into long long, but does fit into unsigned long long, that's what it may have to be.
Yes, it does bend rules. Say that a, b and c are small integers (we don't worry about addition overflow). Given an inequality like:
a < b + c
we can safely perform this derivation (add -b to both sides):
a - b < c
This is not true if a, b and c are unsigned. Or even if just one of them is, depending on which one.
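A tiny demonstration, with deliberately small values so nothing overflows:

```c
#include <stdio.h>

int main(void)
{
    unsigned a = 1, b = 3, c = 2;   /* small values, no overflow anywhere */

    printf("%d\n", a < b + c);   /* 1 < 5: prints 1 (true) */
    printf("%d\n", a - b < c);   /* 1 - 3 wraps to a huge unsigned value,
                                    which is not < 2: prints 0 (false) */
    return 0;
}
```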
What I mean by "bend the rules of arithmetic" is that if we decrement from zero, we suddenly get a large value.
This is rarely what you want, except in specific circumstances, when you opt into it.
Unsigned tricks with circular buffer indices will not do the right thing unless the circular buffer is power-of-two sized.
Using masking on a power-of-two-sized index will work with signed, due to the way two's complement works. For instance, say we have a [0] to [15] circular buffer. The mask is 15 / 0xF. A negative index like -2 masks to the correct value 14: -2 & 15 == 14. So if we happen to be decrementing we can do this: index = (index - 1) & MASK even if index is int.
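A minimal sketch of that trick, assuming two's complement:

```c
#include <stdio.h>

#define BUF_SIZE 16              /* power of two */
#define MASK     (BUF_SIZE - 1)  /* 0xF */

int main(void)
{
    int index = 0;

    index = (index - 1) & MASK;  /* decrement past zero: -1 & 15 == 15 */
    printf("%d\n", index);       /* prints 15 */

    printf("%d\n", -2 & MASK);   /* the -2 example above: prints 14 */
    return 0;
}
```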
> What I mean by "bend the rules of arithmetic" is that if we decrement from zero, we suddenly get a large value.
Yes, completely consistent with the rules of modular arithmetic. A programmer ought to be able to extend their math horizons beyond preschool. Which is ironic, because I can explain this concept to my 6-year-old on a clock face and it’s easy for them to grasp.
> Unsigned tricks with circular buffer indices will not do the right thing unless the circular buffer is power-of-two sized.
How will they “not do the right thing”? With a power of 2 you avoid expensive modulo operations, but nothing breaks if you choose to use a non-power-of-2 size.
> two's complement
Two’s complement is not even mandated in C. You are invoking implementation defined behavior here. Meanwhile I can just increment or decrement the unsigned value without even masking the retained value and know the result is well defined.
Like I get 2s complement is the overwhelming case, but why be difficult, why not just use the well defined existing mechanism?
And there’s no tricks here, literally just using the fucking type as it was designed and specified, why clutter things with extra masking.
In the N3096 working draft it is written: "The sign representation defined in this document is called two’s complement. Previous revisions of this document additionally allowed other sign representations."
Non-two's complement machines are museum relics, and are no longer going to be supported by ISO C.
> why clutter things with extra masking.
Because even if the circular buffer is a power of two, its size doesn't necessarily line up with the range of a given unsigned type.
If the buffer doesn't have a width of 256, 65536, or 4294967296, then you're out of luck; you can't just use uint8_t, uint16_t or uint32_t as the circular buffer index without masking to the actual power-of-two size.
(Note that uint16_t and uint8_t promote to int (on the overwhelming majority of platforms where their range fits into that type), so you don't get away from reasoning about signed arithmetic for those.)
> If the buffer doesn't have a width of 256, 65536, or 4294967296, then you're out of luck
Why so much hyperbole?
You’re not out of luck. You can atomically increment/add the unsigned value no matter the buffer size. You don’t worry about overflow like you would with a signed type. You can mask after.
And you continue to avoid answering the simple question: what is the advantage of the signed type. I’ve already outlined the one with unsigned, especially with atomics.
The main advantage is not foisting unsigned on the user of the API.
(You can do that while using unsigned internally, but then you have to convert back and forth.)
The most important decision is what is the index type at the API level of the circular buffer, not what is inside it. But it's nicer if you can just use the API one inside.
The sizeof operator yielding the type size_t which is unsigned has done a lot of harm. Particularly the way it spread throughout the C library. Why do we have size_t being unsigned? Because on small systems, where we have 16 bit sizes, signed means limiting to 32767 bytes, which is a problem. In all other ways, it's a downer. Whenever you mention sizeof, you have unsigned arithmetic creeping into the calculation.
The author of the above blog article has the right idea to want a sizeof operator that yields ptrdiff_t instead of size_t. (Unfortunately, the execution is bungled; he redefined a language keyword as a macro, and on top of that didn't wrap the macro expansion in parentheses, even.)
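A less intrusive way to get the same effect is something like this sketch: leave sizeof alone and wrap it in a properly parenthesized macro of its own (SSIZEOF is an invented name):

```c
#include <stddef.h>

/* Signed-size helper: same value as sizeof, but with ptrdiff_t type,
   so it doesn't drag unsigned arithmetic into every expression that
   mentions a size. */
#define SSIZEOF(x) ((ptrdiff_t)sizeof(x))
```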
> Yes completely consistent with rules of modular arithmetic.
In modular arithmetic, there is no such thing as <. (To put it precisely, ℤ_𝑛 is not an ordered ring.) Or are you teaching your 6-year-old that 9:00 today is later than 7:00 tomorrow?
Unsigned arithmetic is useful for wrapping clocks, like interrupt tick counters and whatnot. There is always some current value, "now". There is a range of it defined as the future. Everything outside of that range is considered past. Timers are never set farther into the future beyond the range, and are expired in a timely way so that unexpired timers never recede sufficiently far into the past that they appear to flip to the future. One way of doing it is to just cut the range in half: take the difference between two times t1 - t0 and cast it to the same-sized signed type. If the difference is positive, then t1 is in the future relative to t0. If negative, t1 is in the past relative to t0.
This is one of those niche uses of unsigned.
You probably want to hide it behind an API, where the domain is opaque and abstract and you have function such as a time_before(t1, t0) predicate.
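A hedged sketch of such a predicate for a 32-bit tick counter, assuming two's complement conversion behaviour (the name time_before follows the comment above; everything else is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if t1 is in the past relative to t0: the wrapped unsigned
   difference, reinterpreted as signed, is negative. This cuts the
   32-bit range in half, exactly as described above. */
static bool time_before(uint32_t t1, uint32_t t0)
{
    return (int32_t)(t1 - t0) < 0;
}
```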
True, but this is not valid if they are signed, either. Take
a = INT_MIN
b = 1
c = 2
Then
a < b + c
is true. But
a - b < c
invokes undefined behavior.
Edit: missed
> Say that a, b and c are small integers (we don't worry about addition overflow)
Ah, well, that makes this example vacuously true; however, I'm not sure what the utility of that restriction is. We've only moved the goalposts from "bend[ing] the rules of arithmetic around zero" to bending the rules of arithmetic outside of "small integers".
> however I'm not sure what the utility in that restriction is.
We have moved the goalposts much farther apart.
If we are using a 32 bit integer type, all we need is that a, b and c fit into 31 bits. Then there is no way that b + c or a - b overflow. For a single addition or subtraction, we just need one bit of headroom.
I.e. the values do not actually have to be that small.
There are all kinds of situations in which programs work with small integers, where the calculations could bork if an unsigned creeps in.
A cliff near zero is qualitatively different from clipping at two extremes. An electronic device that clips everything below zero volts will distort even the faintest waveform. One that clips near the power rails has clean headroom.
If b = 0x7fffffff and c = 0x7fffffff, b and c both fit in 31 bits, and b + c overflows to -2 in signed int32 twos-complement math (I think).
If b = 0x40000000 and c = 0x40000000, b and c both fit in 31 bits, and b + c overflows to -2147483648 in signed int32 twos-complement math (I think).
Maybe the definition of "32 bit integer type" you're using is meant to encompass only 32 bits as all unsigned (but then there are a - b terms that would overflow if b > a).
They don't fit into a 31 bit two's complement (i.e. signed) representation, in terms of representing their interpretation as the familiar 32 bit INT_MAX.
31 bit two's complement goes from -0x40000000 to 0x3FFFFFFF. There is a 0x7FFFFFFF bit pattern, which represents -0x00000001. It has a sign bit which is 1. (So, adding that to itself does go to -2, but under that interpretation there is no overflow.)
Any pair of values in that range can be added or subtracted in 32 bit two's complement.
Including the most negative value: -0x40000000 + -0x40000000 = -0x80000000.
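Concretely, a quick sketch of the boundary cases (assuming int32_t is a plain 32-bit int):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Any two values in the 31-bit two's complement range
       [-0x40000000, 0x3FFFFFFF] add and subtract without overflowing
       a 32-bit int. The extremes: */
    int32_t lo = -0x40000000, hi = 0x3FFFFFFF;

    printf("%d\n", lo + lo);  /* -2147483648, exactly INT32_MIN */
    printf("%d\n", hi + hi);  /*  2147483646 */
    printf("%d\n", lo - hi);  /* -2147483647 */
    printf("%d\n", hi - lo);  /*  2147483647, exactly INT32_MAX */
    return 0;
}
```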
so... what I'm seeing is that C got it wrong relative to the way things actually work and get used.
the fact that you have to have tribal knowledge about all of this is why C shouldn't stick around for the long term, and why we should phase such languages out in favor of ones with stronger, more correct defaults.
would a new programmer use "long long"? would they notice immediately that things didn't work if they didn't use it?
Rust got it correct by labeling the bits with the type directly
Rust's integer types are poorly abstracted. The use of specifically sized types for quantities that are not related to hardware is comically ridiculous.
In the C world, only the goofballs do things like use char or int8_t for the number of children in a family, or wheels on a car.
yet that is what Rust code looks like. Almost every Rust code sample I've ever seen sets off my bozon detector just for this reason.