I worked with C for nearly 20 years, and it is anything but simple. It is complicated, not "low level" as many think, full of weird edge cases, fragile (hard to refactor), and full of implicit conversions you didn't expect; compilers will also happily compile non-standard code.
Rust has a good design; it still has a few rough edges, but the syntax is great and improving.
One can make the argument that the C declaration syntax is simple, because the rule to make a declaration is to simply follow a type name by an expression where the declared variable is used. The fact that you write "int[4] arr" shows that you don't know how it works (which is not a criticism; it's just not well known how it works).
The correct way is to write
int arr[4];
and to interpret it as "arr[4] is an int" (which is only a slight lie, because actually accessing arr[4] is out of bounds if arr is a 4-element array).
How do you declare an array of pointers? Again, you write
int *arr[4]; // array of 4 pointers to ints
because in C expression syntax, "* arr[4]" means to index into the array first and then to dereference. If that is an int, it means that arr[4] is a pointer to an int, and consequently arr is an array of pointers to ints.
If you want a pointer to an array of ints instead, do this
int (*arr)[4]; // pointer to an array of 4 ints
again, because that's how regular C expressions work. Functions are (mostly) not an exception:
int myfunc(int x);
int (*myptr)(int x);
which is to say that "myfunc(x) is an int" and "(* myptr)(x) is an int", i.e. myptr is a pointer to a function that takes an int and returns an int.
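A minimal sketch of this "declaration mirrors use" reading, assuming a C99 compiler. The names `add_one` and `declaration_mirrors_use` are made up for illustration; each declaration is checked with the same expression shape that appears in its declarator.

```c
#include <assert.h>

/* Hypothetical helper: "add_one(x) is an int". */
int add_one(int x) { return x + 1; }

/* Returns 1 if each variable can be used exactly the way it was declared. */
int declaration_mirrors_use(void)
{
    int arr[4] = {10, 20, 30, 40};     /* "arr[i] is an int"     */
    int *ptrs[4] = {&arr[0], &arr[1], &arr[2], &arr[3]};
                                       /* "*ptrs[i] is an int"   */
    int (*parr)[4] = &arr;             /* "(*parr)[i] is an int" */
    int (*fptr)(int) = add_one;        /* "(*fptr)(x) is an int" */

    return arr[3] == 40
        && *ptrs[2] == 30
        && (*parr)[1] == 20
        && (*fptr)(41) == 42;
}
```

Calling `declaration_mirrors_use()` should return 1; every comparison reuses the expression written in the corresponding declaration.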
Note that in the beginning (i.e. K&R C, pre-1989) the way to declare functions was consistent: Declarations had to be
int myfunc(x);
i.e. there were no types in the argument lists. The types in the argument list appeared, I believe, after Stroustrup added them to C++ in order to improve type safety.
> the rule to make a declaration is to simply follow a type name by an expression where the declared variable is used.
Not really. You can't just use _an_ expression, you have to use a specific expression. For example, *ppX is a perfectly valid expression for a pointer to a pointer named ppX, as in:
int **ppX;
if (*ppX == NULL)
So you need to use an expression where the declared variable is used that results in a non-pointer type. And then, there's actually more to it... only certain types of expressions are valid. E.g. this isn't a valid declaration for a pointer, even though it's valid as an expression:
int pX->;
I find that basically anytime somebody tells me that C rules are to "simply [...]", they've inevitably ignored a whole bunch of cases. Your post is no exception.
> So you need to use an expression where the declared variable is used that results in a non-pointer type.
I think you are wrong here: you can't even write anything other than an expression that "results in a non-pointer type", because by definition the expression has the type written to its left. (That could still be a pointer type if it is a typedef'ed type, such as "typedef int * intptr; intptr x;", but I don't think you meant that by "pointer type" [0]).
And your first example is perfectly syntactically valid (other than missing the conditional statement that must follow the if-condition).
And no, "pX->" is not a valid expression. Was that a typo?
And yes, only a subset of expressions are valid. Basically, the expressions that you can form by applying subscripts (x[3]), dereferences (*x), and function calls. Because a declaration like "int x + 3;" or even "int x + y;" just doesn't make sense. I don't see a problem there.
Btw. I'm not saying that C as defined by the current standards is super straightforward and pure. It's definitely not, and C does actually have a lot of historical baggage that makes our lives a little harder. I'm just explaining the underlying unifying principle, which IMHO is actually nice. And honestly it seems you, too, are still confused because there is just a lack of clear explanations about C declarations. That principle should be much better known, and almost all problems that novices have with declarations are unnecessary frustration that they wouldn't have if someone had told them the trick.
[0] By the way, typedef is another thing that seems to be super obscure, while it is extremely simple: It's just a keyword that modifies declarations to declare an alias for that type, instead of a (named) variable of that type.
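A small sketch of that point, assuming nothing beyond standard C. The names `handler`, `handler_t`, `twice` and `call_both` are hypothetical. The two declarations differ only by the `typedef` keyword: the first introduces a variable, the second an alias for that variable's type.

```c
#include <assert.h>

int (*handler)(int);            /* declares a variable: pointer to function  */
typedef int (*handler_t)(int);  /* same declarator, now declares a type name */

/* Hypothetical function to point at. */
int twice(int x) { return 2 * x; }

int call_both(int x)
{
    handler_t h = twice;  /* a variable of the aliased type         */
    handler = twice;      /* the plain variable has that same type  */
    return h(x) + handler(x);
}
```

With these definitions, `call_both(3)` evaluates `twice(3)` through both pointers and returns 12.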
> The correct way is to write
> int arr[4];
> and to interpret it as "arr[4] is an int"
No. The "interpretation" is "arr is an array of ints", and that is its type.
The complexity of this declaration is evidenced by the fact that you spend another page of text "randomly" adding characters and delimiters around variable declarations to change its type:
int arr[4] // array of ints
int *arr[4] // array of pointers to int
int (*arr)[4] // pointer to array of ints
These are all changes to type. Yet, instead of changing the type declaration, a bunch of stuff is added all around the variable. And you have to come up with ridiculous explanations like "arr[4] is an int which makes arr an array of ints".
That's exactly why most languages said: "if we're changing the type, we're going to reflect this in the type". In a better world the examples above would be something like
int[4] arr; // array of ints
*int[4] arr; // array of pointers to int
*(int[]) arr; // pointer to an array of ints
> The correct way is to write > int arr[4]; > and to interpret it as "arr[4] is an int"
>> No. The "interpretation" is "arr is an array of ints", and that is its type.
I would have been happier if I had found my explanation interpreted in a more generous way. But that's basically what I was saying (and literally what I was saying in another comment).
As to the rest, the advantage of the C approach to type declarations is that there is no type declaration syntax. Just expression syntax. And that it's very terse.
> That's exactly why most languages said: "if we're changing the type, we're going to reflect this in the type". In a better world the examples above would be something like
There's a problem in that your proposed syntax is not even properly parseable. How would a parser recognize that your lines start with types, i.e. are variable declarations? Take the example "*(int[]) arr": the parser would start reading the asterisk and the opening parenthesis as an expression, and then suddenly find a type name (int). At that point it could not simply report an error; it would have to start all over again and try to parse the whole thing as a declaration. That's not exactly nice: good syntax is parseable with a single token of lookahead. That not only makes parser implementations easier, but is also easier for humans to read and leads to better error detection.
Apart from that I think that your examples are about what D does, and this stuff is WORSE in my opinion. While the real problem with C declarations, which is the need to thread a symbol table through the lexer/parser, is still existent in D syntax (I believe), it introduces other problems:
How do you use an array that was declared as "int[5][10] arr"? Using it as "arr[4][9]" is an error: it must be "arr[9][4]". In other words, your approach to type declarations requires the programmer to constantly turn around declarations in his/her mind, which leads to lots of mistakes. It gets even harder when you add pointers / functions, for example "int* [5][10] arr" I believe you must access as "* arr[9][4]", or whatever the D dereference syntax is.
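For contrast, a short C sketch (function names are hypothetical) showing that in C the declaration and the access list the dimensions in the same order: `int arr[10][5]` is used as `arr[9][4]`.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the first-written dimension. */
size_t outer_len(void)
{
    int arr[10][5];
    return sizeof arr / sizeof arr[0];
}

/* Length of the second-written dimension. */
size_t inner_len(void)
{
    int arr[10][5];
    return sizeof arr[0] / sizeof arr[0][0];
}

/* The largest valid index in each dimension, in declaration order. */
int last_element(void)
{
    int arr[10][5] = {{0}};
    arr[9][4] = 7;
    return arr[9][4];
}
```

Here `outer_len()` is 10 and `inner_len()` is 5, so the index written first in the declaration is also the one applied first in the access.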
Java can afford to let you declare "int[][] arr = new int[5][10]" and let you access "arr[4][9]", at the cost of cheating. Java can "turn around" the dimensions because it doesn't actually have an "algebraic" type syntax, which it doesn't need because it doesn't have pointers / function pointers so there is no interaction there.
That's one of the reasons why most newer languages have the type to the right of the variable name, and types grow to the left (towards the variable name). For example, "let arr: [5][10]int" you can access as "arr[4][9]", which is easier. But that principled approach to the syntactic construction of types also puts requirements on the expression syntax: for example, "let arr: [5][10]*int" would have to be accessed as "*arr[4][9]", which is weird, unless the expression syntax is changed to use a postfix dereference operator.
In short, it's not as easy as you thought, and the C syntax is in fact pretty smart. And from a practical standpoint I prefer the C way very much because it's so much terser and has less punctuation than all the alternatives. The only thing that annoys me is the lexer hack.
http://c-faq.com/decl/spiral.anderson.html is the rule I learned to understand C declarations, and while the rule there is described as simple, I think the examples, even without argument types, are actually fairly complex.
It also seems telling that no recent language has followed C’s example for declaration style, which is more implicit than explicit.
The "spiral rule" doesn't get at the heart of declarations. It's just by some guy that tried to figure it out on his own, and what he discovered was basically not declarations but the precedence rules of C expressions ;-)
> It also seems telling that no recent language has followed C’s example for declaration style, which is more implicit than explicit.
Actually most languages don't let the user do what C declarations let you do. For example, in Java (almost) everything is an object, and you can't just create a triple-indirected pointer. So, these languages can afford a declaration syntax that is less potent.
And then there are other more systems-oriented languages that chose to not copy C declarations. They come with their own gotchas. As examples I will pick D and Rust.
In D, you create a multi-dimension array like this: int[5][10] arr; Leading you to believe that you can use it as arr[4][9]; Wrong. That's an out-of-bounds error. You need to write arr[9][4]. Now, was that totally not confusing? The alternative is to expand these types systematically to the left, i.e. write [10][5]int, and maybe move the type to the right of the variable name, as in "let arr [10][5]int;". Honestly I don't like that either.
I've never really used Rust (either), but its downside, in my opinion, is that it has much more distracting syntax / punctuation.
I would love if there was a uniformly better way to declare things than the C way, but I still think C has the best tradeoffs for my practical work. The next time that I toy with language design I might try to simply go with C declarations, prefixed with a sigil or "let" or something, to remove the need for the lexer hack.
The C approach makes the compiler much more complex, and introduces extra typing in other language constructs (like parens around if conditions). This is why many newer languages do something more like the Rust way. Overall it is simpler for programmer and compiler.
It's unfortunate that the simple (and easy) underlying principles are not well known. See my other comment.
The bigger reason why recent languages have different declaration syntax is to avoid the need to carry a symbol table during parsing, and to avoid the need to parse all files serially instead of independently. Because to recognize a declaration the parser has to know which words correspond to types in the current scope.
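A minimal sketch of why the parser needs the symbol table, assuming only standard C. The function names are made up for illustration. The token sequence `T * x` appears in both functions, but it is a multiplication in one and a declaration in the other, depending solely on whether `T` names a type in the current scope.

```c
#include <assert.h>
#include <stddef.h>

/* Here T is a variable, so "T * x" is a multiplication. */
int as_expression(void)
{
    int T = 6, x = 7;
    return T * x;
}

/* Here T is a typedef name, so "T * x" declares x as a pointer to int. */
size_t as_declaration(void)
{
    typedef int T;
    T * x = NULL;
    return sizeof x;
}
```

`as_expression()` returns 42, while `as_declaration()` returns the size of an `int *`; the parser cannot tell the two apart without knowing what `T` is.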
I don't think parentheses around if-conditions are related to C variable declarations (if that's what you were saying), and I think it's fair to say that C's syntactical terseness is unmatched.
The parentheses are required to separate the condition from the following conditional statement.
How about `let x = 42` and let type inference do the work? I used to write a lot of C-style variable declarations in other languages, but I warmed up to Rust's really fast because most of the time I don't need to name the type explicitly.
Whether a type should come before or after the name is a fairly subjective matter. I believe that after is generally superior, especially when the type is optional.
But there’s a very practical reason for requiring the `let` token: it makes parsing very much easier. With `let`, you can keep an LL(1) grammar, because seeing `let` tells you to parse a pattern next, and then a type if there’s a colon after it. But if you don’t put something in there, you get a genuinely intractable problem once the type grammar is not trivial: sure, `int foo;` is simple and obvious, but what about `A<B, C> d;`? Should that be parsed as an expression (respaced, `A < B, C > d;`)? Some languages have not resolved this style of parsing ambiguity at all, and figure it out at runtime, based on what else they find (Perl is infamous for this). I think others only kind-of resolve it, by looking at what symbols are present at compile time, to decide what was meant. Others just declare that such ambiguities are parsed one way, and you can rewrite your code (e.g. add parentheses) if you want to mean the other. Still others have resolved it otherwise, by other more subtle syntactic means, so that even if you need arbitrary look-ahead while parsing, there’s not quite any overlap between the two syntaxes (e.g. don’t support commas in this way as a kind of alternative to semicolon within expressions; or use proper matched delimiters like [] or () for generics).
Rust chooses to make parsing simple, which benefits humans as well as machines, reducing cognitive requirements in reading code.
Furthermore, in Rust what follows `let` is not an identifier or identifier list, but rather a pattern. Imagine the following contrived example:
let x = [[0]];
type x = [u32; 1];
let a = 0;
let [a]: x = [1];
That falls over completely if you put the type first: `x [a] = [1];`—does that define a new binding a with value 1, or does it set x[0] to [1]?
And finally, as I mentioned, the type is optional, and not commonly required, so you end up with something like C++’s `auto` keyword, which is basically `let` but spelled worse (and with worse semantics).
The end result is that for Rust specifically, what you desire is quite unsuitable, and what it has works very well—and that its reasons for doing things that way are well worth while considering.
C is the opposite of simple. If you want to create a binding in C, you need to learn multiple, unnecessarily complex rules. Sure, creating a binding to an int is easy:
int foo = 42;
but doing the same thing for pointers to arrays or function pointers is not:
void (*foo)(int) = bar; // function pointer
int (*foo)[N] = baz; // pointer to array
OTOH in Rust you just need to learn one rule: bindings are created with the grammar "PATTERN [: TYPE]" (the [] means the ": TYPE" part is optional).
That's the only rule you need to know: (1) it works consistently everywhere in the language (let, match, for, if-let, function arguments, while-let...), (2) it lets you create bindings, and (3) it gives you pattern matching and destructuring for free. For example,
let x: i32 = 42;
let y: fn(i32) = foo;
let z: *const [i32] = bar;
but also:
struct Entity { id: i32, vel: (f32, f32) }
let e: Entity; // given
let Entity { vel, .. } = e; // vel is bound to e.vel
let Entity { vel: (v_x, ..), .. } = e; // v_x is bound to e.vel.0
match e {
    Entity { vel: (v_x, ..), .. } if v_x > 0.0 => {
        // do something if e.vel.0 > 0
    }
    _ => { /* do something else otherwise */ }
}
etc.
That's IMO the definition of efficiency: one simple rule, that has no exceptions, works everywhere, and lets you do a lot.
What C does, having multiple different incompatible rules, some for doing simple things like "int a = 42;" and some for doing complex things like "void (*foo)(int) = bar", is not simple. It's just a pain. It means that professional C programmers need to look up rules for things they don't use as often, which is why Stack Overflow is full of questions like "How do I assign a function pointer to a variable in C?", "How do I pass a function pointer as a function argument in C?", etc. Having to learn multiple rules to do the same thing just sucks, and is one of the main reasons I love using Rust: when I learn something, I learn it only once, and things I learn later just reinforce that I learned it in the right way.
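For reference, the two cases those questions ask about can be sketched as follows, assuming standard C. All names here (`negate`, `apply`, `sum3`, `assign_and_pass`) are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical function to point at. */
int negate(int x) { return -x; }

/* op is a pointer to a function that takes an int and returns an int. */
int apply(int (*op)(int), int v)
{
    return (*op)(v);  /* plain op(v) is equivalent */
}

/* row is a pointer to an array of 3 ints. */
int sum3(int (*row)[3])
{
    return (*row)[0] + (*row)[1] + (*row)[2];
}

int assign_and_pass(void)
{
    int data[3] = {1, 2, 3};
    int (*fp)(int) = negate;   /* assign a function pointer to a variable */
    int (*p)[3] = &data;       /* assign a pointer to an array            */
    return apply(fp, sum3(p)); /* pass both along as arguments            */
}
```

`assign_and_pass()` sums the array to 6 and negates it, returning -6. The parameter declarations use the same declarator shapes as the local variables.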
But that's just not true. There is a single principled way for C declarations, and it's as easy as "a binding is a type name followed by an expression and a semicolon".
And while small inconsistencies have been introduced over time, it was entirely consistent when C was conceived. The problem is just that the simple rule for reading declarations is not well known (I don't understand why). See my other comments.
> "a binding is a type name followed by an expression and a semicolon".
I find this hard to grok. Do you have a link to the actual grammar and production rules that apply to all cases?
When I see:
int a = 3;
I don't see a "type name followed by an expression and a semicolon", but rather the grammar "TYPE_NAME NAME = EXPR;". However, that's not correct, since TYPE_NAME cannot be any type (e.g. a function pointer won't work). When looking at
void (*foo)(int) = expr;
int (*foo)[N] = expr;
or at how the keywords "struct" and "union" are part of the type name in some contexts, but not others, I see quite different grammar rules.
I've tried to find literature about this, since I hack on a toy C parser every now and then, and those could simplify it, without much luck.
I've never thought of these as "CDECL = EXPR;" or similar, since that does not work either (e.g. a function declaration would be "void foo(int);" but that's not exactly the same as "void (*foo)(int) = expr;").
> Do you have a link to the actual grammar and production rules that apply to all cases ?
A Google search returns that Annex A in the ISO standard has something like a grammar. Here is a link to an unofficial version of the standard: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf . However, this is not the right place to look if you just want to understand the simple principle behind declarations, because the purity of declaration syntax has been considerably diluted in the last decades. So, only look in the standard if you are in good psychological health, and need to implement a production-grade C compiler. Also, note that grammars are overrated and tend to make things more complex than they really are. They are often too theoretical a construct to be applicable, and that is certainly true for a language like C.
> I don't see a "type name followed by an expression and a semicolon", but rather the grammar "TYPE_NAME NAME = EXPR;".
Let's ignore the optional equal sign + initializer expression, and just focus on declarations without initializers. The syntax is (as I said) "TYPE_NAME EXPR;" where EXPR is an expression that makes use of the newly-declared variable.
> (e.g. a function pointer won't work)
Not sure what you mean by "function pointer", but I'm pretty sure it's not a type name in the way I mean it. Here is how to look at your examples:
void (*foo)(int) = expr;
                 ^^^^^^ (optional) initializer
     ^^^^^^^^^^^ expression (originally it was (*foo)(x),
                 but as I said nowadays there are
                 type specifiers in the list which
                 is a little inconsistent.)
^^^^ type name
int (*foo)[N] = expr;
              ^^^^^^ (optional) initializer
    ^^^^^^^^^ expression
^^^ type name
Basically, the first example says "(* foo)(int) is a void", so you can conclude that foo is a pointer to a function that takes an int and returns a void.
The second example says "(* foo)[N] is an int" so you can conclude that foo is a pointer to an array of N ints.
To be pedantic, there is no (specialized) function pointer syntax. The syntax to declare function pointers is just general declaration syntax, which in turn is basically ordinary expression syntax.
How to declare a function pointer is hard to grok when not being introduced to declaring variables in a principled way. But it makes sense and is not too clunky if you're only declaring a function pointer every now and then.
Exactly! C is internally consistent in that, for example, the star in:
int *a;
is part of the variable declaration, not the type. And the function pointer syntax derives from that. (I know you know, just providing context).
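A small sketch of what that means in practice, assuming a C11 compiler (it relies on `_Generic`). The macro `TYPE_ID` and the function name are hypothetical. In `int *a, b;` the star belongs to the declarator `*a` only, so `b` is a plain int.

```c
#include <assert.h>

/* Classify an expression's type at compile time (C11 _Generic). */
#define TYPE_ID(e) _Generic((e), int: 1, int *: 2, default: 0)

/* Returns 1 if a is an int pointer and b is a plain int. */
int star_binds_to_declarator(void)
{
    int *a, b;  /* "*a is an int" and "b is an int" */
    b = 5;
    a = &b;
    return TYPE_ID(a) == 2 && TYPE_ID(b) == 1 && *a == 5;
}
```

The function returns 1, confirming that only `a` picked up the pointer.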
However, just because it's consistent doesn't mean it's easy to remember or use. Just repeating "it's easy!" doesn't make it true. Others have brought up that cdecl exists, which illustrates this pretty well.
It seems you've worked with this syntax for long enough, and it fits your way of thinking well enough, that it's not an issue at all for you! But there's ample evidence of a lot of people struggling with it, which should be sufficient to deem it "not easy".
I can understand why newer languages have moved away from this style of declaration syntax; moving away has brought benefits such as being more intuitive for novices, better support for tooling / IDEs, and parsing performance gains.
On the other hand, nothing is quite so easy to read and write for me as the terse C declaration syntax (granted I don't declare a lot of pointers-to-functions-returning-pointers-to-functions).
I just made a similar comment before I read yours, but about C# rather than C, where just like C you can do:
`int x = 42;`
But since int (32-bit integer) is the default integer type, you can also do:
`var x = 42;`
If you want to use another type, for example, ulong (unsigned, 64-bit integer), you can do:
`ulong x = 42;`
Or:
`var x = 42ul;`
C#'s syntax is not only terser, but seems a lot easier to read to me. With Rust's syntax, it's not immediately clear what the value is, and what the type is.
It's only clear because you already know what "long" and "int" mean in C#. It's potentially quite confusing for someone coming from a C or C++ background, as they mean different things there. i32 and u32, on the other hand, are unambiguously 32 bits.
In C++ I've actually had code using a value like `42l` that works on Linux but not on Windows, because the sizes of those types aren't fixed.
A valid point about knowing what the types mean, but even if you don't, it is at least immediately apparent which part is the type, and which part is the value.
I've only dabbled with rust, but I came across this very early on, and was baffled by the syntax. After further dabbling, I still can't see it and immediately know what the value is.
Meh, 123_i32 seems like a big improvement to me, but with the others, I just can't immediately grok it - it's having numeric digits as part of the type name that throws me.
I realise of course that not everyone will feel the same.
C99 has effectively identical types in the standard library (uint8_t ... uint64_t, and ditto for int8_t ... int64_t). There are very few modern C codebases where I haven't seen these used (personally I use them because it's much easier than remembering what the minimum guaranteed size of unsigned long is). The Rust ones just have slightly terser names (which it's understandable to dislike, though I personally find the endless _t suffixes in C type names to be a bit annoying as well). And Rust's usize is basically uintptr_t.
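A brief sketch of the difference, assuming a hosted C99 implementation that provides the exact-width types. The `BITS` macro and the function names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Storage width of a type in bits on this implementation. */
#define BITS(T) (sizeof(T) * CHAR_BIT)

/* The exact-width types have the same width on every platform that has them. */
int fixed_widths_hold(void)
{
    return BITS(int8_t) == 8 && BITS(uint16_t) == 16
        && BITS(int32_t) == 32 && BITS(uint64_t) == 64;
}

/* long is only guaranteed to be at least 32 bits wide; the exact width
   depends on the platform's data model. */
int long_is_at_least_32(void)
{
    return BITS(long) >= 32;
}

/* UINT64_C builds a 64-bit constant without relying on an l/ul suffix. */
uint64_t big_constant(void)
{
    return UINT64_C(42);
}
```

The first check holds wherever these types exist, while the width of `long` (and so of a literal like `42l`) can differ between platforms.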
I'm sorry but this notation will always make me scream. C is so much simpler:
int x = 42;
no "let", no colon, and no ambiguous "i32 = 42"