UB, in this context, is very explicitly used in the standard: it is undefined be...

hvdijk · on May 20, 2021

> behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

That is literally the definition of UB from the C standard. It is explicitly also about constructs that the standard does not describe. That makes sense: the standard does not and cannot define the behaviour for any construct not in the standard, so cannot impose any requirements for such constructs, and that is all UB is: something where the standard imposes no requirements.

tsimionescu · on May 21, 2021

The relevant discussion about UB is restricted to constructs that the standard describes. For example, writing past the end of an object is UB - the construct is described in the standard, but is given no semantics by the standard.

The standard does not describe pattern matching, so using pattern matching is also undefined behavior, but there is nothing to be talked about here.

hvdijk · on May 21, 2021

The comment I replied to did talk about something not described by the standard though, namely syscalls. If you want to argue that we should not be talking about syscalls here, your issue should be with the original comment that brought them up (https://news.ycombinator.com/item?id=27222325), not with my reply, I think. However, that comment looks perfectly fine to me. Also, depending on how the syscalls are made, it actually may be explicitly described as UB by the standard, see my comment https://news.ycombinator.com/item?id=27228701 too.

tsimionescu · on May 21, 2021

Syscalls are not any more UB than any other function call, though. Whether talking about write(2) or my_foo(), the call has the semantics given by the function signature visible in the current translation unit. Sure, the C standard doesn't define what write(2)'s effects will be, but that does not mean that calling it is UB according to the standard.

If the function has not been declared by the time it is first used, even then calling it is not UB - it is defined to be a compilation error (in versions earlier than C99 it was actually valid, but UB if the call did not match the actual function definition).

hvdijk · on May 21, 2021

> Sure, the C standard doesn't define what write(2)'s effects will be, but that does not mean that calling it is UB according to the standard.

Yes, it does. I already explained exactly why it needs to be UB, but let me quote where the standard says so:

C99 6.9 External definitions:

> Semantics:

> An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.

If your program provides a declaration of write() and uses it without also providing a definition, the program does not have "exactly one external definition for the identifier", it has zero definitions for the identifier. This violates a "shall" that appears outside of a constraint, for which we turn to:

C99 4 Conformance:

> If a "shall" or "shall not" requirement that appears outside of a constraint is violated, the behavior is undefined.

aw1621107 · on May 21, 2021

> let me quote where the standard says so:

Wouldn't this hinge on what precisely "entire program" means? A definition for write(2) may not appear in the source code you wrote, but if "entire program" includes e.g., libraries dynamically linked in then it's quite feasible for the end result to be fully defined.

For example, 5.2.2 Paragraph 2 starts with (emphasis added):

> In the set of translation units and libraries that constitutes an entire program

hvdijk · on May 21, 2021

Sure, but in the situation we were talking about, the user never wrote a definition for write(), and the user did not specify any library to include that provided a definition of write(). From the standard's perspective, that means there is no definition for it in the entire program.

Keep in mind that the standard's perspective is somewhat different from how things work in practice. We know that on Unix-like systems, there is also the concept of libraries, somewhat different from how the standard describes it, and write() will be provided by the "c" library. But consider the following strictly conforming program:

  #include <stdio.h>
  void write(void) {
    puts("Hello, world!");
  }
  int main(void) {
    write();
  }

A conforming C implementation is not allowed to reject this for a duplicate definition of write(): the name "write" is reserved for use by the programmer, it is not reserved to the implementation. This program must be considered not to violate the "there shall be exactly one external definition for the identifier" requirement, so the only way to consider this valid is to say that the implementation does not implicitly provide an external definition of the write() function as far as the C standard is concerned.

Yet at the same time, from the perspective of the implementation, the C library is considered to provide a definition of the write() function, but it is a definition that is only used if the program does not override it with another definition that should be used instead. This concept of multiple definitions for the same name, with rules specifying which of the multiple definitions gets picked, is very useful but is also beyond the scope of the C standard. When we say that a function is defined, we need to be clear on whether we use "define" in the ISO C sense or in some other sense. As your comment shows, things get very confusing if we are not careful with that.

aw1621107 · on May 22, 2021

> and the user did not specify any library to include that provided a definition of write()

Ah. I had assumed that that was implicit in "using write(2)", but it seems that was a bad assumption.

> there is also the concept of libraries, somewhat different from how the standard describes it

In what way?

You make an interesting point with the example. It's not something I had considered before. Would weak linkage (or a similar mechanism that allows for a provide-unless-the-user-already-did-so type of behavior) fall under an implementation extension, then?

hvdijk · on May 22, 2021

> In what way?

For the most part the standard does not address the existence of libraries other than the standard library, but 5.1.1.1 contains "Previously translated translation units may be preserved individually or in libraries." This, to me, suggests that from the standard's perspective, when you link in a library, you simply get that library, whereas on Unix systems, when you link in a static library, you specifically get those object files from the library needed to resolve not yet defined references, and when you link in a shared library, you get something where it becomes possible to have duplicate definitions where rules come into play as to which definition will end up used.

> You make an interesting point with the example. It's not something I had considered before. Would weak linkage (or a similar mechanism that allows for a provide-unless-the-user-already-did-so type of behavior) fall under an implementation extension, then?

Yes, I think so. Shared libraries implicitly have some sort of weak linkage already aside from the explicit weak linkage that you can get with e.g. GCC's __attribute__((weak)), but both forms count as extensions, I would say.