Disclosure: I've got zero C/C++ on my resume. I was asked to diagnose a kernel panic and backport kernel security patches once, but it was very uncomfortable. ("Hey, Terr knows the build system, that's close enough, right?")
That said, perhaps something like disabling the default -fextended-identifiers [0] and enabling the -Wbidi-chars [1] warning would help.
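As a rough sketch of what those two flags catch (file name and identifier are hypothetical; -Wbidi-chars needs GCC 12 or newer, and the 'с' in the identifier below is Cyrillic U+0441, not the ASCII letter):

    // trojan.cc (hypothetical) -- build with:
    //   g++ -fno-extended-identifiers -Wbidi-chars=any -c trojan.cc
    int soсket_ready() {   // the 'с' is U+0441; GCC rejects it as a stray byte
        return 1;
    }

With extended identifiers disabled, GCC fails with a "stray '\321' in program" style error instead of silently accepting the look-alike identifier; -Wbidi-chars separately warns about bidirectional control characters.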
Not yet. I was working on standardizing this for C23, but it was postponed to C26, and it doesn't have much support. MSVC and SDCC liked it; Clang and GCC, not so much.
It seems like there should be a way to catch these types of “bugs” - some form of dynamic analysis tool that extracts the feature detection code snippets and tries to compile them; if they fail for something like a syntax error, flag it as a broken check.
Macros expanding differently on different OSes could complicate things though, as could determining what flags to build the feature-check code with; so perhaps filtering based on the type of error would be best done as part of the build system's own feature-checking functionality.
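Something like this, as a minimal sketch (hypothetical tool; assumes GCC on a POSIX system, and the error-classification heuristic is pure guesswork):

    // check-lint.cc (hypothetical): test-compile a feature-check snippet and
    // guess whether a failure means "broken check" rather than "missing feature".
    #include <array>
    #include <cstdio>
    #include <iostream>
    #include <string>

    // Run a command and capture its combined stdout/stderr (POSIX popen).
    static std::string run(const std::string& cmd) {
        std::string out;
        std::array<char, 4096> buf{};
        FILE* pipe = popen((cmd + " 2>&1").c_str(), "r");
        if (!pipe) return out;
        while (fgets(buf.data(), buf.size(), pipe)) out += buf.data();
        pclose(pipe);
        return out;
    }

    int main(int argc, char** argv) {
        if (argc < 2) { std::cerr << "usage: check-lint <snippet.c>\n"; return 2; }
        // -fsyntax-only parses the snippet without generating code.
        std::string diag = run(std::string("gcc -fsyntax-only ") + argv[1]);
        if (diag.empty()) { std::cout << "snippet compiles: feature present\n"; return 0; }
        // Crude heuristic: parse errors ("expected ...", "stray ...") suggest the
        // check itself is malformed; a missing header or undeclared symbol
        // suggests the feature is genuinely absent on this system.
        bool looks_broken = diag.find("expected") != std::string::npos ||
                            diag.find("stray") != std::string::npos;
        std::cout << (looks_broken ? "suspicious: check may be broken\n"
                                   : "feature appears to be missing\n");
        return looks_broken ? 1 : 0;
    }

Matching on English substrings is obviously fragile; GCC's -fdiagnostics-format=json (GCC 9+) would at least make the classification structured rather than textual.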
I'd prefer if the ecosystem standardized on some dependency management primitives so critical projects aren't expected to invent buggy and insecure hacks ("does this string parse?") in order to accurately add dependencies.
It would be interesting to see what the most common compile feature checks are for, and what alternative ways could be used to make the same information available to a build system. It seems like any solution that requires libraries to be updated to “export” information on the features they provide would have difficulty getting adoption (and would not be backwards compatible with older versions of desired dependencies).
At least for newer C++ standards there seems to be decent support for feature test macros, which could reduce the need for feature checks that compile a snippet of test code to decide whether a feature is available: https://en.cppreference.com/w/cpp/feature_test
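For example, a check that C++20 code can answer entirely in the preprocessor, with no test compilation at all (the HAVE_ macro name is my own):

    #include <version>   // C++20: defines the library feature-test macros

    #if defined(__cpp_lib_span) && __cpp_lib_span >= 202002L
      #include <span>
      #define HAVE_STD_SPAN 1
    #else
      #define HAVE_STD_SPAN 0
    #endif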
Handling the output from several of the most recent GCC and Clang versions would probably cover the majority of cases, with MSVC added in for Windows. If the output isn’t from a recognized compiler or doesn’t match expectations, the only option is falling back to current behavior. Not ideal, but better than the status quo…
That would only work for projects that care only about current compilers, whereas the C world in general has more desire to support niche compilers.
A mitigation here would be to make the result of autoconf only provide instructions for humans to change their build config, instead of doing it for them silently. The latter is an anti-pattern.
FWIW, the approach you propose is how the UI tests for rustc work, with checks for specific annotations on specific lines, but those have the benefit of being tied to a single implementation/version and modified in tandem with the compiler. Unless all compilers could be made to provide reasonable machine-readable output for errors, doing that for this use case isn't workable.
Well, that would be exactly the point of the attacker. GCC errors out, and it does not matter whether that is because the intentionally typoed header does not exist or because non-English characters are not allowed. Errors encountered while compiling test programs go to /dev/null anyway, since they are expected not to compile successfully on some systems; that's the whole point of the test. So no, this option would not have helped.
Probably the code review tools should be hardened as well, to indicate when extended identifiers have been introduced on a line where there weren't any before. That would help catch the replacement of a 'c' with a visually identical Cyrillic one.
Btw, the -fno-extended-identifiers compiler parameter gives an error if UTF-8 identifiers are used in the code:
    <source>:3:11: error: stray '\317' in program
    float <U+03C9><U+2083> = 0.5f;
> Probably the code review tools should be hardened as well, to indicate if extended identifiers had been introduced to a line where there wasn't any.
Maybe in the future more languages/tools will have the concept of per-project character sets, as opposed to trying to wrangle all possible Unicode ambiguity problems.
I suppose then the problem is how to permit exceptions when integrating with some library written in another (human) language.
Or we could just accept English as the lingua franca of computing and not try to support anything other than ASCII in source code (at least not outside string constants). That way we not only eliminate a whole class of possible exploits but also widen the pool of people who can understand the code and spot issues.
The -fno-extended-identifiers option seems to do something in this area, but I don't know if it is sufficient. It may also block some characters which are used in standard English (for some values of standard).
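For instance (assuming GCC; the file name and identifier are hypothetical), even loanwords that arguably count as standard English get rejected:

    // g++ -fno-extended-identifiers -c cafe.cc
    int café_visits = 0;   // 'é' (U+00E9) becomes a stray-byte error with the flag set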
> But it may block some characters which are used in standard English
So what? Source code was fine with ASCII for a long time; this push for Unicode symbols is a recent endeavor and IMO a huge mistake, not just because of the security implications.
I mean, GCC erroring out is how the exploit works here. cmake tries to compile the source code: if it compiles then the feature is available, if it fails then the feature is not available. Forcing that check to fail is exactly what the attacker wants.
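For anyone unfamiliar with the mechanism, here is a sketch of what such a check looks like (modeled on CMake's check_cxx_source_compiles; the getrandom() probe is just an illustrative example):

    // The build system compiles (but never runs) this program in isolation;
    // if compilation and linking succeed, it defines e.g. HAVE_GETRANDOM.
    #include <sys/random.h>

    int main() {
        char buf[16];
        return getrandom(buf, sizeof buf, 0) < 0 ? 1 : 0;
    }

A failure caused by a genuinely missing header and a failure caused by a single smuggled character look identical to the build system, which is why this kind of sabotage is so hard to spot.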