Disclosure: I've got zero C/C++ on my resume. I was asked to diagnose a kernel panic and backport kernel security patches once, but it was very uncomfortable. ("Hey, Terr knows the build system, that's close enough, right?")
That said, perhaps something like disabling the default -fextended-identifiers [0], and enabling the -Wbidi-chars [1] warning.
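For illustration, a sketch of what -Wbidi-chars is meant to catch, assuming GCC 12+ (the ",ucn" modifier is what should make it also flag the escaped form used here; a raw bidi control character would be invisible in this comment box):

    /* Compile with: gcc -Wbidi-chars=any,ucn -c bidi.c
     * \u202E is RIGHT-TO-LEFT OVERRIDE; in "trojan source" attacks the
     * raw, invisible character reorders how the code is displayed. */
    const char *filename = "user\u202Etxt.exe";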
Not yet. I was working on standardizing it for C23, but it was postponed to C26 and doesn't have much support. MSVC and SDCC liked it; Clang and GCC, not so much.
It seems like there should be a way to catch these types of “bugs” - some form of dynamic analysis tool that extracts the feature-detection code snippets and tries to compile them; if one fails with something like a syntax error, flag it as a broken check.
Macro expansion differing across OSes could complicate things, though, as could determining which flags to build the feature-check code with. So perhaps filtering based on the type of error would be best done as part of the build system's own feature-checking functionality; a rough sketch of the idea is below.
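A hypothetical sketch of such a scanner in C (the file names and error-string heuristics here are mine; a real tool would need per-compiler patterns):

    /* Re-run a failed feature probe and classify the failure. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s probe.c\n", argv[0]);
            return 2;
        }

        char cmd[512];
        snprintf(cmd, sizeof cmd, "gcc -fsyntax-only %s 2>probe.err", argv[1]);
        if (system(cmd) == 0)
            return 0;                  /* probe compiles: nothing to flag */

        FILE *f = fopen("probe.err", "r");
        if (!f)
            return 2;

        char line[1024];
        int broken = 0;
        while (fgets(line, sizeof line, f)) {
            /* Stray bytes and plain syntax errors can never succeed on
             * any system, so the probe itself is broken (or sabotaged);
             * a missing header or symbol, by contrast, is a legitimate
             * "feature not available" result. */
            if (strstr(line, "stray '") || strstr(line, "expected ")) {
                printf("BROKEN PROBE: %s", line);
                broken = 1;
            }
        }
        fclose(f);
        return broken;
    }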
I'd prefer if the ecosystem standardized on some dependency management primitives so critical projects aren't expected to invent buggy and insecure hacks ("does this string parse?") in order to accurately add dependencies.
It would be interesting to see what the most common compile-time feature checks are for, and what alternative ways could make the same information available to a build system. It seems like any solution that requires libraries to be updated to “export” information about the features they provide would have difficulty getting adopted (and wouldn't be backwards compatible with older versions of desired dependencies).
At least for newer C++ standards it seems like there is decent support for feature test macros, which could reduce the need for a feature check involving compiling a snippet of test code to decide if a feature is available: https://en.cppreference.com/w/cpp/feature_test
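For C itself, the preprocessor can already answer some of these questions without a throwaway test program. A minimal sketch, assuming a reasonably recent compiler (__has_include is standard in C23 and C++17, and a long-standing GCC/Clang extension before that):

    /* Version macro: C11 guarantees _Static_assert, _Generic, etc. */
    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    #  define HAVE_C11 1
    #endif

    /* Header probe without compiling a separate test program. */
    #if defined(__has_include)
    #  if __has_include(<threads.h>)
    #    define HAVE_THREADS_H 1
    #  endif
    #endif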
Handling the output from several of the most recent GCC and Clang versions would probably cover the majority of cases, plus MSVC for Windows. If the output isn’t from a recognized compiler or doesn’t match expectations, the only option is falling back to current behavior. Not ideal, but better than the status quo…
That would only work for projects that only care about current compilers, whereas the C world in general cares more about supporting niche compilers.
A mitigation here would be to have autoconf's results only provide instructions for humans to change their build config, instead of changing it for them silently. The latter is an anti-pattern.
FWIW, the approach you propose is how the UI tests for rustc work, with checks for specific annotations on specific lines, but those have the benefit of being tied to a single implementation/version and modified in tandem with the compiler itself. Unless all compilers could be made to provide reasonably machine-readable output for errors, doing that for this use case isn't workable.
Well - that would be exactly the point of the attacker. GCC errors out, and it does not matter whether this is because the intentionally typoed header does not exist or because non-English characters are not allowed. Errors encountered during compilation of test programs go to /dev/null anyway, as they are expected not to compile successfully on some systems - that's exactly the point of the test. So no, this option would not have helped.
Probably the code review tools should be hardened as well, to indicate if extended identifiers had been introduced to a line where there wasn't any. That would help catching the replacement of a 'c' character with a Russian one.
Btw, the -fno-extended-identifiers compiler parameter gives an error if UTF-8 identifiers are used in the code:
    <source>:3:11: error: stray '\317' in program
    float ω₃ = 0.5f;
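A review-tool check along those lines could be quite small. A hypothetical sketch in C that scans a unified diff for non-ASCII bytes on added lines (the heuristic is mine; a real tool would also compare against the removed line to spot a 'c' silently becoming U+0441):

    #include <stdio.h>

    int main(void)
    {
        char line[4096];
        int n = 0;
        while (fgets(line, sizeof line, stdin)) {
            n++;
            if (line[0] != '+')
                continue;                  /* only added lines */
            if (line[1] == '+' && line[2] == '+')
                continue;                  /* skip "+++ b/file" headers */
            for (unsigned char *p = (unsigned char *)line + 1; *p; p++) {
                if (*p >= 0x80) {          /* any non-ASCII byte */
                    printf("diff line %d: non-ASCII byte 0x%02X: %s",
                           n, *p, line);
                    break;
                }
            }
        }
        return 0;
    }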
> Probably the code review tools should be hardened as well, to indicate if extended identifiers had been introduced to a line where there wasn't any.
Maybe in the future more languages/tools will have the concept of per-project character sets, as opposed to trying to wrangle all possible Unicode ambiguity problems.
I suppose then the problem is how to permit exceptions when integrating with some library written in another (human) language.
Or we could just accept English as the lingua franca of computing and not try to support anything other than ASCII in source code (at least not outside string constants). That way not only do we eliminate a whole class of possible exploits but also widen the number of people who can understand the code and spot issues.
The -fno-extended-identifiers option seems to do something in this area, but I don't know if it is sufficient. But it may block some characters which are used in standard English (for some values of standard), such as the "ï" in "naïve" or the "é" in "café".
> But it may block some characters which are used in standard English
So what? Source code was fine with ASCII for a long time; this push for Unicode symbols is a recent endeavor and IMO a huge mistake, not just because of the security implications.
I mean, GCC erroring out is how the exploit works here. cmake tries to compile the source code: if that succeeds, the feature is available; if it fails, the feature is not. Forcing that compilation to fail is what the attacker wants.
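To make the mechanism concrete, here's a hypothetical probe in the style of such checks, assuming recent Linux headers (modeled on this discussion, not copied from the actual xz build files):

    /* If this compiles and links, the build system concludes Landlock
     * is available.  Swap the 'c' in "landlock" for Cyrillic U+0441
     * (or add a stray '.') and it fails on every system, so the
     * feature is silently reported as missing. */
    #define _GNU_SOURCE
    #include <linux/landlock.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        return (int)syscall(SYS_landlock_create_ruleset, (void *)0, 0, 0U);
    }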
Another point where I appreciate the Rust compiler:
    warning: the usage of Script Group `Cyrillic` in this crate consists solely of mixed script confusables
     --> src/lib.rs:1:4
      |
    1 | fn SYS_landloсk_create_ruleset() {
      |    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |
      = note: the usage includes 'с' (U+0441)
      = note: please recheck to make sure their usages are indeed what you want
      = note: `#[warn(mixed_script_confusables)]` on by default
The character was in a string, not directly in what was being compiled. That string's contents failing to compile was the point, as landlock was then disabled.
From what I understand, this landlock disabling wasn't relevant to the sshd attack. It appears it was setting up for something else.
After the concept of such attacks blew up, idk, one to three years ago, a lot of places now warn at least when mixing "languages"/scripts, or sometimes on any usage of non-ASCII printable characters.
So stuff like that is increasingly reliably caught.
Including by third parties doing any ad-hoc scanning.
Though flagging `compiles` checks that fail with syntax errors definitely should be added to the list of auto-scan checks, at least in C/C++ (it's not something you would do in most languages tbh; even in C it seems to be an antipattern).
Sure; but so long as the compiler error is silently discarded (as it is here), the configure script will assume landlock isn't available and never use it.
A misplaced punctuation mark has some plausible deniability: the author could say they were distracted and fat-fingered some nonsense, an honest mistake.
A Unicode character from a language that has zero chance of being mapped to your programmer's keyboard, in the middle of a code line, would be obvious intentional tampering, or at the very least raise some eyebrows.
> A Unicode character from a language that has zero chance of being mapped to your programmer's keyboard, in the middle of a code line, would be obvious intentional tampering, or at the very least raise some eyebrows.
I accidentally get Cyrillic "с" in my code a few times a year. It's on the same key as Latin "c", so if I've switched keyboard layouts, I don't notice until the compiler warns me. Easy to do if I'm switching between chats and coding. Now my habit is to always type a few characters in addition to "c" to check which layout I'm _really_ using.
Granted, it's easier with a one-letter variable called "c", but with longer names I can easily see myself not typing the "c" the first time (e.g. not pressing the key hard enough), starting a build, chatting in some windows, coming back, facepalming, and "fixing" the error by typing "c" or "с" depending on my current layout.
Even worse, I have Punto Switcher, which automatically switches layouts when I start typing. With the default config, it changes a Latin "c" to the Russian one, because Russian includes the single-letter word "с" while a bare "c" is nonsense in English.
And since it's the only Cyrillic character that sits on the same key as its identical-looking English counterpart, I don't even see the problem with my eyes when the auto-replacement fires.
I mean, I agree that the punctuation mishap is a better cover story, but why would any particular language have “zero chance” of being mapped to the programmer’s keyboard?
> why would any particular language have “zero chance” of being mapped to the programmer’s keyboard?
You left out a very important qualifier: in the middle of a code line
The chances of someone accidentally fat-fingering a keyboard layout switch, typing one character, then switching back again without noticing while typing a line of code are very slight indeed.
Plus the fact that you'd need to speak a language written in that alphabet to have a plausible reason for having the keyboard layout enabled in the first place.