More

pornel · 2024-10-08T09:45:15 1728380715

JS and Python are still old enough to have been created when Unicode was in its infancy, so they have their own share of problems from using UCS-2 (such as indexing strings by what is now a UTF-16 code unit, rather than by a codepoint or a grapheme cluster).

Swift has been developed in the modern times, and it's able to tackle Unicode properly, e.g. makes distinction between codepoints and grapheme clusters, and steers users away from random-access indexing and having a single (incorrect) notion of a string length.

pornel · 2024-10-08T09:33:33 1728380013

Being developed in, and having to stay compatible with, ancient times is a real problem of C++.

The now-invalid assumptions couldn't have been avoided 50 years ago. Fixing them now in C++ is difficult or impossible, but still, the end result is a ton of brokenness baked into C++.

Languages developed in the 21st century typically have some at least half-decent Unicode support built-in. Unicode is big and complex, but there's a lot that a language can do to at least not silently destroy the encoding.

cm2187 · 2024-10-08T10:18:58 1728382738

That explains why there are two functions, one for ascii and one for unicode. That doesn't explain why the unicode functions are hard to use (per the article).

BoringTimesGang · 2024-10-08T10:55:24 1728384924

Because human language is hard to boil down to a simple computing model and the problem is underdefined, based on naive assumptions.

Or perhaps I should say naïve.

cm2187 · 2024-10-08T18:11:15 1728411075

Well pretty much every other more recent language solved that problem.

kccqzy · 2024-10-08T18:24:31 1728411871

Almost no programming language, perhaps other than Swift, solved that problem. Just use the article's examples as test cases. It's just as wrong as the C++ version in the article, except it's wrong with nicer syntax.

zahlman · 2024-10-08T18:46:01 1728413161

Python's strings have uppercase, lowercase and case-folding methods that don't choke on this. They don't use UTF-16 internally (they can use UCS-2 for strings whose code points will fit in that range; while a string might store code points from the surrogate-pair range, they're never interpreted as surrogate pairs, but instead as an error encoding so that e.g. invalid UTF-8 can be round-tripped) so they're never worried about surrogate pairs, and it knows a few things about localized text casing:

    >>> 'ß'.upper()
    'SS'
    >>> 'ß'.lower()
    'ß'
    >>> 'ß'.casefold()
    'ss'

There are a lot of really complicated tasks for Unicode strings. String casing isn't really one of them.

(No, Python can't turn 'SS' back into 'ß'. But doing that requires metadata about language that a string simply doesn't represent.)

crote · 2024-10-09T08:07:53 1728461273

But that's wrong. The uppercase for "in Maßen" ("in moderate amounts") is not "IN MASSEN" ("in Massen", meaning "in massive amounts").

kccqzy · 2024-10-08T19:04:40 1728414280

Still breaks on, for example, Turkish i vs İ. It's impossible to do correctly without language information.

> (No, Python can't turn 'SS' back into 'ß'. But doing that requires metadata about language that a string simply doesn't represent.)

Yes that's my point. Because in typical languages strings don't store language metadata, this is impossible to do correctly in general.

zahlman · 2024-10-08T19:08:57 1728414537

I'm not seeing anything in the Swift documentation about strings carrying language metadata, either, though?

kccqzy · 2024-10-08T19:20:54 1728415254

This lowercase function takes a locale argument https://developer.apple.com/documentation/foundation/nsstrin...

It looks like an old NSString method that's available in both Obj-C and Swift.

The casefold function is even older than that. https://developer.apple.com/documentation/foundation/nsstrin... Its documentation specifically includes a discussion of the Turkish İ/I issue.

tedunangst · 2024-10-08T19:18:53 1728415133

But that's wrong. The upper case for ß is ẞ.

cm2187 · 2024-10-08T20:27:23 1728419243

C#'s "ToUpper" takes an optional CultureInfo argument if you want to play around with how to treat different languages. Again, solved problem decades ago.

account42 · 2024-10-09T13:14:59 1728479699

This is not a locale issue, it's a Unicode version issue. Which hightlights another problem with adding this to the base standard library.

IncreasePosts · 2024-10-08T20:15:24 1728418524

That was only adopted in Germany like 7 years ago!

kccqzy · 2024-10-08T21:10:48 1728421848

Well languages and conventions change. The € sign was added not that long ago and it was somewhat painful. The Chinese language uses a single character to refer to chemical elements so when IUPAC names new elements they will invent new characters. Etc.

extraduder_ire · 2024-10-09T01:10:28 1728436228

Does unicode have space set aside for those new symbols to slot into? I know it's very rare, but it could get messy.

account42 · 2024-10-09T13:16:52 1728479812

Unicode is already messy. Chinese characters especially so due to han unificiation.

Towaway69 · 2024-10-09T06:29:13 1728455353

Isn't uppercase for ß just ß - i.e. it's its own uppercase character?

bratwurst3000 · 2024-10-09T12:59:18 1728478758

there shouldn’t be an uppercase version of ß because there is no word in the german language that uses it as the first letter. the german language didnt think of allcaps. please correct me if I am wrong. If written in uppercase it should be converted to SZ or the new uppercase ß…. which my iphone doesn’t have… and converting anything to uppercase SS isn’t something germany wants …

account42 · 2024-10-09T13:22:52 1728480172

> there shouldn’t be an uppercase version of ß because there is no word in the german language that uses it as the first letter. the german language didnt think of allcaps.

Allcaps (and smallcaps) has always existed in signage everywhere. Before the computing age, letters where just arbitrary metal stamps -- and just whatever you could draw before that. Historically, language was not as standardized as it is today.

Towaway69 · 2024-10-09T21:47:10 1728510430

I don’t think that Germany wants a capital ß or the German language requires one rather technology needs one to dot the eyes and cross the tees.

account42 · 2024-10-09T13:18:42 1728479922

Not generally no, but some applications used it that way because of ambiguity of upppercasing ß to SS - which is why ẞ was added.

Towaway69 · 2024-10-09T21:43:30 1728510210

On the other hand, the German language has existed for several hundred years without having a capital ß but now it needs one?

True capitalisation has always existed but even that didn’t seem to have required a capital ß - why now?

tialaramex · 2024-10-08T22:00:38 1728424838

Rust will cheerfully:

    assert_eq!("ὀδυσσεύς", "ὈΔΥΣΣΕΎΣ".to_lowercase());

[Notice that this is in fact entirely impossible with the naive strategy since Greek cares about position of symbols]

Some of the latter examples aren't cases where a programming language or library should just "do the right thing" but cases of ambiguity where you need locale information to decide what's appropriate, which isn't "just as wrong as the C++ version" it's a whole other problem. It isn't wrong to capitalise A-acute as a capital A-acute, it's just not always appropriate depending on the locale.

account42 · 2024-10-09T13:27:33 1728480453

Is this

    assert_eq!("\u1F41δυσσεύς", "ὈΔΥΣΣΕΎΣ".to_lowercase());

or

    assert_eq!("\u03BF\u0314δυσσεύς", "ὈΔΥΣΣΕΎΣ".to_lowercase());

For display it doesn't matter but most other applications really want some kind of normalizatin which does much much more so having a convenient to_lowercase() doesn't buy you as much as you think and can be actively misleading.

MBCook · 2024-10-08T22:36:17 1728426977

So what?

That doesn’t prevent adding a new function that converts an entire string to upper or lowercase in a Unicode aware way.

What would be wrong with adding new correct functions to the standard library to make this easy? There are already namespaces in C++ so you don’t even have to worry about collisions.

That’s the problem I see. It’s fine if you have a history of stuff that’s not that great in hindsight. But what’s wrong with having a better standard library going forward?

It’s not like this is an esoteric thing.

wakawaka28 · 2024-10-09T00:24:20 1728433460

The reason that wasn't done is because Unicode is not really in older C++ standards. I think it may have been added to C++23 but I am not familiar with that. There are many partial solutions in older C++ but if you want to do it well then you need to get a library for it from somewhere, or else (possibly) wait for a new standard.

Unicode and character encodings are pretty esoteric. So are fonts. The stuff is technically everywhere and fundamental, but there are many encodings, technical details, etc. And most programmers only care about one language, or else only use UTF-8 with the most basic chars (the ones that agree with ASCII). That isn't terrible. You only need what you actually need. Most programs don't strictly have to be built for multiple random languages, and there is kind of a standard methodology to learn before you can do that.

account42 · 2024-10-09T13:11:59 1728479519

No, strong backwards compatiblity a real strength of C++. In fact, it's probably it's main strength these days.

pornel · 2024-10-07T17:27:14 1728322034

I hope you're calling the `backup` command in sqlite. A simple copy can leave sqlite db files in an inconsistent state from which sqlite can't recover.

lukaesch · 2024-10-07T20:09:06 1728331746

Thanks for the hint. I didn't know that. So far nothing happened, but I will start using the backup command.

reliabledang · 2024-10-11T00:10:10 1728605410

"reliable software"

pornel · 2024-10-04T21:44:57 1728078297

They probably plan for the game to be on sale for a long time, so they don't want the game to look dated too soon, and aimed for future hardware. The specs won't seem so high in a few years.

Games are long overdue to use the full CPU instead of bottlenecking of single-core performance. I hope they've actually designed for multi-core CPUs, and made as many things data-parallel as possible.

pornel · on Sept 27, 2024

This is already the reality in most of EU and the UK. There are 300kW charging stations all over the place, and a decent EV can recharge in 20 minutes.

I actually prefer this over refuelling, because filling up takes 5+ minutes of my time, while recharging is unattended, so I can plug in in seconds, and go get a coffee/break in the meantime. Chargers also tend to be next to nicer places, near food courts and other retail, which is handy, and doesn't need to be a separate stop on road trips.

pornel · on Sept 26, 2024

Rust can't prevent crates from doing anything. It's not a sandbox language, and can't be made into one without losing its systems programming power and compatibility with C/C++ way of working.

There are countless obscure holes in rustc, LLVM, and linkers, because they were never meant to be a security barrier against the code they compile. This doesn't affect normal programs, because the exploits are impossible to write by accident, but they are possible to write on purpose.

---

Secondly, it's not 1000 crates from 1000 people. Rust projects tend to split themselves into dozens of micro packages. It's almost like splitting code across multiple .c files, except they're visible in Cargo. Many packages are from a few prolific authors and rust-lang members.

The risk is there, but it's not as outsized as it seems.

Maintainers of your distro do not review code they pull in for security, and the libraries you link to have their own transitive dependencies from hundreds of people, but you usually just don't see them: https://wiki.alopex.li/LetsBeRealAboutDependencies

Rust has cargo-vet and cargo-crev for vetting of dependencies. It's actually much easier to review code of small single-purpose packages.

vlovich123 · on Sept 26, 2024

There’s two different attack surfaces - compile time and runtime.

For compile time, there’s a big difference between needing the attacker to exploit the compiler vs literally just use the standard API (both in terms of difficulty of implementation and ease of spotting what should look like fairly weird code). And there’s a big difference between runtime rust vs compile time rust - there’s no reason that cargo can’t sandbox build.rs execution (not what josephg brought up but honestly my bigger concern).

There is a legitimate risk of runtime supply chain attacks and I don’t see why you wouldn’t want to have facilities within Rust to help you force contractually what code is and isn’t able to do when you invoke it as a way to enforce a top-level audit. Even though rust today doesn’t support it doesn’t make it a bad idea or one that can’t be elegantly integrated into today’s rust.

pornel · on Sept 26, 2024

I agree there's a value in forcing exploits to be weirder and more complex, since that helps spotting them in code reviews.

But beyond that, if you don't review the code, then the rest matters very little. Sandboxed build.rs can still inject code that will escape as soon as you test your code (I don't believe people are diligent enough to always strictly isolate these environments despite the inconvenience). It can attack the linker, and people don't even file CVEs for linkers, because they're expected to get only trusted inputs.

Static access permissions per dependency are generally insufficient, because an untrusted dependency is very likely to find some gadget to use by combining trusted deps, e.g. use trusted serde to deserialize some other trusted type that will do I/O, and such indirection is very hard to stop without having fully capability-based sandbox. But in Rust there's no VM to mediate access between modules or the OS, and isolation purely at the source code level is evidently impossible to get right given the complexity of the type system, and LLVM's love for undefined behavior. The soundness holes are documented all over rustc and LLVM bug trackers, including some WONTFIXes. LLVM cares about performance and compatibility first, including concerns of non-Rust languages. "Just don't write weirdly broken code that insists on hitting a paradox in the optimizer" is a valid answer for LLVM where it was never designed to be a security barrier against code that is both untrusted and expected to have maximum performance and direct low-level hardware access at the same time.

And that's just for sandbox escapes. Malware in deps can do damage in the program without crossing any barriers. Anything auth-adjacent can let an attacker in. Parsers and serializers can manipulate data. Any data structure or string library could inject malicious data that will cross the boundaries and e.g. alter file paths or cause XSS.

josephg · on Sept 26, 2024

> the exploits are impossible to write by accident, but they are possible to write on purpose.

Can you give some examples? What ways are there to write safe rust code & do nasty things, affecting other parts of the binary?

Is there any reason bugs like this in LLVM / rustc couldn't be, simply, fixed as they're found?

steveklabnik · on Sept 26, 2024

https://github.com/Speykious/cve-rs

They can be fixed, but as always, there’s a lot of work to do. The bug that the above package relies on has never been seen in the wild, only from handcrafted code to invoke it, and so is less of a priority than other things.

And some fixes are harder than others. If a fix is going to be a lot of work, but is very obscure, it’s likely to exist for a long time.

josephg · on Sept 28, 2024

Yes, true. But as others have said, there’s probably still some value in making authors of malicious code jump through hoops, even if it will take some time to fix all these bugs.

And the bugs should simply get fixed.

pornel · on Sept 26, 2024

People have a finite amount of time and effort they can spend on making the code correct. When the language is full of traps, spec gotchas, antiquated misfeatures, gratuitous platform differences, fragmented build systems, then a lot of effort is wasted just on managing all of that nonsense that is actively working against writing robust code, and it takes away from the effort to make a quality product beyond just the language-wrangling.

You can't rely on people being perfect all the time. We've been trying that for 50 years, and only got an endless circle of CVEs and calls to find better programmers next time.

The difference is how the language reacts to the mistakes that will happen. It could react with "oops, you've made a mistake! Here, fix this", and let the programmer apply a fix and move on, shipping code without the bug. Or the language could silently amplify smallest mistakes in the least interesting code into corruption that causes catastrophic security failures.

When concatenating strings and adding numbers securely is a thing that exists, and a thing that requires top-skilled programmers, you're just wasting people's talent on dumb things.

pornel · on Sept 25, 2024

About invariants generally: Rust wants to know when memory behind each pointer is immutable, or mutable by Rust only, or could be mutated by something else while Rust has a pointer to it. Rust also encodes which types can't be moved to another thread, and which pointers own their memory and need to be freed by Rust.

These are part of the type system, so they need to be defined precisely. The answer to these questions can't be just "it depends". If there are run-time or config-dependent conditions when the same data is owned or not or immutable or not, Rust has enums, unions and unsafe escape hatches to guard access to it.

pornel · on Sept 25, 2024

It's possible to get slightly better compression by losslessly rearranging data in a JPEG (DC components are compressed row by row, and prefer fewer horizontal color changes).

However, here the author seems to accidentally fully recompress the images, and falls into the classic trap of "looks almost the same but the file is much smaller!"

That's what every lossy format does every time. They're designed to lose data that is hardest to see with the naked eye.

At the top end of the quality range, file sizes grow exponentially in proportion to quality (it's natural consequence of allowing less and less data to be lost, and approaching lossless compression that has a hard limit on how effective it can be).

But conversely this means that going from very high quality to still-pretty-high quality moves the file size along the exponential curve and seems to give dramatic reduction in file size. This isn't any trick, that's how it works.

File size change is easy to measure, but visual quality change is difficult to quantify, so people disregard the visual change. In reality they're just moving a point along a curve, and recompression gets a worse curve (less quality for the same file size) than the quality-to-file-size ratio of compressing the file at the lower quality from the start.

vanderZwan · on Sept 25, 2024

> However, here the author seems to accidentally fully recompress the images, and falls into the classic trap of "looks almost the same but the file is much smaller!"

Except they didn't quite do that: yes, they recompressed the image instead of using the lossles rotation that JPEG is capable of. However, they then compared a recompressed rotated image to a recompressed image that wasn't rotated, and noted there was still a significant size differnce.

He also claims to have verified in GIMP that the two recompressed files were visually identical after rotating (I'm a little suspicious of that bit, since you wouldn't notice a tiny difference unless you use the "difference" layer mode and theo manually amplify the miniscule differences with something like the curves tool)

sotix · on Sept 25, 2024

Yes this is the crux and the fun of my discovery! I was surprised how using sips to rotate the image resulted in a lower size than using sips or ImageMagick to directly compress the image.

I’d encourage you to download the image from the link in the article and try yourself if you have a Mac and then compare them with GIMP because it’s very possible I didn’t do a perfect job with that.

lifthrasiir · on Sept 25, 2024

See my other comment for the result from ImageMagick, which shows little difference regardless of the orientation. For sips, there is a possibility that chroma subsampling impacted the result (because there are two different scaling factors for each axis) and you are technically comparing different images.

pornel · on Sept 21, 2024

And every CVSS score is 9.8, because it's designed to never underestimate potential risk, no matter how absurdly unlikely, rather than be realistic about the actual risk.

clwg · on Sept 21, 2024

CVSS is not not really meant to measure risk, it primarily measures the severity of technical vulnerabilities. It should be used in conjunction with other factors such as system exposure and threat sources to determine the probability of exploitation. This should then be combined with impact and costing data to fully assess the risk.

Regulatory requirements also need to be contextualized similarly. If they become burdensome, efforts should focus on reducing the exposure of your systems to those risks.

That said, patch and configuration management should be second nature and performed continuously so that when a real issue arises, you're prepared and not worried about your environment falling over because you're unsure how it will respond to an update, or whether your backups will restore properly - which are risks as well.

I saw more than a few organizations struggle with log4j because they only patched server systems when a vulnerability was publicly exposed, and a Metasploit exploit was available.