
To claim that these letters "cannot be represented" is just outright bizarre. You literally did so yourself. Expecting Unicode to contain a codepoint for every single rendering variation is not realistic; the line must be drawn somewhere, with other rendering information provided in another way (e.g. lang=de, font-style, whatnot).

You can disagree with how Unicode does this (or how other encodings do it, for that matter), but this is just an utterly disingenuous thing to say. I no longer believe you are engaging in good faith. You have either not understood Unicode or you are intentionally misrepresenting it. Goodbye.



"To claim that these letters "cannot be represented" is just outright bizarre. You literally you did so yourself."

I did not. In every book printed before 1950, and in every quality book printed now, the different characters actually look different. This is not about rendering variations but about different characters (linguistically and functionally, e.g. with respect to collation) that coincidentally look similar and that Unicode conflates.
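
To make the collation point concrete, here is a minimal JavaScript sketch using Intl.Collator (the names are made up, and it assumes the runtime's ICU data includes the German phonebook tailoring):

    // DIN 5007-2 ("phonebook") collation expands ü to "ue";
    // DIN 5007-1 ("dictionary"/standard) collation treats ü as plain "u".
    // Whether ü counts as its own letter is visible in the sort order.
    const names = ['Mueller', 'Muffel', 'Müller'];
    const phonebook = new Intl.Collator('de-u-co-phonebk').compare;
    const dictionary = new Intl.Collator('de').compare;
    console.log([...names].sort(phonebook));  // "Müller" lands next to "Mueller"
    console.log([...names].sort(dictionary)); // "Müller" lands after "Muffel"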

Here is a source from DIN (Deutsches Institut für Normung) with more background:

https://www.unicode.org/L2/L2003/03215-n2593-umlaut-trema.pd...

If you think it's just crazy Germans arguing a moot point, note that Yannis Haralambous has a paragraph specifically about the umlaut/trema issue in his O'Reilly book "Fonts & Encodings".


Haven't read the book yet, but isn't that more a matter of the font/rendering engine? I have a vague notion that for Cyrillic, for example, there are some nuances in how certain glyphs are rendered in cursive between languages [1], but these nuances are usually resolved by the font and the client cooperating on language hints, not in the "physical" text.

(Not saying I see this as a good thing or a bad thing: it is well beyond my expertise. I can definitely see the motivation for introducing as many variants into the Unicode register as there are in the real world.)

Isn't the umlaut vs. trema/diaeresis distinction in a similar situation?

[1] This made me test it and cobble together a demo. (Sadly, I don't speak any of these languages, so I cannot verify it is correct; I just wanted to see the difference in practice.)

    data:text/html;charset=utf-8,<style>
    /* Hover the list to switch from italic to upright forms. */
    @import url("https://fonts.googleapis.com/css2?family=Noto+Sans:ital@0;1");
    body { font-family: 'Noto Sans'; }
    dl:hover i { font-style: normal; }
    </style>
    <dl>
    <dt>lang="ru"
    <dd lang="ru"><i>грипп, практика, график, типа</i>
    <dt>lang="sr"
    <dd lang="sr"><i>грипп, практика, график, типа</i>
    </dl>
Arguably, depending on such a wide ecosystem (physical text ↔ specific font ↔ rendering agent) feels quite fragile, but I cannot tell whether there is any better alternative for this particular case.

https://myfonj.github.io/sandbox.html#%3C!doctype%20html%3E%...


>Expecting Unicode to contain a codepoint for every single rendering variation

It's not just a rendering variation. While the two marks are etymologically related, they are made with different strokes, and it is incorrect to substitute one for the other.

Technically, Unicode has variation selectors that can be used to choose between variants of a character, but they have not seen sufficient adoption. So in practice everything has to be annotated with the language it is written in so it can be rendered correctly; otherwise the system has to fall back on its locale settings to guess which language's rendering the user likely wants.
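
For what it's worth, the one place variation selectors have seen wide adoption is emoji presentation; as far as I know there is no standardized variation sequence for umlaut vs. trema, so both are just U+0308 on the wire. A tiny JavaScript sketch (how it renders depends on your font stack):

    // U+FE0E requests text presentation, U+FE0F requests emoji presentation.
    const heart = '\u2764';        // HEAVY BLACK HEART
    console.log(heart + '\uFE0E'); // ❤︎ text style, where supported
    console.log(heart + '\uFE0F'); // ❤️ emoji style
    // Nothing comparable is standardized for the umlaut/trema pair,
    // which is why language tagging remains the practical mechanism.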



