Many freighters have space for the 12 passengers they're allowed to carry before they get reclassified as a passenger ship, with all the post-Titanic SOLAS requirements that entails.
The first 12, however, are easy: they self-select to be low-maintenance. They just get their breakfast/lunch/dinner portion from the ship's cook and have their cabin. A little electricity, reasonable running water, and they're content.
Essentially a hotel room plus partaking in the staff dinner/in-house catering.
Most people just can't handle the boredom these days, and many don't have that many days to spare, either.
My brother did this (Southampton -> Baltimore). It's a lot slower than the Queen Mary. He told me when he arrived, "You don't know what time is until you've already watched all the movies you brought with you, read all the books, done all the work, and you realize you've still got 5 days left".
And he's probably the most able-to-handle-boredom human being that I've ever known.
The passenger capacity is mostly expected to be used by people related to the shipping industry, basically repositioning people the same way airlines do.
Like you'll have two guys being repositioned and another guy will be there with a clipboard and tasked with something.
Due to long-term conditioning from reading tech article headlines and discussions, my brain now autocompletes the word "cargo" to "cargo cult" every time it sees it in a tech context.
So I read "Largest cargo cult sailboat completes first Atlantic crossing" and was immediately intrigued...
Grapheme count is not a useful number. Even in a monospaced font, you’ll find that the grapheme count doesn’t give you a measurement of width since emoji will usually not be the same width as other characters.
Most of the UI toolkits defer to font-based layout calculation engines rather than grapheme counts. Grapheme counts are a handy approximation in many cases, but you aren't going to get truly accurate text selection or cursor positions without real geometry from a font and whatever layout engine is closest to it.
(Fonts may disagree on supported ligatures, for instance, or not support an emoji grapheme cluster and fall back to displaying multiple component emoji, or lay out multiple graphemes in two dimensions for a given language [CJK, Arabic, even math symbols, even before factoring in whether a font layout engine supports the optional but cool UnicodeMath [0]], or any number of other tweaks and distinctions between encoding a block of text and displaying a block of text.)
Ligatures are a whole other thing, independent of graphemes. Grapheme clustering is defined entirely independently of the font, and a sequence like U+1F1E6 U+1F1E6 is considered a single grapheme according to the Unicode specification even though no font will display this as a single character (it will instead be rendered as something resembling two A’s in boxes in most fonts).
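For concreteness, here's a minimal sketch of that point, assuming the unicode-segmentation crate is available (any UAX #29 implementation should give the same counts):

    // Cargo.toml (assumed): unicode-segmentation = "1"
    use unicode_segmentation::UnicodeSegmentation;

    fn main() {
        // Two REGIONAL INDICATOR SYMBOL LETTER A code points in a row.
        let s = "\u{1F1E6}\u{1F1E6}";

        assert_eq!(s.chars().count(), 2);          // two code points
        assert_eq!(s.graphemes(true).count(), 1);  // one extended grapheme cluster

        // UAX #29 (rules GB12/GB13) keeps any pair of consecutive regional
        // indicators together, whether or not a font renders them as a flag.
    }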
I included ligatures as a logically separate concept that needs to be accounted for when counting the "displayed width" of a string and how you handle cursor position and selection logic.
However, ligatures are a part of grapheme clustering, and the two aren't entirely independent: examples above and elsewhere include ligatures that have Unicode encodings (e.g., fi). Ligatures have been a part of character encodings and have affected grapheme clustering since EBCDIC (from which Unicode directly inherits a lot of its single-codepoint Latin ligatures). There are way too many debates about whether normal forms should encode ligatures, always decompose them, or follow some other strategy. The normal forms (NFC and NFD, NFKC and NFKD) themselves are embedded as a step in several of the standard grapheme clustering algorithms.
Some people think ligatures should be left entirely to fonts and that Unicode ligatures are a relic of the IBM bitmap-font past. Some folks think it would be nice if some ligatures were more directly encoded and normal forms could do a first pass for them. Unicode has had portions of the specification on both sides of that argument. It generally leans away from the backward-compatibility ligatures and normalizes them to decomposed forms, especially in locales like "en-us" these days, but not always and not in every locale. (All of that is before you start to consider languages that are almost nothing but long sequences of ligatures, including but not limited to Arabic.)
You can't do grapheme clustering and entirely ignore ligatures. You certainly can't count "display width" or "display position" without paying attention to ligatures, which was the point of bringing them up alongside mentions of grapheme clustering length and why it is insufficient for "display position" and "display width".
Umm, you’re confusing things here. When you said “several of the standard grapheme clustering algorithms,” that indicated the first element of confusion. There is one grapheme clustering algorithm in Unicode.¹ Ligature combinations, whether encoded in Unicode or not, are independent of the concept of graphemes and are very much font-dependent, as is whether a grapheme cluster is displayed as multiple characters or a single character (any two consecutive regional indicators are considered a single grapheme whether or not they map to a national flag, and the set of supported flags, and whether the output is a flag or just the two regional indicator symbols, is up to the font). Normalization does not play a role in grapheme clustering at all, which is why á and a + acute accent are both considered single graphemes.
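A quick sketch of that last point, assuming the unicode-segmentation and unicode-normalization crates: segmentation gives the same answer for both normal forms, and normalization is a separate, optional step.

    // Cargo.toml (assumed): unicode-segmentation = "1", unicode-normalization = "0.1"
    use unicode_normalization::UnicodeNormalization;
    use unicode_segmentation::UnicodeSegmentation;

    fn main() {
        let precomposed = "\u{00E1}";   // á as a single code point (NFC form)
        let decomposed = "a\u{0301}";   // a + COMBINING ACUTE ACCENT (NFD form)

        // Different code point counts...
        assert_eq!(precomposed.chars().count(), 1);
        assert_eq!(decomposed.chars().count(), 2);

        // ...but each is exactly one extended grapheme cluster,
        // with no normalization involved in the segmentation.
        assert_eq!(precomposed.graphemes(true).count(), 1);
        assert_eq!(decomposed.graphemes(true).count(), 1);

        // Normalization is a separate operation you apply only if you want it.
        assert_eq!(decomposed.nfc().collect::<String>(), precomposed);
    }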
I’m puzzled about the assertion of EBCDIC having Latin ligatures because I never encountered them in my IBM mainframe days and a casual search didn’t turn any up. The only 8-bit encoding that I was aware of that included any ligatures was Mac extended ASCII which included fi and fl (well, putting aside TeX’s 7-bit encoding which was only used by TeX and because of its using x20 to hold the acute accent was unusable by other applications which expected a blank character in that spot).
The question about dealing with ligatures for non-Latin scripts generally came down to compatibility with existing systems more than anything else. This is why, for example, the Devanagari and Thai encodings, which both have vowel markings that can occur before, after, or around a consonant, handle the sequence of input differently. Assembly of jamo into characters in Hangul is another case where, theoretically, one could have all the displayed characters handled through ligatures and not encode the syllables directly (which is, in fact, how most modern Korean is encoded, as evidenced by Korean Wikipedia), but because the original Korean font encoding has all the syllables encoded in it,² those syllables are part of Unicode as well.
But the bottom line here is that you seem to be confusing terminology a lot. You can very much do grapheme clustering without paying attention to ligatures, and rules for things like normalized composition/decomposition are entirely independent of grapheme clustering (the K-forms of both manage transitions like ¹ to 1 or ﬁ to fi and represent a one-way transition).
⸻
1. I wrote a Rust library implementing this and I’ve been following Unicode since it was a proposal from Microsoft and Apple in the 90s, so I know a little about this.
2. I think the more accurate term would be “most” of the syllables as I’ve seen additional syllables added to Unicode over time.
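To illustrate the one-way K-form mappings mentioned above, here's a small sketch, assuming the unicode-normalization crate:

    // Cargo.toml (assumed): unicode-normalization = "0.1"
    use unicode_normalization::UnicodeNormalization;

    fn main() {
        let superscript_one = "\u{00B9}";   // ¹ SUPERSCRIPT ONE
        let fi_ligature = "\u{FB01}";       // ﬁ LATIN SMALL LIGATURE FI

        // Compatibility (K) normalization folds these into plain equivalents;
        // the mapping is one-way, so the original form is not recoverable.
        assert_eq!(superscript_one.nfkc().collect::<String>(), "1");
        assert_eq!(fi_ligature.nfkc().collect::<String>(), "fi");

        // Canonical NFC leaves both characters alone.
        assert_eq!(superscript_one.nfc().collect::<String>(), superscript_one);
        assert_eq!(fi_ligature.nfc().collect::<String>(), fi_ligature);
    }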
My point has been that grapheme clustering doesn't give a useful length on its own, especially not for display length. The technical specifics/technical corrections are getting into weeds that are orthogonal to the point of the discussion in the first place.
> I’m puzzled about the assertion of EBCDIC having Latin ligatures because I never encountered them in my IBM mainframe days and a casual search didn’t turn any up. The only 8-bit encoding that I was aware of that included any ligatures was Mac extended ASCII which included fi and fl (well, putting aside TeX’s 7-bit encoding which was only used by TeX and because of its using x20 to hold the acute accent was unusable by other applications which expected a blank character in that spot).
EBCDIC had multiple "code pages" to handle various international encodings. (DOS and Windows inherited the "code page" concept from IBM mainframes.) Of the code pages EBCDIC supported in the mainframe era, many were "Publishing" code pages intended for "pretty printing" text to printers that presumably and primarily supported bitmap fonts. A low-ID example of such is IBM Code Page (CCSID) 361: https://web.archive.org/web/20130121103957/http://www-03.ibm...
You can see that code page includes fi, fl, ff, ffi, ij, among others even less common in English text.
Most but not all of the IBM Code Pages were a part of the earliest Unicode standards encoding efforts.
Indeed. Several languages have debated dropping, or have already dropped, an easy-to-access "Length" count on strings, making it much more explicit whether you want "UTF-8 Encoded Length" or "Codepoint count" or "Grapheme count" or "Grapheme cluster count" or "Laid out font width".
Why endorse a bad winner when you can make the trade-offs more obvious and give programmers a better chance of asking for the right information, instead of using the wrong information because it is the default and assuming it is correct?
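As a rough sketch of how far apart those counts can be, assuming the unicode-segmentation crate (a Unicode-current implementation of the emoji ZWJ rules is assumed for the grapheme count):

    // Cargo.toml (assumed): unicode-segmentation = "1"
    use unicode_segmentation::UnicodeSegmentation;

    fn main() {
        // A family emoji built from three emoji joined by ZERO WIDTH JOINERs.
        let s = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";

        println!("UTF-8 bytes:       {}", s.len());                   // 18
        println!("code points:       {}", s.chars().count());         // 5
        println!("grapheme clusters: {}", s.graphemes(true).count()); // 1

        // "Laid out font width" is a different question again: it needs a font
        // plus a shaping/layout engine and can't be derived from the string alone.
    }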
There’s a fun scene in Last Action Hero which plays on this convention: a kid who’s gotten transported into a movie tries to convince Arnold Schwarzenegger’s character that they’re in a movie by asking a bunch of people what their phone number is and pointing out that all the numbers begin with 555.
I think that’s at least in part because some 555 prefix numbers have been assigned for non-directory information uses (I have a vague notion of seeing this for some toll-free numbers).
In the 90s I did a tech writing gig documenting some custom software a company had had built for them by one of the big consultancies. It was a bit of a nightmare, as the functionality was arranged in a way that reflected the underlying architecture of the program rather than the users’ workflows. Although I suppose if they’d written the software well, I wouldn’t have had as many billable hours writing documentation.
> reflected the underlying architecture of the program rather than the users’ workflows
Is this an inherently bad thing if the software architecture is closely aligned with the problem it solves?
Maybe it's the architecture that was bad. Of course there are implementation details the user shouldn't care about and it's only sane to hide those. I'm curious how/why a user workflow would not be obviously composed of architectural features to even a casual user. Is it that the user interface was too granular or something else?
I find that just naming things according to the behavior a layperson would expect can make all the difference. I say all this because it's equally confusing when the developer hides way too much. Those developers seem to lack experience outside their own domain and overcomplicate what could have just been named better.
Developers often don’t think it’s a bad thing because that’s how we think about software. Regular users think about applications as tools to solve a problem. Being confronted by implementation details is no problem for people with the base knowledge to understand why things are like that, but without that knowledge, it’s just a confusing mess.
If you ever spend time with the low-level SAP GUIs, then yes, you will find out why that's definitely a bad thing. Software should reflect users' processes. The code below is just an implementation detail and should never impact the design of the interfaces.
I remember at the time there was also going to be the wonderful new kernel that would allow OS/2 and MacOS to coexist on the same machine. As someone who had a Mac and an OS/2 machine side by side on his desk, this seemed like it could be a wonderful thing, but alas, it never came to be.
I was just a kid during the 1990s when all of this was happening, but a few years ago I remember reading about an IBM project named GUTS where one kernel would run multiple OS "personalities":
Microsoft technically delivered something very close to OS/2’s “Personalities” in Windows NT 4. They called them "environment subsystems". Each subsystem could run applications written for a different operating system; the three available ones were Win32, OS/2, and POSIX. Then there was the "integral subsystem", which handled operating-system-specific functions on behalf of the environment subsystems.
But every subsystem other than Win32 was kneecapped mostly due to politics and market positioning.
In the late 90s, Microsoft bought a company that had developed a more capable Unix subsystem, rebranded it as Interix, and marketed it as Windows Services for UNIX (SFU).
I believe the original WSL was a resurrection of SFU before WSL2 pivoted to a VM-based approach.
No, the original WSL was a weird new thing where an NT kernel-level driver actually serviced Linux system calls.
IIRC, Interix still used the same approach as the original POSIX subsystem (and the Windows and OS/2 subsystems) of providing the interface as a DLL that your application would ultimately be linked against.
> I believe 2026 will finally be the year of Linux desktop.
I’ve been hearing this, with YYYY+1 substituted in every year, for the last quarter century.
The year of Linux desktop will never come. Why?
- Money. Hardware manufacturers make more money selling computers that are optimized for Windows, and there is nothing on the horizon that will change that, meaning the Linux desktop experience is always the worst of the three main options for hardware compatibility.
- Microsoft. Call me when Office runs natively in Linux. You might be happy with LibreOffice or Google Docs, but MS Office still dominates the space (and as someone who does a lot of writing and has a number of options, I find Word to be better than any of the alternatives, something that 30 years ago I would have scoffed at).
- Fidgetiness. All the tweaking and customizing that Linux fans like is an annoyance for most people. Every customization I have on my computer is one more thing I need to keep track of if I get a new computer and frankly it’s more than a little bit of a pain.
Back when people still used dial-up, I once observed a sysadmin in our IT department using a custom, proprietary Windows application a vendor had developed for ordering purposes. The whole thing was proprietary: client, protocol, and server. It was awesome to behold.
That was pretty common back in those days. With the appification of a lot of websites, it’s coming back (albeit using REST instead of a proprietary protocol).
I have a Windows VM that I use perhaps every few weeks for SketchUp (because for the life of me, I cannot get Wine to run it correctly -- it'll run but not SAVE...).
Every time I run the VM, it has Windows updates to install. I guess it's a bit nicer swiping away from the VM and doing something else while it updates, but it's a real solid reminder of why I "moved away".
I've noticed there is a point in dual booting where you do enough in Linux that you can't get back to Windows without it updating. This pushes you to stay longer and longer in Linux, to avoid the dreaded update.
I've already seen a few people accidentally pestering themselves out of Windows this way.