Many freighters have space for the 12 passengers they're allowed to carry before they get reclassified as a passenger ship, with all the post-Titanic SOLAS requirements that entails.
The first 12, however, are easy: they self-select to be low-maintenance. They just get their breakfast/lunch/dinner portion from the ship's cook and have their cabin. A little electricity, reasonable running water, and they're content.
Essentially a hotel room plus partaking in the staff dinner/in-house catering.
Most people just can't handle the boredom these days, and many don't have that many days to spare, either.
My brother did this (Southampton -> Baltimore). It's a lot slower than the Queen Mary. He told me when he arrived, "You don't know what time is until you've already watched all the movies you brought with you, read all the books, done all the work, and you realize you've still got 5 days left".
And he's probably the most able-to-handle-boredom human being that I've ever known.
The passenger capacity is mostly expected to be used by people related to the shipping industry, basically repositioning people the same way airlines do.
Like you'll have two guys being repositioned and another guy will be there with a clipboard and tasked with something.
Due to long-term conditioning from reading tech article headlines and discussions, my brain now autocompletes the word "cargo" to "cargo cult" every time it sees it in a tech context.
So I read "Largest cargo cult sailboat completes first Atlantic crossing" and was immediately intrigued...
Grapheme count is not a useful number. Even in a monospaced font, you’ll find that the grapheme count doesn’t give you a measurement of width since emoji will usually not be the same width as other characters.
Most of the UI toolkits defer to font-based layout calculation engines rather than grapheme counts. Grapheme counts are a handy approximation in many cases, but you aren't going to get truly accurate text selection or cursor positions without real geometry from a font and whatever layout engine is closest to it.
(Fonts may disagree on supported ligatures, for instance, or not support an emoji grapheme cluster and fall back to displaying multiple component emoji, or lay out multiple graphemes in two dimensions for a given language [CJK, Arabic, even math symbols, even before factoring in whether a font layout engine supports the optional but cool UnicodeMath [0]], or any number of other tweaks and distinctions between encoding a block of text and displaying a block of text.)
Ligatures are a whole other thing, independent of graphemes. Grapheme clustering is defined entirely independently of the font, and a sequence like U+1F1E6 U+1F1E6 is considered a single grapheme according to the Unicode specification even though no font will display this as a single character (it will instead be rendered as something resembling two A’s in boxes in most fonts).
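For concreteness, here's a minimal sketch of that point, assuming the unicode-segmentation crate is available (any UAX #29 implementation should give the same counts):

    // Cargo.toml (assumed): unicode-segmentation = "1"
    use unicode_segmentation::UnicodeSegmentation;

    fn main() {
        // Two REGIONAL INDICATOR SYMBOL LETTER A code points in a row.
        let s = "\u{1F1E6}\u{1F1E6}";

        assert_eq!(s.chars().count(), 2);          // two code points
        assert_eq!(s.graphemes(true).count(), 1);  // one extended grapheme cluster

        // UAX #29 (rules GB12/GB13) keeps any pair of consecutive regional
        // indicators together, whether or not a font renders them as a flag.
    }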
I included ligatures as a logically separate concept that needs to be accounted for when counting the "displayed width" of a string and how you handle cursor position and selection logic.
However, ligatures are a part of grapheme clustering, and the two aren't entirely independent: examples above and elsewhere include ligatures that have Unicode encodings (e.g., fi). Ligatures have been a part of character encodings and have affected grapheme clustering since EBCDIC (from which Unicode directly inherits a lot of its single-codepoint Latin ligatures). There are way too many debates about whether normal forms should encode ligatures, always decompose them, or follow some other strategy. The normal forms (NFC and NFD, NFKC and NFKD) themselves are embedded as a step in several of the standard grapheme clustering algorithms.
Some people think ligatures should be left entirely to fonts and that Unicode ligatures are a relic of the IBM bitmap-font past. Some folks think it would be nice if some ligatures were more directly encoded and normal forms could do a first pass for them. Unicode has had portions of the specification on both sides of that argument. It generally leans away from the backward-compatibility ligatures and normalizes them to decomposed forms, especially in locales like "en-us" these days, but not always and not in every locale. (All of that is before you start to consider languages that are almost nothing but long sequences of ligatures, including but not limited to Arabic.)
You can't do grapheme clustering and entirely ignore ligatures. You certainly can't count "display width" or "display position" without paying attention to ligatures, which was the point of bringing them up alongside mentions of grapheme clustering length and why it is insufficient for "display position" and "display width".
Umm, you’re confusing things here. When you said “several of the standard grapheme clustering algorithms,” that indicated the first element of confusion. There is one grapheme clustering algorithm in Unicode.¹ Ligature combinations, whether encoded in Unicode or not, are independent of the concept of graphemes and are very much font-dependent, as is whether a grapheme cluster is displayed as multiple characters or a single character (any two consecutive regional indicators are considered a single grapheme whether or not they map to a national flag, and the set of supported flags, and whether the output is a flag or just the two regional indicator symbols, is up to the font). Normalization does not play a role in grapheme clustering at all, which is why á and a + acute accent are both considered single graphemes.
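A quick sketch of that last point, assuming the unicode-segmentation and unicode-normalization crates: segmentation gives the same answer for both normal forms, and normalization is a separate, optional step.

    // Cargo.toml (assumed): unicode-segmentation = "1", unicode-normalization = "0.1"
    use unicode_normalization::UnicodeNormalization;
    use unicode_segmentation::UnicodeSegmentation;

    fn main() {
        let precomposed = "\u{00E1}";   // á as a single code point (NFC form)
        let decomposed = "a\u{0301}";   // a + COMBINING ACUTE ACCENT (NFD form)

        // Different code point counts...
        assert_eq!(precomposed.chars().count(), 1);
        assert_eq!(decomposed.chars().count(), 2);

        // ...but each is exactly one extended grapheme cluster,
        // with no normalization involved in the segmentation.
        assert_eq!(precomposed.graphemes(true).count(), 1);
        assert_eq!(decomposed.graphemes(true).count(), 1);

        // Normalization is a separate operation you apply only if you want it.
        assert_eq!(decomposed.nfc().collect::<String>(), precomposed);
    }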
I’m puzzled about the assertion of EBCDIC having Latin ligatures because I never encountered them in my IBM mainframe days and a casual search didn’t turn any up. The only 8-bit encoding that I was aware of that included any ligatures was Mac extended ASCII which included fi and fl (well, putting aside TeX’s 7-bit encoding which was only used by TeX and because of its using x20 to hold the acute accent was unusable by other applications which expected a blank character in that spot).
The question about dealing with ligatures for non-Latin scripts generally came down to compatibility with existing systems more than anything else. This is why, for example, the Devanagari and Thai encodings, which both have vowel markings that can occur before, after, or around a consonant, handle the sequence of input differently. Assembly of jamo into characters in Hangul is another case where, theoretically, one could have all the displayed characters handled through ligatures and not encode the syllables directly (which is, in fact, how most modern Korean is encoded, as evidenced by Korean Wikipedia), but because the original Korean font encoding has all the syllables encoded in it,² those syllables are part of Unicode as well.
But the bottom line here is that you seem to be confusing terminology a lot. You can very much do grapheme clustering without paying attention to ligatures, and rules for things like normalized composition/decomposition are entirely independent of grapheme clustering (the K-forms of both manage transitions like ¹ to 1 or ﬁ to fi and represent a one-way transition).
⸻
1. I wrote a Rust library implementing this and I’ve been following Unicode since it was a proposal from Microsoft and Apple in the 90s, so I know a little about this.
2. I think the more accurate term would be “most” of the syllables as I’ve seen additional syllables added to Unicode over time.
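To illustrate the one-way K-form mappings mentioned above, here's a small sketch, assuming the unicode-normalization crate:

    // Cargo.toml (assumed): unicode-normalization = "0.1"
    use unicode_normalization::UnicodeNormalization;

    fn main() {
        let superscript_one = "\u{00B9}";   // ¹ SUPERSCRIPT ONE
        let fi_ligature = "\u{FB01}";       // ﬁ LATIN SMALL LIGATURE FI

        // Compatibility (K) normalization folds these into plain equivalents;
        // the mapping is one-way, so the original form is not recoverable.
        assert_eq!(superscript_one.nfkc().collect::<String>(), "1");
        assert_eq!(fi_ligature.nfkc().collect::<String>(), "fi");

        // Canonical NFC leaves both characters alone.
        assert_eq!(superscript_one.nfc().collect::<String>(), superscript_one);
        assert_eq!(fi_ligature.nfc().collect::<String>(), fi_ligature);
    }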
My point has been that grapheme clustering doesn't give a useful length on its own, especially not for display length. The technical specifics/technical corrections are getting into weeds that are orthogonal to the point of the discussion in the first place.
> I’m puzzled about the assertion of EBCDIC having Latin ligatures because I never encountered them in my IBM mainframe days and a casual search didn’t turn any up. The only 8-bit encoding that I was aware of that included any ligatures was Mac extended ASCII which included fi and fl (well, putting aside TeX’s 7-bit encoding which was only used by TeX and because of its using x20 to hold the acute accent was unusable by other applications which expected a blank character in that spot).
EBCDIC had multiple "code pages" to handle various international encodings. (DOS and Windows inherited the "code page" concept from IBM mainframes.) Of the code pages EBCDIC supported in the mainframe era, many were "Publishing" code pages intended for "pretty printing" text to printers that presumably and primarily supported bitmap fonts. A low-ID example of such is IBM Code Page (CCSID) 361: https://web.archive.org/web/20130121103957/http://www-03.ibm...
You can see that code page includes fi, fl, ff, ffi, ij, among others even less common in English text.
Most but not all of the IBM Code Pages were a part of the earliest Unicode standards encoding efforts.
Indeed. Several languages have debated dropping, or have already dropped, an easy-to-access "Length" count on strings, making it much more explicit whether you want "UTF-8 Encoded Length" or "Codepoint count" or "Grapheme count" or "Grapheme cluster count" or "Laid out font width".
Why endorse a bad winner when you can make the trade-offs more obvious and give programmers a better chance of asking for the right information, instead of using the wrong information because it is the default and assuming it is correct?
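As a rough sketch of how far apart those counts can be, assuming the unicode-segmentation crate (a Unicode-current implementation of the emoji ZWJ rules is assumed for the grapheme count):

    // Cargo.toml (assumed): unicode-segmentation = "1"
    use unicode_segmentation::UnicodeSegmentation;

    fn main() {
        // A family emoji built from three emoji joined by ZERO WIDTH JOINERs.
        let s = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";

        println!("UTF-8 bytes:       {}", s.len());                   // 18
        println!("code points:       {}", s.chars().count());         // 5
        println!("grapheme clusters: {}", s.graphemes(true).count()); // 1

        // "Laid out font width" is a different question again: it needs a font
        // plus a shaping/layout engine and can't be derived from the string alone.
    }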
There’s a fun scene in Last Action Hero which plays on this convention: a kid who’s gotten transported into a movie tries to convince Arnold Schwarzenegger’s character that they’re in a movie by asking a bunch of people what their phone number is and pointing out that all the numbers begin with 555.
I think that’s at least in part because some 555 prefix numbers have been assigned for non-directory information uses (I have a vague notion of seeing this for some toll-free numbers).
In the 90s I did a tech writing gig documenting some custom software a company had had built for them by one of the big consultancies. It was a bit of a nightmare, as the functionality was arranged in a way that reflected the underlying architecture of the program rather than the users’ workflows. Although I suppose if they’d written the software well, I wouldn’t have had as many billable hours writing documentation.
> reflected the underlying architecture of the program rather than the users’ workflows
Is this an inherently bad thing if the software architecture is closely aligned with the problem it solves?
Maybe it's the architecture that was bad. Of course there are implementation details the user shouldn't care about and it's only sane to hide those. I'm curious how/why a user workflow would not be obviously composed of architectural features to even a casual user. Is it that the user interface was too granular or something else?
I find that just naming things according to the behavior a layperson would expect can make all the difference. I say all this because it's equally confusing when the developer hides way too much. Those developers seem to lack experience outside their own domain and overcomplicate what could have just been named better.
Developers often don’t think it’s a bad thing because that’s how we think about software. Regular users think about applications as tools to solve a problem. Being confronted by implementation details is no problem for people with the base knowledge to understand why things are like that, but without that knowledge, it’s just a confusing mess.
If you ever spend time with the low-level SAP GUIs, then yes, you will find out why that's definitely a bad thing. Software should reflect users' processes. The code below is just an implementation detail and should never impact the design of the interfaces.
I remember at the time there was also going to be the wonderful new kernel that would allow OS/2 and MacOS to coexist on the same machine. As someone who had a Mac and an OS/2 machine side by side on his desk, this seemed like it could be a wonderful thing, but alas, it never came to be.
I was just a kid during the 1990s when all of this was happening, but a few years ago I remember reading about an IBM project named GUTS where one kernel would run multiple OS "personalities":
Microsoft technically delivered something very close to OS/2’s “Personalities” in Windows NT 4. They called them "environment subsystems". Each subsystem could run applications written for a different operating system; the three available ones were Win32, OS/2, and POSIX. Then there was the "integral subsystem", which handled operating-system-specific functions on behalf of the environment subsystems.
But every subsystem other than Win32 was kneecapped mostly due to politics and market positioning.
In the late 90s, Microsoft bought a company that had developed a more capable Unix subsystem, rebranded it as Interix, and marketed it as Windows Services for UNIX (SFU).
I believe the original WSL was a resurrection of SFU before WSL2 pivoted to a VM-based approach.
No, the original WSL was a weird new thing where an NT kernel-level driver actually serviced Linux system calls.
IIRC, Interix still used the same approach as the original POSIX subsystem (and the Windows and OS/2 subsystems) of providing the interface as a DLL that your application would ultimately be linked against.
> I believe 2026 will finally be the year of Linux desktop.
I’ve been hearing this, with YYYY+1 substituted in every year, for the last quarter century.
The year of Linux desktop will never come. Why?
- Money. Hardware manufacturers make more money selling computers that are optimized for Windows, and there is nothing on the horizon that will change that, meaning the Linux desktop experience is always the worst of the three main options for hardware compatibility.
- Microsoft. Call me when Office runs natively in Linux. You might be happy with LibreOffice or Google Docs, but MS Office still dominates the space (and as someone who does a lot of writing and has a number of options, I find Word to be better than any of the alternatives, something that 30 years ago I would have scoffed at).
- Fidgetiness. All the tweaking and customizing that Linux fans like is an annoyance for most people. Every customization I have on my computer is one more thing I need to keep track of if I get a new computer and frankly it’s more than a little bit of a pain.
Back when people still used dial-up, I once observed a sysadmin in our IT department using a custom, proprietary Windows application a vendor had developed for ordering purposes. The whole thing was proprietary: client, protocol, and server. It was awesome to behold.
That was pretty common back in those days. With the appification of a lot of websites, it’s coming back (albeit using REST instead of a proprietary protocol).
I have a Windows VM that I use perhaps every few weeks for SketchUp (because for the life of me, I cannot get Wine to run it correctly -- it'll run but not SAVE...).
Every time I run the VM, it has Windows updates to install. I guess it's a bit nicer swiping away from the VM and doing something else while it updates, but it's a real solid reminder of why I "moved away".
I've noticed there is a point in dual booting where you do enough in Linux that you can't get back to Windows without it updating. This pushes you to stay longer and longer in Linux, to avoid the dreaded update.
I've already seen a few people accidentally pestering themselves out of Windows this way.