(I get what you're saying, spiritually: the pasta water from your giant pot for one box of pasta isn't gonna do much to thicken your sauce. But it's not a myth, just a matter of degree.)
Even when I worked for Medicare I couldn't get the damned post office to give us accurate ZIP code data! It's terrible geodata, but almost everybody remembers their ZIP code and most ZIP codes map to a single county, so it was the best UI we found for getting a general idea of where a person lived.
Yeah! There are like 12 three-county ZIP codes; some are really fun, like places where a boat delivers the mail and stops in multiple states along the lake. Some ZIPs don't refer to geographical areas at all, and others are military bases.
It was still the best UI option despite that: if you entered a ZIP that corresponded to multiple states/counties, we'd pop up a second box asking which one you lived in, but for 99% of people the ZIP alone was all we needed.
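A minimal sketch of that flow, assuming a hypothetical lookup table (the ZIP and county data below are illustrative only, not from the actual system):

```go
package main

import "fmt"

// county pairs a county name with its state.
type county struct {
	Name  string
	State string
}

// zipToCounties stands in for the real ZIP-to-county table: most ZIPs map to
// exactly one county, a handful map to more than one (illustrative data only).
var zipToCounties = map[string][]county{
	"97331": {{"Benton", "OR"}},
	"42223": {{"Christian", "KY"}, {"Montgomery", "TN"}},
}

// resolveZIP returns the single county for a ZIP, or the candidate list so the
// UI can show the "which one do you live in?" follow-up box.
func resolveZIP(zip string) (county, []county, bool) {
	candidates, ok := zipToCounties[zip]
	if !ok {
		return county{}, nil, false
	}
	if len(candidates) == 1 {
		return candidates[0], nil, true
	}
	return county{}, candidates, true
}

func main() {
	_, choices, _ := resolveZIP("42223")
	fmt.Println("ask the user to pick from:", choices)

	c, _, _ := resolveZIP("97331")
	fmt.Println("resolved directly to:", c)
}
```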
From my perspective it makes sense as a default for Go, which may be used to build either apps (things with bundle identifiers that go in /Applications) or CLI programs (which I wish would use ~/.config, but I understand that's just my preference).
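For context, Go's os.UserConfigDir resolves to ~/Library/Application Support on macOS and to $XDG_CONFIG_HOME (or ~/.config) on Linux. A rough sketch of how a CLI could honor the ~/.config preference instead (the tool name "mytool" is made up):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cliConfigDir sketches the "CLI tools should honor ~/.config" preference:
// use XDG_CONFIG_HOME if set, otherwise fall back to ~/.config, instead of
// os.UserConfigDir, which on macOS resolves to ~/Library/Application Support.
func cliConfigDir(app string) (string, error) {
	if xdg := os.Getenv("XDG_CONFIG_HOME"); xdg != "" {
		return filepath.Join(xdg, app), nil
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(home, ".config", app), nil
}

func main() {
	goDefault, _ := os.UserConfigDir() // Go's platform default
	cli, _ := cliConfigDir("mytool")   // "mytool" is a made-up CLI name
	fmt.Println("os.UserConfigDir:", goDefault)
	fmt.Println("XDG-style dir:  ", cli)
}
```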
I expect _applications_ to put their config there, as the author says; something that lives in /Applications or ~/Applications and has a bundle specifier.
I wish I expected CLI programs to put their config in ~/.config, but I actually expect them to just dump it into ~, annoyingly.
We have different expectations, I guess. I think of CLIs as applications, even though they aren't "Mac Apps". I don't see what is gained by applications putting their files in different places based on how I interact with them.
Kind of! This script is assuming that you're dealing with a byte slice, which means you've already encoded your Unicode data.
If you just encoded your string to bytes naïvely, it will probably-mostly still work, but it will get some combining characters wrong if they're represented differently in the two sources you're comparing. (e.g., é as a single precomposed code point vs. e followed by a combining accent character)
If you want to be more correct you'll normalize your Unicode string[1], but note that there are four different defined normalization forms, so you'll need to choose the one that's the best tradeoff for your particular application and data sources.
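A quick illustration of the mismatch, using golang.org/x/text/unicode/norm (the strings are arbitrary examples):

```go
package main

import (
	"fmt"

	"golang.org/x/text/unicode/norm"
)

func main() {
	precomposed := "caf\u00e9" // é as a single precomposed code point
	decomposed := "cafe\u0301" // e followed by U+0301 COMBINING ACUTE ACCENT

	// Same text to a human reader, different bytes on the wire.
	fmt.Printf("%x (%d bytes)\n", precomposed, len(precomposed)) // 636166c3a9 (5 bytes)
	fmt.Printf("%x (%d bytes)\n", decomposed, len(decomposed))   // 63616665cc81 (6 bytes)

	// Normalizing both to the same form (NFC here) makes them byte-identical.
	fmt.Println(norm.NFC.String(decomposed) == norm.NFC.String(precomposed)) // true
}
```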
IDK what "encoded your string to bytes naively" means personally. There is only one way to correctly UTF-8 encode a sequence of Unicode scalar values.
In any case, no, this works because UTF-8 is self-synchronizing. As long as both your needle and your haystack are valid UTF-8, the byte offsets returned by the search will always fall on a valid codepoint boundary.
In terms of getting "combining characters wrong," this is a reference to different Unicode normalization forms.
To be more precise... Consider a needle and a haystack, each a sequence of Unicode scalar values (typically stored as unsigned 32-bit integers). Now encode them to UTF-8 (a sequence of unsigned 8-bit integers) and run a byte-level search as shown by the OP here. That will behave as if you had executed the search on the sequences of Unicode scalar values.
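A small demonstration that a plain byte search on valid UTF-8 only matches at codepoint boundaries, using Go's standard library (the strings are arbitrary examples):

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

func main() {
	haystack := []byte("日本語のテキスト") // valid UTF-8 haystack
	needle := []byte("テキスト")           // valid UTF-8 needle

	// Because UTF-8 is self-synchronizing, a plain byte search can only match
	// at a codepoint boundary; no decoding step is needed.
	i := bytes.Index(haystack, needle)
	fmt.Println(i)                           // 12: byte offset of the match
	fmt.Println(utf8.RuneStart(haystack[i])) // true: the offset starts a rune
}
```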
So semantically, a "substring search" is a "sequence of Unicode scalar values search." At the semantic level, this may or may not be what you want. For example, if you always want `office` to find substrings like `oﬃce` (written with the 'ﬃ' ligature) in your haystack, then this byte-level search will not do what you want.
The standard approach for performing a substring search that accounts for normalization forms is to convert both the needle and the haystack to the same normalization form and then execute a byte-level search.
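A rough sketch of that approach, using golang.org/x/text/unicode/norm with NFC as the chosen form (note the offset refers to the normalized haystack, not the original bytes):

```go
package main

import (
	"bytes"
	"fmt"

	"golang.org/x/text/unicode/norm"
)

// searchNFC brings both needle and haystack into NFC and then runs an ordinary
// byte search. The returned offset points into the normalized haystack, which
// may differ from offsets in the original bytes.
func searchNFC(haystack, needle []byte) int {
	return bytes.Index(norm.NFC.Bytes(haystack), norm.NFC.Bytes(needle))
}

func main() {
	haystack := []byte("un cafe\u0301 serre\u0301") // decomposed accents
	needle := []byte("caf\u00e9")                   // precomposed é

	fmt.Println(bytes.Index(haystack, needle)) // -1: raw bytes don't match
	fmt.Println(searchNFC(haystack, needle))   // 3: matches after normalization
}
```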
(One small caveat is when the needle is an empty string. If you want to enforce correct UTF-8 boundaries, you'll need to handle that specially.)
I have a blog post[1] and accompanying repo[2] that shows how to use SEA to build a binary (and compares it to bun and deno) and strip it down to 67 MB (for me; it depends on the size of your local node binary).
It's not insane at all. Any binary that gets packed with the entire runtime will be in the MBs. But that's the point: the end user downloads a standalone binary and doesn't need to give a flying fuck about what kind of garbage has to be preinstalled for the damn thing to work. You think people care if a binary is 5 MB or 50 MB in 2025? It's more insane that you think it's insane than it is actually insane. Reminds me of all the Membros and Perfbros crying about Electron apps while these things go brrrrrrr with 100 MB+ binaries and 1 GB+ of eaten memory on untold millions of average computers.
The fact that it’s normalized to use obscene amounts of memory for tiny apps should not be celebrated.
I assure you, at scale this belief makes infra fall apart, and I’ve seen it happen so, so many times. Web devs who have never thought about performance merrily chuck huge JSON blobs or serialized app models into the DB, keep clicking scale up when it gets awful, and then when that finally doesn’t work, someone who _does_ care gets hired to fix it. Except that person or team now has to not only fix years of accumulated cruft, but also has to change a deeply embedded culture, and fight for dev time against Product.
The difference is that .NET is both a much more feature-complete runtime and a much more performant one. It's not like you're getting all that much with the Node runtime; it was, and still is although to a lesser extent, pretty bare-bones.
> The page you linked compares golang bindings against each other, not C against golang like my test did
No one disputed that. But it follows that your benchmarks are comparing C to [some very] outdated versions of Go packages. Which is what I tried to point out.