(I get what you're saying, spiritually: the pasta water from your giant pot for one box of pasta isn't gonna do much to thicken your sauce. But it's not a myth, just a matter of degree.)
Even when I worked for Medicare I couldn't get the damned post office to give us accurate ZIP code data! It's terrible geodata, but almost everybody remembers their ZIP code and most ZIP codes map to a single county, so it was the best UI we found for getting a general idea of where a person lived.
Yeah! There are like 12 three-county ZIP codes; some are really fun, like places where a boat delivers the mail and stops in multiple states along the lake. Some ZIPs don't refer to geographical areas at all, and others are military bases.
It was still the best UI option despite that: if you entered a ZIP that corresponded to multiple states/counties, we'd pop up a second box asking which one you lived in, but for 99% of people the ZIP alone was all we needed.
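A minimal sketch of that flow, assuming a hypothetical lookup table (the ZIP and county data below are illustrative only, not from the actual system):

```go
package main

import "fmt"

// county pairs a county name with its state.
type county struct {
	Name  string
	State string
}

// zipToCounties stands in for the real ZIP-to-county table: most ZIPs map to
// exactly one county, a handful map to more than one (illustrative data only).
var zipToCounties = map[string][]county{
	"97331": {{"Benton", "OR"}},
	"42223": {{"Christian", "KY"}, {"Montgomery", "TN"}},
}

// resolveZIP returns the single county for a ZIP, or the candidate list so the
// UI can show the "which one do you live in?" follow-up box.
func resolveZIP(zip string) (county, []county, bool) {
	candidates, ok := zipToCounties[zip]
	if !ok {
		return county{}, nil, false
	}
	if len(candidates) == 1 {
		return candidates[0], nil, true
	}
	return county{}, candidates, true
}

func main() {
	_, choices, _ := resolveZIP("42223")
	fmt.Println("ask the user to pick from:", choices)

	c, _, _ := resolveZIP("97331")
	fmt.Println("resolved directly to:", c)
}
```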
From my perspective it makes sense as a default for Go, which may be used to build either apps (things with bundle identifiers that go in /Applications) or CLI programs (which I wish would use ~/.config, but I understand that's just my preference).
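For context, Go's os.UserConfigDir resolves to ~/Library/Application Support on macOS and to $XDG_CONFIG_HOME (or ~/.config) on Linux. A rough sketch of how a CLI could honor the ~/.config preference instead (the tool name "mytool" is made up):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cliConfigDir sketches the "CLI tools should honor ~/.config" preference:
// use XDG_CONFIG_HOME if set, otherwise fall back to ~/.config, instead of
// os.UserConfigDir, which on macOS resolves to ~/Library/Application Support.
func cliConfigDir(app string) (string, error) {
	if xdg := os.Getenv("XDG_CONFIG_HOME"); xdg != "" {
		return filepath.Join(xdg, app), nil
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(home, ".config", app), nil
}

func main() {
	goDefault, _ := os.UserConfigDir() // Go's platform default
	cli, _ := cliConfigDir("mytool")   // "mytool" is a made-up CLI name
	fmt.Println("os.UserConfigDir:", goDefault)
	fmt.Println("XDG-style dir:  ", cli)
}
```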
I expect _applications_ to put their config there, as the author says; something that lives in /Applications or ~/Applications and has a bundle specifier.
I wish I expected CLI programs to put their config in ~/.config, but I actually expect them to just dump it into ~, annoyingly.
We have different expectations, I guess. I think of CLIs as applications, even though they aren't "Mac Apps". I don't see what is gained by applications putting their files in different places based on how I interact with them.
Kind of! This script is assuming that you're dealing with a byte slice, which means you've already encoded your Unicode data.
If you just encoded your string to bytes naïvely, it will probably-mostly still work, but it will get some combining characters wrong if they're represented differently in the two sources you're comparing. (e.g., é as a single precomposed code point vs. e followed by a combining accent character)
If you want to be more correct you'll normalize your Unicode string[1], but note that there are four different defined normalization forms, so you'll need to choose the one that's the best tradeoff for your particular application and data sources.
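A quick illustration of the mismatch, using golang.org/x/text/unicode/norm (the strings are arbitrary examples):

```go
package main

import (
	"fmt"

	"golang.org/x/text/unicode/norm"
)

func main() {
	precomposed := "caf\u00e9" // é as a single precomposed code point
	decomposed := "cafe\u0301" // e followed by U+0301 COMBINING ACUTE ACCENT

	// Same text to a human reader, different bytes on the wire.
	fmt.Printf("%x (%d bytes)\n", precomposed, len(precomposed)) // 636166c3a9 (5 bytes)
	fmt.Printf("%x (%d bytes)\n", decomposed, len(decomposed))   // 63616665cc81 (6 bytes)

	// Normalizing both to the same form (NFC here) makes them byte-identical.
	fmt.Println(norm.NFC.String(decomposed) == norm.NFC.String(precomposed)) // true
}
```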
IDK what "encoded your string to bytes naively" means personally. There is only one way to correctly UTF-8 encode a sequence of Unicode scalar values.
In any case, no, this works because UTF-8 is self-synchronizing. As long as both your needle and your haystack are valid UTF-8, the byte offsets returned by the search will always fall on a valid codepoint boundary.
In terms of getting "combining characters wrong," this is a reference to different Unicode normalization forms.
To be more precise... Consider a needle and a haystack, each a sequence of Unicode scalar values (typically stored as unsigned 32-bit integers). Now encode them to UTF-8 (a sequence of unsigned 8-bit integers) and run a byte-level search as shown by the OP here. That will behave as if you had executed the search on the sequences of Unicode scalar values.
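A small demonstration that a plain byte search on valid UTF-8 only matches at codepoint boundaries, using Go's standard library (the strings are arbitrary examples):

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

func main() {
	haystack := []byte("日本語のテキスト") // valid UTF-8 haystack
	needle := []byte("テキスト")           // valid UTF-8 needle

	// Because UTF-8 is self-synchronizing, a plain byte search can only match
	// at a codepoint boundary; no decoding step is needed.
	i := bytes.Index(haystack, needle)
	fmt.Println(i)                           // 12: byte offset of the match
	fmt.Println(utf8.RuneStart(haystack[i])) // true: the offset starts a rune
}
```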
So semantically, a "substring search" is a "sequence of Unicode scalar values search." At the semantic level, this may or may not be what you want. For example, if you always want `office` to find substrings like `oﬃce` (written with the 'ﬃ' ligature) in your haystack, then this byte-level search will not do what you want.
The standard approach for performing a substring search that accounts for normalization forms is to convert both the needle and the haystack to the same normalization form and then execute a byte-level search.
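A rough sketch of that approach, using golang.org/x/text/unicode/norm with NFC as the chosen form (note the offset refers to the normalized haystack, not the original bytes):

```go
package main

import (
	"bytes"
	"fmt"

	"golang.org/x/text/unicode/norm"
)

// searchNFC brings both needle and haystack into NFC and then runs an ordinary
// byte search. The returned offset points into the normalized haystack, which
// may differ from offsets in the original bytes.
func searchNFC(haystack, needle []byte) int {
	return bytes.Index(norm.NFC.Bytes(haystack), norm.NFC.Bytes(needle))
}

func main() {
	haystack := []byte("un cafe\u0301 serre\u0301") // decomposed accents
	needle := []byte("caf\u00e9")                   // precomposed é

	fmt.Println(bytes.Index(haystack, needle)) // -1: raw bytes don't match
	fmt.Println(searchNFC(haystack, needle))   // 3: matches after normalization
}
```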
(One small caveat is when the needle is an empty string. If you want to enforce correct UTF-8 boundaries, you'll need to handle that specially.)
I have a blog post[1] and accompanying repo[2] that shows how to use SEA to build a binary (and compares it to bun and deno) and strip it down to 67 MB (for me; it depends on the size of your local node binary).
It's not insane at all. Any binary that gets packed with the entire runtime will be in the MBs. But that's the point: the end user downloads a standalone binary and doesn't need to give a flying fuck about what kind of garbage has to be preinstalled for the damn thing to work. You think people care if a binary is 5 MB or 50 MB in 2025? It's more insane that you think it's insane than it is actually insane. Reminds me of all the Membros and Perfbros crying about Electron apps while these things go brrrrrrr with 100 MB+ binaries and 1 GB+ of eaten memory on untold millions of average computers.
The fact that it’s normalized to use obscene amounts of memory for tiny apps should not be celebrated.
I assure you, at scale this belief makes infra fall apart, and I’ve seen it happen so, so many times. Web devs who have never thought about performance merrily chuck huge JSON blobs or serialized app models into the DB, keep clicking scale up when it gets awful, and then when that finally doesn’t work, someone who _does_ care gets hired to fix it. Except that person or team now has to not only fix years of accumulated cruft, but also has to change a deeply embedded culture, and fight for dev time against Product.
The difference is that .NET is both a much more feature-complete runtime and a much more performant one. It's not like you're getting all that much with the Node runtime; it was, and still is although to a lesser extent, pretty bare-bones.
> The page you linked compares golang bindings against each other, not C against golang like my test did
No one disputed that. But it follows that your benchmarks are comparing C to [some very] outdated versions of Go packages. Which is what I tried to point out.