A fundamental problem with JSON parsing is that it has variable-length fields that don't encode their length; in a streaming scenario you basically have to keep resizing your buffer until the data fits. If the data is on disk rather than streaming, you may get away with scanning ahead to find the end of the field first, but that isn't particularly fast either.
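
To make the resizing concrete, here is a minimal Go sketch of reading a string value of unknown length from a stream. readJSONString is a made-up helper that ignores escape sequences; the growable buffer is where the repeated reallocation and copying happens:

    package main

    import (
        "bufio"
        "fmt"
        "strings"
    )

    // readJSONString reads a JSON string value (after the opening quote) from r,
    // growing its buffer as it goes because the length is not known up front.
    // Escapes are ignored to keep the sketch short; a real parser must handle
    // \" and \uXXXX sequences.
    func readJSONString(r *bufio.Reader) ([]byte, error) {
        buf := make([]byte, 0, 64) // small initial guess
        for {
            b, err := r.ReadByte()
            if err != nil {
                return nil, err
            }
            if b == '"' {
                return buf, nil
            }
            // append reallocates (roughly doubling) whenever capacity runs out;
            // each reallocation copies everything read so far.
            buf = append(buf, b)
        }
    }

    func main() {
        r := bufio.NewReader(strings.NewReader(`hello, world"`))
        s, _ := readJSONString(r)
        fmt.Printf("%s\n", s)
    }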

Schemas can't fix that.



Why couldn't they? Length constraints can be part of the schema. E.g. JSON Schema lets you define max and min lengths on variable-sized things. If you're careful enough, you can avoid dynamic resizing entirely.

I'll agree that most parsers won't fully take advantage of that information even if you provide it, but it is definitely possible to do so.
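
As a rough illustration of what a parser could do with that information, here's a Go sketch with a hypothetical readBoundedString helper, where maxLen is taken from a schema's maxLength constraint so the buffer is allocated once and never grows:

    package main

    import (
        "bufio"
        "errors"
        "fmt"
        "strings"
    )

    // readBoundedString is a sketch of a schema-aware reader: maxLen comes from
    // a (hypothetical) JSON Schema "maxLength" constraint, so the buffer can be
    // allocated exactly once and never resized. Escape handling is omitted.
    func readBoundedString(r *bufio.Reader, maxLen int) ([]byte, error) {
        buf := make([]byte, 0, maxLen) // single allocation, sized by the schema
        for {
            b, err := r.ReadByte()
            if err != nil {
                return nil, err
            }
            if b == '"' {
                return buf, nil
            }
            if len(buf) == maxLen {
                return nil, errors.New("value exceeds schema maxLength")
            }
            buf = append(buf, b) // never reallocates: capacity is already maxLen
        }
    }

    func main() {
        r := bufio.NewReader(strings.NewReader(`abc"`))
        s, _ := readBoundedString(r, 16)
        fmt.Printf("%s\n", s)
    }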


Unless you have fixed field lengths, you're still doing roughly twice the work, either scanning ahead or resizing the buffer (or over-allocating memory, I guess).

That said, JSON is designed for human readability above performance, so it's a design concession that makes sense. What doesn't make sense is using JSON anywhere performance matters.


Only if you are using pointers/slices into the buffer as an optimisation.

Otherwise there is no need to keep a buffer of anything after it has been parsed.


I'm talking about what happens during parsing.

Let's assume I send you a JSON object that is one very long string and nothing else, say 1 GB in size. To know that you need to allocate a 1 GB buffer, you either have to scan it first and then copy it, or keep reallocating the buffer until the data fits.

It's an absurd case, but shorter strings face similar overhead.
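
For data that's already fully in memory, the "scan first, then copy" alternative looks roughly like this Go sketch (extractString is a made-up helper and escape handling is omitted). Note the value is still walked twice, once to find the closing quote and once to copy:

    package main

    import (
        "bytes"
        "fmt"
    )

    // extractString sketches the "scan first, allocate once" strategy for data
    // that is already fully in memory: locate the closing quote, then make a
    // single exactly-sized allocation and copy the value into it.
    func extractString(data []byte) ([]byte, bool) {
        end := bytes.IndexByte(data, '"') // scan pass: find the end of the value
        if end < 0 {
            return nil, false
        }
        out := make([]byte, end) // one allocation of the exact size
        copy(out, data[:end])    // copy pass
        return out, true
    }

    func main() {
        raw := []byte(`a very long string value"...rest of the document`)
        s, ok := extractString(raw)
        fmt.Println(ok, string(s))
    }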



