The cool trick when working with JSON is to store each JSON document as-is in one column, then define virtual columns that pull out the specific fields you want to query (using json_extract), and index those columns.
This makes for super-fast search, and the best part is you don't have to choose what to index at insert time; you can always make more virtual columns when you need them.
(You can still also search non-indexed, raw json, although it may take a long time for large collections).
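A minimal sketch of that pattern, assuming a made-up docs table with the raw JSON in a body column:

  CREATE TABLE docs (
    id   INTEGER PRIMARY KEY,
    body TEXT                 -- the raw JSON document, stored as-is
  );

  -- pull one field out into a virtual (generated) column...
  ALTER TABLE docs ADD COLUMN author TEXT
    GENERATED ALWAYS AS (json_extract(body, '$.author')) VIRTUAL;

  -- ...and index it, so WHERE author = ? uses an ordinary B-tree index
  CREATE INDEX idx_docs_author ON docs(author);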
It's a shame SQLite doesn't support indexing arbitrary JSON data the way Postgres does with GIN indexes. Those work by indexing the keys and values within each JSON doc (or by hashing each path together with its value if you choose the jsonb_path_ops operator class, which you probably should). https://www.postgresql.org/docs/current/datatype-json.html#J...
Unfortunately I don't think SQLite has generalized support for multi-valued indexes, though perhaps it could be implemented using the virtual table mechanism, like full-text search does. https://www.sqlite.org/fts5.html
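For reference, the Postgres setup being described looks roughly like this (the table name is made up; it requires a jsonb column):

  CREATE TABLE docs (body jsonb);

  -- default operator class: creates index entries for every key and value
  CREATE INDEX docs_body_gin ON docs USING GIN (body);

  -- jsonb_path_ops: hashes each path together with its value; a smaller,
  -- faster index, but it only supports containment / jsonpath queries
  CREATE INDEX docs_body_gin_paths ON docs USING GIN (body jsonb_path_ops);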
With a feature like this is there still a good reason to use MongoDB?
Honest question: I know Oracle and Postgres well, but not their JSON features. I'm just starting to learn MongoDB seriously because of a current project.
Not really, imo. The biggest advantage of Mongo was being able to store a single hierarchical document and search it. Now that you can do that in the RDBMS there's no advantage at all. You also throw away all capability of doing relations in the database with Mongo which is optional in an RDBMS and sometimes very beneficial.
Mongo could win if it were more performant or had a better query syntax... but it isn't and it doesn't. It might have some slight advantage in the speed you can set up a replicated cluster but you'll pay long-term in overall performance from my experience. If all you are doing is storing documents, just use an S3 bucket, etc...
To me the only really good reason to use MongoDB these days is a very high data ingestion rate. That was the main reason CERN used it at the LHC, but I don't know if that solution is still in place.
But you need to know in advance which keys/paths you want to index.
Postgres can index the entire JSON document (or parts of it), and the index can support query conditions that weren't known in advance (e.g. using a JSON path). This isn't as fast as a proper B-tree index, but it's still much faster than a full table scan.
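For example, given a jsonb column docs.body with a GIN index on it (names made up), conditions over paths that were never singled out in advance can still use the index:

  SELECT * FROM docs WHERE body @> '{"tags": ["sqlite"]}';       -- containment
  SELECT * FROM docs WHERE body @? '$.author ? (@ == "alice")';  -- jsonpath match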
I think the limiting factor is that there's no way to split the single JSON document out into all its paths and values in a way that SQLite can index it. There's `json_each` and `json_tree` but there's no way to generate a (dynamic) table using these functions that you could then index, as far as I know.
I believe this could probably be done with a trigger on document insert, but that would involve actually inserting each path and value into its own table, rather than SQLite generating it on the fly, so it would likely more than double the storage requirement and mean inserting potentially hundreds of rows per document, depending on what your original document structure looks like.
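A rough sketch of that trigger idea, using json_tree() to explode each inserted document into leaf path/value rows (all names here are made up):

  CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);

  CREATE TABLE doc_paths (
    doc_id INTEGER REFERENCES docs(id),
    path   TEXT,   -- e.g. '$.author.name'
    value  TEXT
  );
  CREATE INDEX idx_doc_paths ON doc_paths(path, value);

  CREATE TRIGGER docs_ai AFTER INSERT ON docs
  BEGIN
    INSERT INTO doc_paths (doc_id, path, value)
    SELECT new.id, fullkey, atom
    FROM json_tree(new.body)
    WHERE atom IS NOT NULL;   -- leaf values only
  END;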
I love this setup for working with external APIs. Often I'll only want a couple of fields from a response and don't want to build out a huge schema to match the entire response, but it's hard to know in advance which fields will be most useful. By spending some disk space to store the entire response you can then add fields you need ad hoc as virtual cols, like you describe.
I've baked a ton of different SQLite tricks - including things like full-text indexing support and advanced alter table methods - into my sqlite-utils CLI tool and Python library: https://sqlite-utils.datasette.io
My Datasette project provides tools for exploring, analyzing and publishing SQLite databases, plus ways to expose them via a JSON API: https://datasette.io
I've also written a ton of stuff about SQLite on my two blogs.
Your excellent work around SQLite led me to build https://dmd.tanna.dev and some other projects with it recently, and it's been such a game changer for me, so thank you.
Oh wow, thanks very much :D Yes, Datasette was _so_ useful in helping my organisation be more productive with understanding our data, and much nicer than folks writing SQL locally. Huge props for building it.
SQLite is good stuff. Fan as I am of BSON over JSON as a more precise and comprehensive datatype for data management, I wrote this:
https://github.com/buzzm/sqlitebson
I've been working with JSON in SQLite lately for backing up a bunch of data from an API that returns JSON.
The JSON functions work well for basic usage. And I'm so glad that the -> and ->> operators were added, since they make the syntax so much shorter than using the json_extract() function. I'd say SQLite works with JSON as well as the other databases mentioned: Postgres and MySQL.
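For anyone who hasn't seen them, the three spellings side by side (made-up table and column names; the operators need SQLite 3.38.0+):

  SELECT json_extract(body, '$.name') FROM docs;  -- SQL value, e.g. Alice
  SELECT body ->  '$.name' FROM docs;             -- JSON text, e.g. "Alice"
  SELECT body ->> '$.name' FROM docs;             -- SQL value, e.g. Alice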
My big wish is that we could get something as advanced as jq for use in manipulating JSON. The JSON path language used for extracting JSON is very basic.
Alas, I'm probably asking too much in wanting a relational database to support advanced JSON manipulation too. You can do a fair amount of JSON manipulation by extracting the data to a tabular format and using SQL to do the manipulation from there.
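As a trivial sketch of that round trip: explode a JSON array with json_each, transform the values in SQL, and reassemble with json_group_array (the literal array here is just a stand-in for a real column):

  SELECT json_group_array(value * 2)
  FROM json_each('[1, 2, 3]');
  -- returns [2,4,6]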
JMESPath is another competitor in the space of “Write arbitrary queries against JSON from Python”. It won’t help if you already have some jq that you want to reuse. On the flip side, Amazon and Microsoft both make use of it in their CLI so documentation is pretty easy to come by.
Wait, how does this work? JQ is written in C as far as I understand. Does this fork a jq process for each call? How can that possibly be fast if it's done per-record? Or does it do streaming processing?
I wrote a gnarly migration using this API once. Involved extracting deeply nested json objects, conditionally transforming the values, and storing them in a new column. I wrote that migration and never touched it again and was always afraid to go back and mess with it because it was so hard to reason about.
This API is extremely useful but I would definitely recommend caution in its use, because it’s very easy to write a giant SQL statement that’s very powerful but impossible to read.
I've found the trick for that kind of thing is to write automated tests for it - I sometimes write a set of pytest tests to exercise complex SQLite queries
Gives me way more confidence in them, and means I can change them later and feel confident I haven't broken them.
The built-in JSON functionality is very powerful. txtai (https://github.com/neuml/txtai) takes full advantage of it and stores all relational data as JSON in SQLite.
I ended up writing https://www.jvt.me/posts/2023/03/13/sqlite-json/ as a quick reference for myself, as I found I needed something more copy-pasteable for my own needs, to go alongside the official docs.
I've been using DuckDB's JSON import and parsing to deal with an O(50k)-item scrape of an API into newline-delimited JSON, converting it into something more usable, and found it quite nice to use.
syntax like:
create table foo as (select * from read_json_auto('foos.jsonl'))
and, for a part of the data that was ... overly nested:
create table nested as
  select unnest(json) from (
    select unnest(arr) as json from (
      select json as arr from read_json_auto('nested.jsonl')));
And then exporting that as a fixture for Django is as easy as:
copy (select uid as pk, 'app.Model' as model, tbl as fields from tbl) to 'tbl.jsonl' (format json, array false);
The SQLite functions for JSON are very useful. I built myself a NoSQL data store for Swift that works with Combine and SwiftUI to reactively update when data changes. The data storage itself is handled by SQLite, by putting, extracting and changing JSON strings in a table with a few columns, and it works great.
Before building this I looked around for something embeddable like SQLite but to store and process JSON. It turns out, the best NoSQL alternative for SQLite is SQLite. Such a versatile pile of C code.
I started working on a MongoDB wire protocol adapter layer for a PostgreSQL backend, and it relies heavily on the use of JSON functions.
I was really hoping to also allow SQLite as a backend, but I'm not sure if we can do everything that can be done in Postgres. If anyone wants to take a look and collaborate on that, here's a link to the project:
I've been using the JSON1 extension on Android with the Android SQLite support library [0] for a couple of years now.
According to this article, `Prior to version 3.38.0, the JSON functions were an extension that would only be included in builds if the -DSQLITE_ENABLE_JSON1 compile-time option was included.` So from the looks of it, it has been compiled in by default for about 1.5 years.
Does anyone know if Android's SQLite now includes JSON1 capabilities out of the box, and if so, since which Android version?
Probably expands the API too much, but one missed use case I just ran into would be a .import function for loading a json list of records or jsonl directly into a table from the command line. This can be roughly replicated through readfile and some juggling, but will not work from stdin (I think? I would love to pipe json directly into a table).
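For reference, the readfile() juggling looks roughly like this in the sqlite3 shell (file, table and key names are all made up; it assumes the file holds a JSON array of objects):

  CREATE TABLE items (id INTEGER, name TEXT);

  INSERT INTO items (id, name)
  SELECT value ->> 'id', value ->> 'name'
  -- readfile() returns a BLOB, so cast to TEXT before handing it to json_each()
  FROM json_each(CAST(readfile('items.json') AS TEXT));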
The other alternative I considered was dumping the entire JSON blob into a one-row table, and then having a second query extract out the fields.
Long story short: json is stored as plain text (just validated), while jsonb is a specialized binary format that can be processed without re-parsing the text each time. Only jsonb supports indexing.
I love SQLite so much.