The cool trick when working with JSON is to store each JSON document as-is in one column, then define virtual columns that pull out the specific fields you want to query (using json_extract), and index those columns.
This makes for super-fast search, and the best part is you don't have to choose what to index at insert time; you can always make more virtual columns when you need them.
(You can still also search non-indexed, raw json, although it may take a long time for large collections).
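A minimal sketch of that pattern, assuming a made-up docs table with the raw JSON in a body column:

  CREATE TABLE docs (
    id   INTEGER PRIMARY KEY,
    body TEXT                 -- the raw JSON document, stored as-is
  );

  -- pull one field out into a virtual (generated) column...
  ALTER TABLE docs ADD COLUMN author TEXT
    GENERATED ALWAYS AS (json_extract(body, '$.author')) VIRTUAL;

  -- ...and index it, so WHERE author = ? uses an ordinary B-tree index
  CREATE INDEX idx_docs_author ON docs(author);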
It's a shame SQLite doesn't support indexing arbitrary JSON data the way Postgres does with GIN indexes. Those work by indexing the keys and values within each JSON doc (or by hashing each path together with its value if you choose the jsonb_path_ops operator class, which you probably should). https://www.postgresql.org/docs/current/datatype-json.html#J...
Unfortunately I don't think SQLite has generalized support for multi-valued indexes, though perhaps it could be implemented using the virtual table mechanism, like full-text search does. https://www.sqlite.org/fts5.html
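For reference, the Postgres setup being described looks roughly like this (the table name is made up; it requires a jsonb column):

  CREATE TABLE docs (body jsonb);

  -- default operator class: creates index entries for every key and value
  CREATE INDEX docs_body_gin ON docs USING GIN (body);

  -- jsonb_path_ops: hashes each path together with its value; a smaller,
  -- faster index, but it only supports containment / jsonpath queries
  CREATE INDEX docs_body_gin_paths ON docs USING GIN (body jsonb_path_ops);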
With a feature like this is there still a good reason to use MongoDB?
Honest question: I know Oracle and Postgres well, but not their JSON features. I'm just starting to learn MongoDB seriously because of a current project.
Not really, imo. The biggest advantage of Mongo was being able to store a single hierarchical document and search it. Now that you can do that in the RDBMS there's no advantage at all. You also throw away all capability of doing relations in the database with Mongo which is optional in an RDBMS and sometimes very beneficial.
Mongo could win if it were more performant or had a better query syntax... but it isn't and it doesn't. It might have some slight advantage in the speed you can set up a replicated cluster but you'll pay long-term in overall performance from my experience. If all you are doing is storing documents, just use an S3 bucket, etc...
To me the only really good reason to use MongoDB these days is a very high data ingestion rate. That was the main reason CERN used it at the LHC, but I don't know if that solution is still in place.
But you need to know in advance which keys/paths you want to index.
Postgres can index the entire JSON document (or parts of it), and the index can support query conditions that weren't known in advance (e.g. using a JSON path). This isn't as fast as a proper B-tree index, but it's still much faster than a full table scan.
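For example, given a jsonb column docs.body with a GIN index on it (names made up), conditions over paths that were never singled out in advance can still use the index:

  SELECT * FROM docs WHERE body @> '{"tags": ["sqlite"]}';       -- containment
  SELECT * FROM docs WHERE body @? '$.author ? (@ == "alice")';  -- jsonpath match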
I think the limiting factor is that there's no way to split the single JSON document out into all its paths and values in a way that SQLite can index it. There's `json_each` and `json_tree` but there's no way to generate a (dynamic) table using these functions that you could then index, as far as I know.
I believe this could probably be done with a trigger on document insert, but that would involve actually inserting each path and value into its own table, rather than SQLite generating it on the fly, so it would likely more than double the storage requirement and mean inserting potentially hundreds of rows per document, depending on what your original document structure looks like.
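A rough sketch of that trigger idea, using json_tree() to explode each inserted document into leaf path/value rows (all names here are made up):

  CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);

  CREATE TABLE doc_paths (
    doc_id INTEGER REFERENCES docs(id),
    path   TEXT,   -- e.g. '$.author.name'
    value  TEXT
  );
  CREATE INDEX idx_doc_paths ON doc_paths(path, value);

  CREATE TRIGGER docs_ai AFTER INSERT ON docs
  BEGIN
    INSERT INTO doc_paths (doc_id, path, value)
    SELECT new.id, fullkey, atom
    FROM json_tree(new.body)
    WHERE atom IS NOT NULL;   -- leaf values only
  END;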
I love this setup for working with external APIs. Often I'll only want a couple of fields from a response and don't want to build out a huge schema to match the entire response, but it's hard to know in advance which fields will be most useful. By spending some disk space to store the entire response you can then add fields you need ad hoc as virtual cols, like you describe.
I've baked a ton of different SQLite tricks - including things like full-text indexing support and advanced alter table methods - into my sqlite-utils CLI tool and Python library: https://sqlite-utils.datasette.io
My Datasette project provides tools for exploring, analyzing and publishing SQLite databases, plus ways to expose them via a JSON API: https://datasette.io
I've also written a ton of stuff about SQLite on my two blogs.
Your excellent work around SQLite led me to build https://dmd.tanna.dev and some other projects with it recently, and it's been such a game changer for me, so thank you.
Oh wow, thanks very much :D Yes, Datasette was _so_ useful in helping my organisation be more productive with understanding our data, and much nicer than folks writing SQL locally. Huge props for building it.
SQLite is good stuff. Fan as I am of BSON over JSON as a more precise and comprehensive datatype for data management, I wrote this:
https://github.com/buzzm/sqlitebson
I've been working with JSON in SQLite lately for backing up a bunch of data from an API that returns JSON.
The JSON functions work well for basic usage. And I'm so glad that the -> and ->> operators were added, since they make the syntax so much shorter than using the json_extract() function. I'd say SQLite works with JSON as well as the other databases mentioned: Postgres and MySQL.
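For anyone who hasn't seen them, the three spellings side by side (made-up table and column names; the operators need SQLite 3.38.0+):

  SELECT json_extract(body, '$.name') FROM docs;  -- SQL value, e.g. Alice
  SELECT body ->  '$.name' FROM docs;             -- JSON text, e.g. "Alice"
  SELECT body ->> '$.name' FROM docs;             -- SQL value, e.g. Alice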
My big wish is that we could get something as advanced as jq for use in manipulating JSON. The JSON path language used for extracting JSON is very basic.
Alas, I'm probably asking too much in wanting a relational database to support advanced JSON manipulation too. You can do a fair amount of JSON manipulation by extracting the data to a tabular format and using SQL to do the manipulation from there.
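As a trivial sketch of that round trip: explode a JSON array with json_each, transform the values in SQL, and reassemble with json_group_array (the literal array here is just a stand-in for a real column):

  SELECT json_group_array(value * 2)
  FROM json_each('[1, 2, 3]');
  -- returns [2,4,6]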
JMESPath is another competitor in the space of “Write arbitrary queries against JSON from Python”. It won’t help if you already have some jq that you want to reuse. On the flip side, Amazon and Microsoft both make use of it in their CLI so documentation is pretty easy to come by.
Wait, how does this work? JQ is written in C as far as I understand. Does this fork a jq process for each call? How can that possibly be fast if it's done per-record? Or does it do streaming processing?
I wrote a gnarly migration using this API once. Involved extracting deeply nested json objects, conditionally transforming the values, and storing them in a new column. I wrote that migration and never touched it again and was always afraid to go back and mess with it because it was so hard to reason about.
This API is extremely useful but I would definitely recommend caution in its use, because it’s very easy to write a giant SQL statement that’s very powerful but impossible to read.
I've found the trick for that kind of thing is to write automated tests for it - I sometimes write a set of pytest tests to exercise complex SQLite queries
Gives me way more confidence in them, and means I can change them later and feel confident I haven't broken them.
The built-in JSON functionality is very powerful. txtai (https://github.com/neuml/txtai) takes full advantage of it and stores all relational data as JSON in SQLite.
I ended up writing https://www.jvt.me/posts/2023/03/13/sqlite-json/ as a quick reference for myself, as I found I needed something more copy-pasteable for my own needs, to go alongside the official docs.
I've been using DuckDB's JSON import and parsing to deal with an O(50k)-item scrape of an API into newline-delimited JSON, converting it into something more usable, and found it quite nice to use.
syntax like:
create table foo as (select * from read_json_auto('foos.jsonl'))
and, for a part of the data that was ... overly nested:
create table nested as
  select unnest(json) from (
    select unnest(arr) as json from (
      select json as arr from read_json_auto('nested.jsonl')));
And then exporting that as a fixture for Django is as easy as:
copy (select uid as pk, 'app.Model' as model, tbl as fields from tbl) to 'tbl.jsonl' (format json, array false);
The SQLite functions for JSON are very useful. I built myself a NoSQL data store for Swift that works with Combine and SwiftUI to reactively update when data changes. The data storage itself is handled by SQLite, by putting, extracting and changing JSON strings in a table with a few columns, and it works great.
Before building this I looked around for something embeddable like SQLite but to store and process JSON. It turns out, the best NoSQL alternative for SQLite is SQLite. Such a versatile pile of C code.
I started working on a MongoDB wire protocol adapter layer for a PostgreSQL backend, and it relies heavily on the use of JSON functions.
I was really hoping to also allow SQLite as a backend, but I'm not sure if we can do everything that can be done in Postgres. If anyone wants to take a look and collaborate on that, here's a link to the project:
I've been using the JSON1 extension on Android with the Android SQLite support library [0] for a couple of years now.
According to this article, `Prior to version 3.38.0, the JSON functions were an extension that would only be included in builds if the -DSQLITE_ENABLE_JSON1 compile-time option was included.` So from the looks of it, it has been compiled in by default for about 1.5 years.
Does anyone know if Android's SQLite now includes JSON1 capabilities out of the box, and if so, since which Android version?
Probably expands the API too much, but one missed use case I just ran into would be a .import function for loading a json list of records or jsonl directly into a table from the command line. This can be roughly replicated through readfile and some juggling, but will not work from stdin (I think? I would love to pipe json directly into a table).
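For reference, the readfile() juggling looks roughly like this in the sqlite3 shell (file, table and key names are all made up; it assumes the file holds a JSON array of objects):

  CREATE TABLE items (id INTEGER, name TEXT);

  INSERT INTO items (id, name)
  SELECT value ->> 'id', value ->> 'name'
  -- readfile() returns a BLOB, so cast to TEXT before handing it to json_each()
  FROM json_each(CAST(readfile('items.json') AS TEXT));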
The other alternative I considered was dumping the entire JSON blob into a one-row table, and then having a second query extract out the fields.
Long story short: json is stored as plain text (just validated), while jsonb is a specialized binary format that can be processed without re-parsing the text each time. Only jsonb supports indexing.
I love SQLite so much.