Oh MSVC, bless. MinGW setup might be a pain, but the dividends accrue over time...

ashvardanian · 2025-04-18T15:58:36 1744991916

I wish I'd had a short answer :)

For years, I've hoped to build it as an open-core project: open-source SotA solutions for Storage, Compute, and AI Modeling, built bottom-up. You can imagine the financial & time burden of building something like that with all the weird optimizations and coding practices listed above.

A few years in, with millions spent out of my pocket, without any venture support or revenue, I've decided to change gears and focus on a few niche workloads until some of the Unum tools become industry standards for something. USearch was precisely that, a toy Vector Search engine that would still, hopefully, be 10x better than alternatives, in one way or another: <https://www.unum.cloud/blog/2023-11-07-scaling-vector-search...>.

Now, ScyllaDB (through the Rust SDK) and YugaByte (through the C++ SDK) are the most recent DBMSs to announce features built on USearch, joining the ranks of many other tech products leveraging some of those optimizations. Last year I was playing around with different open-source growth & governance ideas, looking for a way to organize a more collaborative, rather than competitive, environment among our upstream users. No major releases, just occasional patches here and there.

It was an interesting period, but now I'm again deep in the "CUDA-GDB" land, and the next major release to come is precisely around Full-Text Search in StringZilla <https://github.com/ashvardanian/stringzilla>, and will be integrated into both USearch <https://github.com/unum-cloud/usearch> and somewhere else ;)

jonstewart · 2025-04-18T16:18:24 1744993104

It’s kickass that USearch is in DuckDB. That’s something I will play with, for sure.

Elasticsearch has always seemed geared too much towards concurrent querying with mixed workloads, and then it gets applied to logs… and, well, with logs you care about detection of known query sets at ingest, indexing speed, compression, and ability to answer queries over large cold indices in cheap object storage. And often when searching logs, you want exact matching, preferably with regex. Part of me wants to play with rolling my own crazy FM-index library, part of me thinks it might be easier to play games with Parquet dictionary tables (get factors out of regexps, check dictionary tables for the factors for great win), and part of me thinks I will be better off waiting to see what comes of the Rust-based ES challengers.
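A rough sketch of that dictionary-table idea, under loud assumptions: plain Python lists stand in for a Parquet dictionary page and its row codes (no Parquet library is used), the names `dictionary_encode` and `search` are made up for illustration, and the literal factors are picked by hand rather than extracted from the regex automatically. The point is only that the regex runs on the few distinct entries that contain every factor, not on every row.

```python
import re


def dictionary_encode(values):
    """Return (dictionary, codes): distinct values plus one integer code per row."""
    dictionary, index, codes = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def search(dictionary, codes, pattern, factors):
    """Find rows matching `pattern`, using literal `factors` as a cheap prefilter.

    Assumption: every string matched by `pattern` contains each factor as a
    substring. Here the factors are chosen by hand; deriving them from the
    regex is the harder part and is not attempted.
    """
    regex = re.compile(pattern)
    matching_codes = set()
    regex_calls = 0
    for code, entry in enumerate(dictionary):
        # Cheap substring checks first; skip entries that cannot match.
        if not all(factor in entry for factor in factors):
            continue
        regex_calls += 1
        if regex.search(entry):
            matching_codes.add(code)
    # Map surviving dictionary codes back to row numbers.
    rows = [row for row, code in enumerate(codes) if code in matching_codes]
    return rows, regex_calls


logs = [
    "GET /index.html 200",
    "connection timeout on port 5432",
    "GET /index.html 200",
    "connection refused on port 6379",
    "connection timeout on port 5432",
    "disk usage at 91 percent",
]
dictionary, codes = dictionary_encode(logs)
rows, regex_calls = search(
    dictionary,
    codes,
    r"connection (timeout|refused) on port \d+",
    factors=["connection ", " on port "],
)
print(rows, regex_calls)
```

With repetitive log lines the dictionary is much smaller than the column, so both the substring checks and the regex calls scale with distinct values rather than rows.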

Will definitely follow announcements to come with StringZilla.

ashvardanian · 2025-04-18T16:24:20 1744993460

> Part of me wants to play with rolling my own crazy FM-index library

Oh, absolutely, go for it! And check out what Pesho <https://github.com/pesho-ivanov> & Ragnar <https://github.com/RagnarGrootKoerkamp> are doing ;)