This is great! There has been far too much sophistry in C++, where "performance" is the justification for all sorts of contortions, with no empirical data behind it.
I am surprised by CTRE's good results. I will admit I have thought of it more as a parlor trick than a viable engine, so I will need to dig into that more. I also want to dig into the OpenMP and TBB thread-pool benchmarks to see whether a Boost.Asio thread pool can be added to them.
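If it helps with extending those benchmarks, here is a minimal fixed-size thread pool in standard C++, a sketch rather than a tuned implementation. It drains one shared FIFO queue with N workers, guarded by a mutex and a condition variable. Its post-then-join usage loosely mirrors `boost::asio::thread_pool` (driven via `boost::asio::post(pool, task)` and `pool.join()`), so it could serve as a dependency-free baseline next to the OpenMP and TBB backends. The `simple_pool` name is hypothetical, it assumes tasks don't throw, and posting after `join()` is not supported.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal pool: a shared FIFO queue drained by N worker threads.
// Assumes tasks do not throw; no work stealing, no futures.
class simple_pool {
public:
    explicit simple_pool(std::size_t threads) {
        assert(threads > 0);
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }
    ~simple_pool() { join(); }

    // Enqueue a task; it runs on whichever worker picks it up first.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    // Let the workers drain the queue, then wait for all of them to exit.
    // Safe to call twice (the destructor calls it again).
    void join() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_)
            if (w.joinable()) w.join();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

A single shared queue is the simplest design, but it becomes a contention point when tasks are tiny, which is exactly what TBB's work-stealing scheduler is built to avoid. That difference is worth keeping in mind when reading the benchmark numbers.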
I was also surprised by CTRE, and I can imagine it being a viable tool for implementing parsers in 99% of use cases, where you don't need all the SIMD bells and whistles.
A word of caution, though: I remember the throughput differing vastly between GCC and MSVC builds. The latter struggles with heavy meta-programming and expression templates. I don't know why.
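To make the "compile-time" part concrete: since C++14, `constexpr` functions may contain loops and mutate locals, so even a classic backtracking matcher can run inside the compiler. The sketch below is Rob Pike's tiny matcher (literals, `.`, `*`, `^`, `$`) made `constexpr`. It is not how CTRE works internally; as far as I understand, CTRE parses the pattern into types and generates the matcher from those. It only illustrates that a pattern can be evaluated at compile time, with `static_assert` failing the build on a mismatch. The function names are illustrative.

```cpp
// Rob Pike's classic backtracking matcher, made constexpr (needs C++14+).
// Supports: c (literal), . (any char), * (zero or more of the previous char),
// ^ (anchor at start), $ (anchor at end).
constexpr bool match_here(const char* re, const char* text);

// Match c* followed by the rest of the pattern, trying the shortest run first.
constexpr bool match_star(char c, const char* re, const char* text) {
    do {
        if (match_here(re, text)) return true;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return false;
}

// Match the pattern `re` at the beginning of `text`.
constexpr bool match_here(const char* re, const char* text) {
    if (re[0] == '\0') return true;
    if (re[1] == '*') return match_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0') return *text == '\0';
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return false;
}

// Search for `re` anywhere in `text`, unless it is anchored with ^.
constexpr bool match(const char* re, const char* text) {
    if (re[0] == '^') return match_here(re + 1, text);
    do {
        if (match_here(re, text)) return true;
    } while (*text++ != '\0');
    return false;
}

// Evaluated entirely by the compiler; a wrong expectation fails the build.
static_assert(match("^ab*c$", "abbbc"), "should match");
static_assert(!match("^ab*c$", "abd"), "should not match");
```

The same functions also work at run time. The point is only that the work can be shifted into the compiler, not that this toy is fast.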
Oh MSVC, bless. A MinGW setup might be a pain, but the dividends accrue over time.
--
This was a good reminder that I need to pay more attention to Unum's projects. I noticed this older blog article, https://www.unum.cloud/blog/2021-01-03-texts, and that brings up some questions. First, in 2025, is UStore a wholesale replacement for UDisk or are the two complementary? Second, what is the current Unum approach for replacing full-text search engines (e.g., ElasticSearch)?
For years, I've hoped to build it as an open-core project: open-source SotA solutions for Storage, Compute, and AI Modeling, built bottom-up. You can imagine the financial and time burden of building something like that with all the weird optimizations and coding practices listed above.
A few years in, with millions spent out of my pocket, without any venture support or revenue, I've decided to change gears and focus on a few niche workloads until some of the Unum tools become industry standards for something. USearch was precisely that, a toy Vector Search engine that would still, hopefully, be 10x better than alternatives, in one way or another: <https://www.unum.cloud/blog/2023-11-07-scaling-vector-search...>.
Now ScyllaDB (through the Rust SDK) and YugaByte (through the C++ SDK) are the most recent DBMSs to announce features built on USearch, joining the many other tech products that leverage some of those optimizations. Last year, I was playing around with different open-source growth and governance ideas, looking for ways to foster a more collaborative environment among our upstream users rather than a competitive one. No major releases, just occasional patches here and there.
It’s kickass that USearch is in DuckDB. That’s something I will play with, for sure.
ElasticSearch has always seemed too geared towards concurrent querying over mixed workloads, and then it gets applied to logs. With logs, you care about detecting known query sets at ingest, indexing speed, compression, and the ability to answer queries over large cold indices in cheap object storage. And when searching logs, you often want exact matching, preferably with regex. Part of me wants to play with rolling my own crazy FM-index library, part of me thinks it might be easier to play games with Parquet dictionary tables (get the literal factors out of the regexps, then check the dictionary tables for those factors, for a big win), and part of me thinks I'd be better off waiting to see what comes of the Rust-based ES challengers.
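The "get factors out of regexps" idea can be sketched in a few dozen lines: extract literal substrings that every match must contain, reject dictionary entries missing any of them with a cheap `find`, and run the full regex only on the survivors. This is a conservative toy under stated assumptions: ECMAScript-flavored, case-sensitive patterns with no `]` inside character classes. It gives up (returns no factors, meaning "scan everything") on alternation, groups, and numeric or hex escapes. Both function names are hypothetical, and a real version would read distinct values from Parquet dictionary pages rather than a `std::vector`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: literal substrings that any regex_search match of `p`
// must contain. Conservative: returns {} (no prefilter) when unsure.
std::vector<std::string> required_factors(const std::string& p) {
    std::vector<std::string> out;
    if (p.find_first_of("|()") != std::string::npos) return out;
    std::string run;
    auto flush = [&] {
        if (!run.empty()) out.push_back(run);
        run.clear();
    };
    std::size_t i = 0;
    while (i < p.size()) {
        char c = p[i];
        std::size_t next = i + 1;
        bool literal = false;
        if (c == '\\' && i + 1 < p.size()) {
            char e = p[i + 1];
            // \x41, \u0041, \cJ, \0, \1 carry extra meaning: give up entirely.
            if (std::string("xuc0123456789").find(e) != std::string::npos) return {};
            next = i + 2;
            // "\." is a literal dot; "\d", "\w", "\b" are classes or assertions.
            literal = !std::isalnum(static_cast<unsigned char>(e));
            c = e;
        } else if (c == '[' || c == '{') {
            // Character class or counted quantifier: skip to its closing bracket.
            std::size_t close = p.find(c == '[' ? ']' : '}', i + 1);
            next = close == std::string::npos ? p.size() : close + 1;
        } else {
            literal = std::string(".^$*+?").find(c) == std::string::npos;
        }
        if (!literal) { flush(); i = next; continue; }
        char q = next < p.size() ? p[next] : '\0';
        if (q == '*' || q == '?' || q == '{') {
            flush();   // this char may be absent from the match: drop it
        } else if (q == '+') {
            run += c;  // present at least once, but may repeat: end the run
            flush();
        } else {
            run += c;
        }
        i = next;
    }
    flush();
    return out;
}

// Hypothetical scan over distinct dictionary values: cheap substring checks
// first, the full regex only on values containing every required factor.
std::vector<std::string> search_dictionary(const std::vector<std::string>& dict,
                                           const std::string& pattern,
                                           std::size_t* regex_calls = nullptr) {
    const std::regex re(pattern);
    const auto factors = required_factors(pattern);
    std::vector<std::string> hits;
    std::size_t calls = 0;
    for (const auto& value : dict) {
        bool may_match = true;
        for (const auto& f : factors)
            if (value.find(f) == std::string::npos) { may_match = false; break; }
        if (!may_match) continue;
        ++calls;
        if (std::regex_search(value, re)) hits.push_back(value);
    }
    if (regex_calls) *regex_calls = calls;
    return hits;
}
```

For `error: disk .* full`, the factors are `"error: disk "` and `" full"`, so values like `warn: cpu hot` never reach the regex engine. A value such as `error: disk full` still passes the prefilter and is correctly rejected by the full regex, which is why the confirmation step cannot be skipped.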
I will definitely follow the upcoming StringZilla announcements.
std::regex is such a nightmare. I didn't take the time to run the code myself, but I'd be curious whether you'd see the same delta if you swapped it for boost::regex or RE2.
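For anyone who wants to try that swap, a harness that takes the matcher as a callable turns it into a one-factory change. The sketch below only wires up `std::regex`, since Boost and RE2 need extra dependencies. The regex is compiled once, outside the timed loop, with `std::regex::optimize`, because constructing a `std::regex` per line would dominate the measurement. `bench_result`, `time_matcher`, and `make_std_regex_matcher` are hypothetical names. The match count is reported alongside the time so that engines which disagree on results get caught.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type: the match count doubles as a correctness check,
// since every engine under test must accept exactly the same lines.
struct bench_result {
    std::size_t matches = 0;
    double best_ms = 0.0;  // fastest of all repeats, to damp scheduler noise
};

using line_matcher = std::function<bool(const std::string&)>;

// Hypothetical harness: apply `matcher` to every line, `repeats` times,
// and keep the fastest wall-clock time.
bench_result time_matcher(const std::vector<std::string>& lines,
                          const line_matcher& matcher, int repeats = 5) {
    assert(repeats > 0);
    bench_result result;
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        std::size_t matches = 0;
        for (const auto& line : lines)
            if (matcher(line)) ++matches;
        const std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start;
        if (r == 0 || elapsed.count() < result.best_ms) result.best_ms = elapsed.count();
        result.matches = matches;
    }
    return result;
}

// std::regex candidate: the pattern is compiled once, here, and the compiled
// object is copied into the lambda, so nothing is recompiled per line.
line_matcher make_std_regex_matcher(const std::string& pattern) {
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::optimize);
    return [re](const std::string& line) { return std::regex_search(line, re); };
}
```

A Boost or RE2 candidate would be another factory returning the same `line_matcher`, wrapping `boost::regex_search` or `RE2::PartialMatch`. `std::function` adds one indirect call per line. That is negligible next to a regex engine but would skew a comparison of much cheaper operations, where a template parameter is the better choice.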
You can beat CTRE's throughput with a non-compile-time solution. My first recommendation would be to look at HyperScan, which has remarkable throughput: https://github.com/intel/hyperscan
I’ve only avoided it in the tutorial, as I want to keep the build system lean. I wouldn’t be surprised if it’s 10x faster than Boost in the average case.
To echo Ash, HyperScan is the performance king. Use it when you're shipping on x64, have control over the pattern sets, and don't mind building it and working through its idiosyncrasies. It has some nonstandard matching semantics, reflecting its focus on network intrusion detection systems (NIDS), and does not have Unicode support.
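The core idea behind multi-pattern engines is to scan the input once for the whole pattern set instead of once per pattern. HyperScan's own literal matchers and regex machinery are far more elaborate and SIMD-heavy. The textbook version of the idea is Aho-Corasick, sketched below for plain byte literals only: no regex syntax, no case folding, no Unicode awareness. The `aho_corasick` class is hypothetical and is not HyperScan's API. It also spends 1 KiB per trie node on a full 256-entry transition table, trading memory for a branch-free inner loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical textbook Aho-Corasick over bytes: build once per pattern set,
// then report every (end_offset, pattern_id) in a single pass over the text.
class aho_corasick {
public:
    explicit aho_corasick(const std::vector<std::string>& patterns) {
        nodes_.emplace_back();  // root
        for (std::size_t id = 0; id < patterns.size(); ++id) {
            int cur = 0;
            for (unsigned char ch : patterns[id]) {
                if (nodes_[cur].next[ch] < 0) {
                    nodes_[cur].next[ch] = static_cast<int>(nodes_.size());
                    nodes_.emplace_back();
                }
                cur = nodes_[cur].next[ch];
            }
            nodes_[cur].outputs.push_back(id);
        }
        build_links();
    }

    // Returns (offset just past the match, pattern id) pairs in text order.
    std::vector<std::pair<std::size_t, std::size_t>> scan(const std::string& text) const {
        std::vector<std::pair<std::size_t, std::size_t>> hits;
        int state = 0;
        for (std::size_t i = 0; i < text.size(); ++i) {
            state = nodes_[state].next[static_cast<unsigned char>(text[i])];
            for (std::size_t id : nodes_[state].outputs) hits.emplace_back(i + 1, id);
        }
        return hits;
    }

private:
    struct node {
        std::array<int, 256> next;
        int fail = 0;
        std::vector<std::size_t> outputs;  // patterns ending here, incl. via fail links
        node() { next.fill(-1); }
    };

    // BFS by depth: compute failure links and fill in missing transitions,
    // turning the trie into a complete automaton with one lookup per byte.
    void build_links() {
        std::queue<int> bfs;
        for (int ch = 0; ch < 256; ++ch) {
            int child = nodes_[0].next[ch];
            if (child < 0) {
                nodes_[0].next[ch] = 0;
            } else {
                nodes_[child].fail = 0;
                bfs.push(child);
            }
        }
        while (!bfs.empty()) {
            int u = bfs.front();
            bfs.pop();
            const int f = nodes_[u].fail;
            // Inherit matches that end at the longest proper suffix state.
            nodes_[u].outputs.insert(nodes_[u].outputs.end(),
                                     nodes_[f].outputs.begin(), nodes_[f].outputs.end());
            for (int ch = 0; ch < 256; ++ch) {
                int v = nodes_[u].next[ch];
                if (v < 0) {
                    nodes_[u].next[ch] = nodes_[f].next[ch];
                } else {
                    nodes_[v].fail = nodes_[f].next[ch];
                    bfs.push(v);
                }
            }
        }
    }

    std::vector<node> nodes_;
};
```

With the classic pattern set `he`, `she`, `his`, `hers` and the text `ushers`, a single pass reports `she` and `he` ending at offset 4 and `hers` ending at offset 6.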
For general-purpose usage, Google's RE2 and PCRE2 in JIT mode will offer pretty good performance. Zoltan Herczeg's work on the PCRE2 JIT is underappreciated. Both options are widely available and portable.