You can beat CTRE throughput with a non-compile-time solution. My first recommendation would be to look at Hyperscan. It has remarkable throughput: https://github.com/intel/hyperscan
I’ve only avoided it in the tutorial, as I want to keep the build system lean. I wouldn’t be surprised if it’s 10x faster than Boost in the average case.
To echo Ash, Hyperscan is the performance king. Use it when you are shipping on x64, have control over the pattern sets, and don't mind building it and working through its idiosyncrasies. It has some nonstandard matching semantics, reflecting its focus on network intrusion detection systems (NIDS), and does not have Unicode support.
For general-purpose usage, Google's RE2 and PCRE2 in JIT mode offer pretty good performance. Zoltan Herczeg's work on PCRE2's JIT is underappreciated. Both options are widely available and portable.