Hacker News new | past | comments | ask | show | jobs | submit login

Have you considered writing your own string matcher for the simple cases like fixed patterns?

I got some pretty solid wins just by guarding some regex executions with simple strings.indexof calls.




Yeah, that's a good idea, I did consider it, but haven't tried it yet. Do you hook and look at the regex string before it's compiled, or do you hook in at the parsed regex AST level? (eg: regexp/syntax in Go).


For something like awk, I think you'd look before compiling, then create your own matcher. With an abstract Matcher interface that regexp implements.

It's C, but openbsd grep does something like this because libc regex is super slow. Look for fastcomp on https://github.com/openbsd/src/blob/master/usr.bin/grep/util... It's not super sophisticated, but enough to beat the full regex engine.

In the go code where I did this, it was a little different, with a static pattern. Something like "(\w+) apple" to find all apple adjectives or whatever, but the regexp wasted so much time matching words before not apples. A quick scan for "apple" to eliminate impossible matches made it faster. This depends more on knowing regex and corpus, so probably less relevant for awk.


Go's regexp package even exposes a routine for this: https://pkg.go.dev/regexp#Regexp.LiteralPrefix

It's been a while since I've looked at the source code, but it is almost certainly already doing basic prefix literal optimizations.

The more advanced literal optimizations come into play when every match must start with one of a few possible characters or strings. Then you get into things like Aho-Corasick or fancy SIMD algorithms (Teddy).


Oh, regarding rure go, the bugs note about using int is inaccurate. Go spec says max length of any object can be stored in int. You can't build a too big haystack.


Ah that's right! Nice catch, thanks.


I think GNU grep does something similar. When it has a fixed patter it uses Boyer-Moore [1].

[1]: https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: