
According to my totally unscientific benchmarks, if Rust's regular expression module is taken as 1x, Go's came in at 8x, Ruby's at 32x, and Java's at 42x. Using Google's Java regex module improved the speed quite a bit, but it was still at ~18x. I was very impressed to see Rust doing so well, and it was sad to see Java underperforming so badly in such a typical workload. I know we're measuring libraries, not languages, but I think regexes are so prevalent that not optimizing for them would hinder the language's real-life performance.



For the JVM benchmark, unless you used Pattern.compile() and let the VM warm up (or used the JMH benchmark framework), your numbers are likely wildly off.


It depends on the goal of the measurement. For example, if you wanted to write a grep in Java, would the total runtime of the program be faster or slower if you "waited for the VM to warm up?"


By performance you mean elapsed time, right? So by 8x you mean 8x slower?


Yeah that's right. I compiled a pattern and matched it against a huge wall of text, measuring elapsed time.
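
For anyone who wants to try a comparable measurement, here is a minimal sketch of that kind of test in Python. The pattern, the synthetic text, and the `time_regex` helper are all made up for illustration; they are not the original test code, which was not shared.

```python
import re
import time


def time_regex(pattern: str, text: str) -> tuple[int, float]:
    """Compile once, scan the whole text, return (match count, seconds)."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    # Only the scan is timed; compilation happens before the clock starts.
    count = sum(1 for _ in compiled.finditer(text))
    elapsed = time.perf_counter() - start
    return count, elapsed


# Hypothetical input: a large synthetic "wall of text".
text = "lorem ipsum user@example.com dolor sit amet\n" * 50_000
count, elapsed = time_regex(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
print(f"{count} matches in {elapsed:.3f}s")
```

A single run like this is noisy, so repeating it a few times and taking the best or median time is probably closer to a fair comparison.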


Interesting. Looking at this repo, they have

Rust -> Ruby -> Java -> Golang

https://github.com/mariomka/regex-benchmark

Though it appears the numbers are two years old or so, and only for 3 specific regexes.


Would it be convenient to share your benchmarks?


Not OP, but Benchmarks Game has a performance test based on regex: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

The top times for the mentioned languages are:

Rust: 0.78s

Ruby: 12.33s

Java: 5.34s

Go: 3.80s

Python: 1.34s


The Java and golang numbers are not apples to apples. The Java solution is using the standard library code, while golang's is calling out to native code (wrapper C library) as far as I can tell. Same with Python.


Other programs are shown:

    Python #2  1.34s PCRE2
    Go #5      3.80s PCRE
    Java #3    5.34s
    Java #6    5.39s
    Java #1    8.49s
    Python #1  9.25s "standard library"
    Go #4     15.60s PCRE
    Go #3     26.86s "standard library"
    Go #1     27.01s "standard library"


To be fair to the non-Java entries, the Java version is using Java code because Java sucks at calling out to faster languages.


Oh yes, that Python code looks so idiomatic:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


Can't comment on that but why are these people putting spaces after commas (correct) and not to the left and right of the assignment sign = ? It's so weird to read a=b


    regex=PCRE2.pcre2_compile_8(pattern, c_size_t(len(pattern)), c_uint32(0),
                                byref(c_int()), byref(c_size_t()), None)

As far as FFI goes, I would say this looks amazingly simple.


That’s a very straightforward translation of C code using pcre to Python using ctypes. It doesn’t look great because the ugliness isn’t wrapped in a library. But Python ctypes definitely isn’t hard to use. It is unsafe as hell though.
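
To show how little ceremony ctypes needs, here is a minimal sketch that calls `strlen` from the C library. It assumes a Unix-like system where `ctypes.util.find_library("c")` can locate libc; it has nothing to do with PCRE2 or the benchmark program itself.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like system where the C library can be located this way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature by hand: size_t strlen(const char *s);
# ctypes cannot verify this against the real function, which is the unsafe part.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
print(length)
```

Nothing checks the declared `argtypes` and `restype` against the actual C function, so a wrong declaration can crash the interpreter instead of raising an exception.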



Seems to be OK in pidigits — Java #3 program

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

    C gcc #6 0.56s
    Java  #3 0.79s


JNI exists now, and Project Panama will be a big improvement.


It's interesting that Ruby is as slow as it is, given that its regex engine (Onigmo) is written in C. I wonder where the bottleneck is compared to the other languages.


Seems to be mainly a test of compiling regexes, rather than executing them. Which is a bit pointless because few applications would not pre-compile regexes ... and then it would actually be penalising languages that put more effort into optimisation and hence perform better in the real world at execution.


Why would you even say this? It is certainly not a test of compiling regexes. Go and profile any one of those programs. You'll see that the majority of time is spent executing the search, not the compilation.

Look at the runtimes being mentioned here. We're talking on the order of seconds. The benchmark itself calls for 15 pretty simple regexes to be compiled. Regex compilation of 1ms or longer would be considered slow I think. So that's 15ms. That's nearly nothing compared to the full runtime of the benchmark. Even if you assumed each regex took 10ms on average to compile, you're still only looking at 150ms total.
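
A rough way to check this proportion yourself is to time the two phases separately. The sketch below does that in Python; the three patterns and the synthetic input are hypothetical stand-ins, not the benchmark's actual workload, and the absolute numbers will differ by engine and machine.

```python
import re
import time

# Hypothetical patterns and input, only to illustrate the proportions.
patterns = [
    r"agggtaaa|tttaccct",
    r"[cgt]gggtaaa|tttaccc[acg]",
    r"a[act]ggtaaa|tttacc[agt]t",
]
text = "acgtagggtaaatttaccctgca" * 200_000

# Drop Python's internal pattern cache so compilation is actually measured.
re.purge()
start = time.perf_counter()
compiled = [re.compile(p) for p in patterns]
compile_time = time.perf_counter() - start

# Time the searches over the full input with the already compiled patterns.
start = time.perf_counter()
counts = [len(rx.findall(text)) for rx in compiled]
search_time = time.perf_counter() - start

print(f"compile: {compile_time * 1000:.3f} ms, search: {search_time * 1000:.3f} ms")
```

On an input of this size, the search phase would be expected to dominate, which is the same shape as the 15 ms versus several seconds estimate above.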

I say this as the author of Rust's regex engine and as someone who has spent a non-trivial time looking at and thinking about this particular benchmark.


Apologies, you're correct. I mistook the use of Pattern.compile chained with replaceAll to mean that it was compiling the regex every time it used it.


Unfortunately, the test code belongs to the company I work for, so I cannot share it. It was done to determine what language we should use for an internal tool. I hope to conduct another benchmark one day, publicly this time.



