
Not OP, but the Benchmarks Game has a regex-based performance test: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

The top times for the mentioned languages are:

    Rust:    0.78s
    Ruby:   12.33s
    Java:    5.34s
    Go:      3.80s
    Python:  1.34s




The Java and Go numbers are not apples to apples. The Java solution uses the standard library, while the Go one calls out to native code (a wrapper around a C library), as far as I can tell. Same with Python.


Other programs are shown:

    Python #2  1.34s PCRE2
    Go #5      3.80s PCRE
    Java #3    5.34s
    Java #6    5.39s
    Java #1    8.49s
    Python #1  9.25s "standard library"
    Go #4     15.60s PCRE
    Go #3     26.86s "standard library"
    Go #1     27.01s "standard library"


To be fair to the non-Java entries, the Java version uses plain Java code because Java sucks at calling out to faster languages.


Oh yes, that Python code looks so idiomatic:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


Can't comment on that, but why are these people putting spaces after commas (correct) but not around the assignment sign =? It's so weird to read a=b.


    regex=PCRE2.pcre2_compile_8(pattern, c_size_t(len(pattern)), c_uint32(0),
        byref(c_int()), byref(c_size_t()), None)

As far as FFI goes, I would say this looks amazingly simple.


That’s a very straightforward translation of C code that uses PCRE into Python using ctypes. It doesn’t look great because the ugliness isn’t wrapped in a library. But Python’s ctypes definitely isn’t hard to use. It is unsafe as hell, though.
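A minimal sketch of the same ctypes pattern, calling two libc functions instead of PCRE2 so it runs without PCRE2 installed. The choice of strlen and strtol, and the helper names load_libc, c_strlen and parse_leading_int, are illustrative assumptions, not anything from the benchmark program. It assumes a POSIX-like system where the C library can be loaded. The argtypes/restype declarations and the byref() out-parameter mirror the pcre2_compile_8 call quoted above, and the embedded-NUL case shows C semantics leaking through unchecked.

```python
import ctypes
import ctypes.util


def load_libc() -> ctypes.CDLL:
    """Load the C standard library (assumes a POSIX-like system)."""
    name = ctypes.util.find_library("c")
    # Fall back to the symbols already linked into the running process.
    return ctypes.CDLL(name) if name else ctypes.CDLL(None)


libc = load_libc()

# Declare the C signatures by hand; ctypes cannot check them against the
# real headers, and without restype it would assume a C int return value.
# size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
# long strtol(const char *nptr, char **endptr, int base);
libc.strtol.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p), ctypes.c_int]
libc.strtol.restype = ctypes.c_long


def c_strlen(data: bytes) -> int:
    # C stops at the first NUL byte, whatever len(data) says.
    return libc.strlen(data)


def parse_leading_int(data: bytes) -> tuple:
    # Out-parameter passed with byref(), as in the PCRE2 call.
    end = ctypes.c_char_p()
    value = libc.strtol(data, ctypes.byref(end), 10)
    # end points into data's buffer; it is only valid while data is alive.
    return value, end.value


if __name__ == "__main__":
    print(c_strlen(b"hello"), c_strlen(b"ab\x00cd"))
    print(parse_leading_int(b"42abc"))
```

Nothing stops you from declaring the wrong argtypes or keeping a pointer after its buffer is freed; ctypes will happily crash the interpreter in that case, which is the "unsafe" part.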



Seems to be OK in pidigits, judging by the Java #3 program:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

    C gcc #6 0.56s
    Java  #3 0.79s


JNI exists already, and Project Panama will be a big improvement.


It's interesting that Ruby is as slow as it is, given that its regex engine (Onigmo) is written in C. I wonder where the bottleneck is compared to the other languages.


Seems to be mainly a test of compiling regexes rather than executing them, which is a bit pointless because few applications would fail to pre-compile their regexes. It would then actually penalise languages that put more effort into optimisation, and hence perform better at execution in the real world.


Why would you even say this? It is certainly not a test of compiling regexes. Go and profile any one of those programs. You'll see that the majority of time is spent executing the search, not the compilation.

Look at the runtimes being mentioned here. We're talking on the order of seconds. The benchmark itself calls for 15 pretty simple regexes to be compiled. Regex compilation of 1ms or longer would be considered slow, I think. So that's roughly 15ms at worst. That's nearly nothing compared to the full runtime of the benchmark. Even if you assumed each regex took 10ms on average to compile, you're still only looking at 150ms total.

I say this as the author of Rust's regex engine and as someone who has spent a non-trivial time looking at and thinking about this particular benchmark.
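A rough way to see the compile-versus-search split for yourself, sketched in Python with the standard re module rather than any of the benchmark's actual programs. The input is synthetic random DNA (an assumption; the real benchmark reads generated fasta input), the helper names are hypothetical, and the nine alternation patterns are modelled on the benchmark's variant-counting step. re.purge() clears re's internal pattern cache so that each compile is actually timed rather than served from the cache.

```python
import random
import re
import time

# Patterns modelled on the regex-redux variant-counting step.
VARIANTS = [
    "agggtaaa|tttaccct",
    "[cgt]gggtaaa|tttaccc[acg]",
    "a[act]ggtaaa|tttacc[agt]t",
    "ag[act]gtaaa|tttac[agt]ct",
    "agg[act]taaa|ttta[agt]cct",
    "aggg[acg]aaa|ttt[cgt]ccct",
    "agggt[cgt]aa|tt[acg]accct",
    "agggta[cgt]a|t[acg]taccct",
    "agggtaa[cgt]|[acg]ttaccct",
]


def make_dna(length: int, seed: int = 42) -> str:
    """Hypothetical stand-in for the benchmark input: random a/c/g/t."""
    rng = random.Random(seed)
    return "".join(rng.choices("acgt", k=length))


def time_compile_vs_search(seq: str):
    """Return (total compile seconds, total search seconds, match counts)."""
    compile_s = 0.0
    search_s = 0.0
    counts = {}
    for pattern in VARIANTS:
        re.purge()  # drop re's cache so compile() does real work
        t0 = time.perf_counter()
        compiled = re.compile(pattern)
        t1 = time.perf_counter()
        counts[pattern] = sum(1 for _ in compiled.finditer(seq))
        t2 = time.perf_counter()
        compile_s += t1 - t0
        search_s += t2 - t1
    return compile_s, search_s, counts


if __name__ == "__main__":
    c, s, _ = time_compile_vs_search(make_dna(500_000))
    print(f"compile: {c * 1000:.3f} ms, search: {s * 1000:.1f} ms")
```

The absolute numbers depend on the machine and input size; the point is only to compare the two totals, which you can do by scaling the input up or down.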


Apologies, you're correct. I mistook the use of Pattern.compile chained with replaceAll to mean that it was compiling the regex every time it used it.
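For comparison, a Python sketch of the compile-once, replace-everything shape: each pattern is compiled once and sub() then rewrites every match in a single pass, the same structure as Pattern.compile(p).matcher(s).replaceAll(r) in Java. The substitution list is my reconstruction of the benchmark's replacement step and should be checked against the benchmark description; apply_substitutions is a hypothetical helper name.

```python
import re

# Reconstructed regex-redux-style substitutions, applied in order.
SUBSTITUTIONS = [
    (r"tHa[Nt]", "<4>"),
    (r"aND|caN|Ha[DS]|WaS", "<3>"),
    (r"a[NSt]|BY", "<2>"),
    (r"<[^>]*>", "|"),
    (r"\|[^|][^|]*\|", "-"),
]


def apply_substitutions(seq: str) -> str:
    for pattern, replacement in SUBSTITUTIONS:
        # One compile per pattern, then one pass replacing all matches.
        seq = re.compile(pattern).sub(replacement, seq)
    return seq


if __name__ == "__main__":
    print(apply_substitutions("tHaNggBYcc"))
```

If the function were called many times, hoisting the compiled patterns to module level would be the usual refinement, though Python's re also caches recently compiled patterns.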



