7. Conclusion and Future Work
This paper shows the first empirical results of eliminating the Global Interpreter Lock (GIL) in a scripting language through Hardware Transactional Memory (HTM) to improve the multi-thread performance of realistic programs. We proposed a new automatic mechanism to dynamically adjust the transaction lengths on a per-yield-point basis. Our mechanism chooses a near-optimal tradeoff point between the relative overhead of the instructions to begin and end the transactions and the likelihood of transaction conflicts and footprint overflows.

We experimented on the HTM facilities in the mainframe processor IBM zEC12 and the Intel 4th Generation Core processor (Xeon E3-1275 v3). We evaluated the Ruby NAS Parallel Benchmarks (NPB), the WEBrick HTTP server, and Ruby on Rails. Our results show that HTM achieved up to a 4.4-fold speedup in the Ruby NPB, and 1.6-fold and 1.2-fold speedups in WEBrick and Ruby on Rails, respectively. The dynamic transaction-length adjustment chose the best transaction lengths. On Xeon E3-1275 v3, programs need to run long enough to benefit from the dynamic transaction-length adjustment. From all of these results, we concluded that HTM is an effective approach to achieve higher multi-thread performance compared to the GIL.

Our techniques will also be effective in Python, because our GIL elimination and dynamic transaction-length adjustment do not depend on Ruby. Conflict removal can be specific to each implementation. For example, the original Python implementation (CPython) uses reference counting GC, which will cause many conflicts, while PyPy uses copying GC and thus is more suitable for the GIL elimination through HTM.
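The per-yield-point adjustment described in the conclusion can be made concrete with a small Python sketch. This is not the paper's implementation: the names `YieldPoint`, `record`, and `simulate`, the starting length of 255, the window of 300 transactions, the abort threshold of 18, and the 0.75 shrink factor are all assumptions picked for illustration. The idea it tries to show is only the feedback loop: start with long transactions (low begin/end overhead) and shorten them at a yield point once aborts there become too frequent.

```python
from dataclasses import dataclass


@dataclass
class YieldPoint:
    """Bookkeeping for one yield point (all names are hypothetical)."""
    length: int = 255   # yield points to pass before the transaction ends
    started: int = 0    # transactions begun here in the current window
    aborted: int = 0    # how many of those aborted


def record(yp: YieldPoint, was_aborted: bool,
           window: int = 300, max_aborts: int = 18, shrink: float = 0.75) -> int:
    """Account for one finished transaction and return the length to use next."""
    yp.started += 1
    yp.aborted += int(was_aborted)
    if yp.aborted > max_aborts:
        # Conflicts or footprint overflows dominate: shorten the transaction
        # and start a fresh profiling window.
        yp.length = max(1, int(yp.length * shrink))
        yp.started = yp.aborted = 0
    elif yp.started >= window:
        # The window finished with few aborts: keep the length, profile again.
        yp.started = yp.aborted = 0
    return yp.length


def simulate(capacity: int, runs: int) -> int:
    """Toy model: a transaction aborts whenever its length exceeds `capacity`."""
    yp = YieldPoint()
    for _ in range(runs):
        record(yp, was_aborted=yp.length > capacity)
    return yp.length
```

Under this toy abort model, `simulate(40, 1000)` settles at 33, the first length at or below 40 reached by repeated 25% cuts from 255. A real runtime would see probabilistic aborts, and would have to weigh them against the cost of beginning and ending transactions more often.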
---
Funny how I stumbled on this article just after finding out about Peach:
Parallel Each for JRuby
https://github.com/schleyfox/peach