We've just merged a branch which adds float support to the x86 backend. This means that floating point operations are now super fast in PyPy's JIT. Let's have a look at an example, provided by Alex Gaynor and stolen from the Factor blog.
The original version of the benchmark was definitely tuned for the performance needs of CPython.
For running this on PyPy, I switched to a slightly simpler version of the program, and I'll explain the few changes I made, which reflect current limitations of PyPy's JIT. They're not very deep and they might already be gone by the time you read this:
- Usage of __slots__. This is a bit ridiculous, but we spent quite a bit of time speeding up normal instances of new-style classes, which are now very fast, yet instances of classes with __slots__ are slower. To be fixed soon.
- Usage of reduce. This one is even more obscure, but reduce is not perceived by the JIT as something that produces loops in a program. Moving to a pure-Python version of reduce fixes the problem.
- Using x ** 2 vs x * x. In PyPy, reading a local variable is a no-op when JITted (the same as reading a local variable in C). However, multiplication is a simpler operation than the power operation.
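To make the three changes above concrete, here is a minimal sketch of what they look like in code. This is not the actual benchmark: the Point class, py_reduce and total_norm are hypothetical names made up for illustration, and the real program may be structured quite differently.

```python
import math


def py_reduce(func, items, initial):
    # Hypothetical pure-Python stand-in for the builtin reduce:
    # the loop is written out explicitly in Python code.
    acc = initial
    for item in items:
        acc = func(acc, item)
    return acc


class Point(object):
    # A plain new-style class, deliberately without __slots__.
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm(self):
        # x * x instead of x ** 2: two reads of the same value
        # and one multiplication, no power operation.
        return math.sqrt(self.x * self.x + self.y * self.y)


def total_norm(points):
    # Sum the norms of all points using the pure-Python reduce.
    return py_reduce(lambda acc, p: acc + p.norm(), points, 0.0)


points = [Point(3.0, 4.0), Point(6.0, 8.0)]
result = total_norm(points)
print(result)
```

None of this changes what the program computes; it only rephrases it in a form that matches the three points listed above.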
I also included the original Java benchmark. Please note that the original Java version is similar to my modified one (not the one specifically tuned for CPython).
The performance figures below (for n = 1 000 000) are the average of 10 runs:
- CPython 2.6: 7.56s
- CPython & psyco 2.6: 4.44s
- PyPy: 1.63s
- Java (JVM 1.6, client mode): 0.77s
And while the JVM is much faster, it's very good that we can even compare :-)
Cheers