
Some benchmarking


Recently, thanks to the surprisingly helpful Unhelpful, also known as Andrew Mahone, we have a decent, if slightly arbitrary, set of performance graphs. It contains a couple of benchmarks already seen on this blog as well as some taken from The Great Computer Language Benchmarks Game. These benchmarks don't even try to represent "real applications"; they're mostly small algorithmic benchmarks. Interpreters used:

  1. PyPy trunk, revision 69331 with --translation-backendopt-storesink, which is now on by default
  2. Unladen swallow trunk, r900
  3. CPython 2.6.2 release
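For readers curious how such cross-interpreter comparisons are typically set up: each benchmark script is run under each interpreter binary and the wall-clock time recorded. Below is a minimal, hypothetical sketch of that pattern; it is not the actual runner script used for these graphs, and the interpreter paths and benchmark names would be placeholders.

```python
# Hypothetical sketch of a cross-interpreter benchmark runner.
# The real runner script used for the graphs may differ.
import subprocess
import time


def time_benchmark(interpreter, script, runs=3):
    """Run `script` under `interpreter` several times; return best wall time."""
    best = float("inf")
    for _ in range(runs):
        start = time.time()
        subprocess.check_call([interpreter, script])
        best = min(best, time.time() - start)
    return best
```

Taking the best of several runs reduces noise from warm-up and background load, which matters especially for JIT-based interpreters.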

Here are the graphs; the benchmarks and the runner script are available.

And zoomed in for all benchmarks except binary-trees and fannkuch.

As we can see, PyPy is generally anywhere from the same speed as CPython to 50x faster (f1int). The places where we're only as fast as CPython are places where we know we have problems: for example, generators are not sped up by the JIT and require some work (although far less than generators required under Psyco :-). The glaring inefficiency is the regex-dna benchmark. It clearly demonstrates that our regular expression engine is really, really bad and urgently needs attention.

The cool thing here is that, although these benchmarks might not represent typical Python applications, they're not uninteresting. They show that algorithmic code need not be far slower in Python than in C, so with PyPy one need not worry about algorithmic code being dramatically slow. As many readers would agree, that kills yet another use of C in our lives :-)



Luis wrote on 2009-11-18 22:09:

Wow! This is getting really interesting. Congratulations!
By the way, it would be great if you could include Psyco in future graphs, so speed junkies can get a clearer picture of PyPy's progress.

Eric Florenzano wrote on 2009-11-18 22:14:

Very interesting, congratulations on all the recent progress! It would be very interesting to see how PyPy stacks up against Unladen Swallow on Unladen Swallow's own performance benchmark suite, which includes some more real-world scenarios.

Maciej Fijalkowski wrote on 2009-11-18 22:31:

@Eric: yes, definitely, we're approaching that set of benchmarks

@Luis: yes, definitely, will try to update tomorrow, sorry.

Paddy3118 wrote on 2009-11-19 04:06:

It's good, but...

We are still in the realm of micro-benchmarks. It would be good to compare their performance on something larger. Django or Zope, maybe?

Gaëtan de Menten wrote on 2009-11-19 07:52:

These last months, you seem to have had almost exponential progress. I guess all those years of research are finally paying off. Congratulations!

Also, another graph for memory pressure would be nice to have. Unladen Swallow is (was?) not very good in that area, and I wonder how PyPy compares.

[nitpick warning]
As a general rule, when mentioning trunk revisions, it's nice to also mention a date so that people can tell the comparison was fair. People assume it's from the day you ran the tests, and confirming that would be nice.
[/nitpick warning]

Antoine wrote on 2009-11-19 09:45:

How about benchmarking against CPython trunk as well?



Tony Landis wrote on 2009-11-19 16:02:

What about memory consumption? That is almost as important to me as speed.

wilk wrote on 2009-11-19 16:04:

Congratulations !

Please could you remind us how to build and test pypy-jit?

Anonymous wrote on 2009-11-19 23:38:

I'm curious why mandelbrot is much less accelerated than, say, nbody. Does PyPy not JIT complex numbers properly yet?

Benjamin Peterson wrote on 2009-11-20 03:03:

@wilk ./ -Ojit

Benjamin Peterson wrote on 2009-11-20 03:11:

@Anon Our array module is in pure Python and much less optimized than CPython's.

Leo wrote on 2009-11-20 07:11:

How long until I can do

pypy-c-jit -Ojit


So far, when I try, I get

NameError: global name 'W_NoneObject' is not defined

holger krekel wrote on 2009-11-20 07:37:

AFAIU it's not PyPy's regex engine being "bad" but rather that the JIT generator cannot consider and optimize the loop in the regex engine, as it is a nested loop (the outer one being the bytecode interpretation loop).

Armin Rigo wrote on 2009-11-20 10:41:

@holger: yes, that explains why regexps are not faster in PyPy, but not why they are 5x or 10x slower. Of course our regexp engine is terribly bad. We should have at least a performance similar to CPython.

Anonymous wrote on 2009-11-20 15:35:

Benjamin, is it really an issue with array? The inner loop just does complex arithmetic. --Anon

Benjamin Peterson wrote on 2009-11-20 22:41:

@Anon I'm only guessing. Our math is awfully fast.

Antonio Cuni wrote on 2009-11-20 23:54:

@Anon, @Benjamin
I've just noticed that W_ComplexObject in objspace/std/ is not marked as _immutable_=True (as e.g. W_IntObject is), so it is entirely possible that the JIT is not able to optimize math with complexes as it does with ints and floats. We should look into it; it is probably easy to check.
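For readers unfamiliar with the hint Antonio mentions: in RPython, a class-level `_immutable_ = True` flag tells the JIT that instances never change after construction, so attribute reads can be constant-folded inside traces. The sketch below illustrates the pattern; it is not PyPy's actual W_ComplexObject source, and the flag is a no-op when run as plain Python.

```python
class W_ComplexObject(object):
    # RPython JIT hint: instances are immutable after construction, so the
    # JIT may constant-fold realval/imagval reads. Has no effect in CPython.
    _immutable_ = True

    def __init__(self, realval, imagval):
        self.realval = realval
        self.imagval = imagval

    def mul(self, other):
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return W_ComplexObject(
            self.realval * other.realval - self.imagval * other.imagval,
            self.realval * other.imagval + self.imagval * other.realval,
        )
```

Without the hint, the JIT has to assume the fields may change and re-read them on every operation, which is one plausible reason complex arithmetic would lag behind int and float arithmetic.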

vak wrote on 2009-11-20 23:58:

guys, sorry, who cares about *seconds*??

why didn't you normalize to the test winners? :)

Leo wrote on 2009-11-21 09:06:

So, um, has anyone managed to get JIT-ed pypy to compile itself?

When I tried to do this today, I got this:

Maciej Fijalkowski wrote on 2009-11-21 11:26:


Yes, we know that bug. Armin is fixing it right now on the faster-raise branch.

Armin Rigo wrote on 2009-11-21 17:47:

@antonio: good point. On second thought, though, it's not a *really* good point, because we don't have _immutable_=True on floats either...

Leo wrote on 2009-11-21 19:35:

@Maciej Great! It'll be awesome to have a (hopefully much faster??) JITted build ... it currently takes my computer more than an hour ...

Benjamin Peterson wrote on 2009-11-22 01:45:

@Leo it's likely to take tons of memory, though.

Anonymous wrote on 2009-11-22 10:13:

It would perhaps also be nice to compare the performance with one of the current JavaScript engines (V8, SquirrelFish, etc.).

Tom Clarke wrote on 2009-11-22 12:08:

Nice comparisons - and micro-performance looking good. Congratulations.

HOWEVER, there is no value in having three columns for each benchmark. The overall time is arbitrary; all that matters is relative speed, so you might as well normalise all graphs to CPython = 1.0, for example. The relevant information is then easier to see!
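The normalisation Tom suggests is simple: divide each interpreter's time by CPython's time on the same benchmark, so CPython is always 1.0 and smaller means faster. A quick sketch (the times here are made-up placeholders, not the actual results):

```python
# Made-up example times in seconds; real numbers come from the benchmark runs.
times = {
    "nbody":    {"cpython": 10.0, "pypy": 2.0, "unladen": 8.0},
    "fannkuch": {"cpython": 20.0, "pypy": 5.0, "unladen": 16.0},
}


def normalize(times, baseline="cpython"):
    """Scale every time so the baseline interpreter is 1.0 per benchmark."""
    return {
        bench: {interp: t / per_interp[baseline]
                for interp, t in per_interp.items()}
        for bench, per_interp in times.items()
    }
```

With this scaling, a bar of height 0.2 reads directly as "5x faster than CPython", regardless of how long the benchmark itself runs.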

Unknown wrote on 2009-11-23 19:24:

it's called "The Computer Language Benchmarks Game" these days...

Luis wrote on 2009-11-23 21:10:

Tom is right, normalizing the graphs to cpython = 1.0 would make them much more readable.
Anyway, this is a very good job from Unhelpful.

Anonymous wrote on 2009-11-27 13:54:

Do any of those benchmarks work with shedskin?

¬¬ wrote on 2009-11-30 07:26:

glad to see someone did something with my language shootout benchmark comment ;)

Anonymous wrote on 2009-12-01 19:07:

I checked, but it doesn't have the data for Unladen Swallow. Where are the numbers?
