As you can probably see, we're very good on some benchmarks and not that great on others. Some of the bad results come from the fact that while we did a lot of JIT-related work, other PyPy parts did not see that much love. Some of our algorithms on the builtin data types are inferior to those of CPython. This is going to be an ongoing focus for a while.
We want to first improve on the benchmarks for a couple of weeks before doing a release to gather further feedback.
PyPy has recently made some great speed and memory progress towards providing the most efficient Python interpreter out there. We also just announced our plans for the pypy-1.2 release. Much of this is driven by personal commitment, by individuals and companies investing time and money. Now we'd appreciate some feedback and help with getting money into the PyPy project so that its core members (between 5 and 15 people, depending on how you count) can sustain themselves. We see several options:
- use a foundation structure and ask for tax-exempt donations to the project, its developers and infrastructure. We just got a letter from the Software Freedom Conservancy saying that they view our application favourably, so this option will hopefully become practical soon.
- offer to implement certain features like a 64bit JIT-backend, Numpy for PyPy or a streamlined installation in exchange for money, contributed in small portions/donations. Do you imagine you or your company would sponsor PyPy on a small scale for efforts like this? Any other bits you'd like to see?
- offer to implement larger scale tasks by contracting PyPy related companies, namely Open End and merlinux who have successfully done such contracts in the past. Please don't hesitate to contact email@example.com and firstname.lastname@example.org if you want to start a conversation on this.
- apply for public/state funding - in fact we are likely to get some funding through Eurostars, more on that separately. Such funding usually covers only 50-60% of actual employment and project costs, though, and is tied to research questions rather than to making PyPy a production-usable interpreter.
Anything else we should look out for?
cheers & thanks for any feedback, Maciej and Holger
The PyPy core team is planning to make a new release before the next Pycon US.
The main target of the 1.2 release is packaging the good results we have achieved applying our current JIT compiler generator to our Python interpreter. Some of that progress has been chronicled in recent posts on the status blog. By releasing them in a relatively stable prototype we want to encourage people to try them with their own code and to gather feedback in this way. By construction the JIT compiler should support all Python features; what may vary are the speedups achieved (in some cases the JIT may produce worse results than the PyPy interpreter, which we would like to know about) and the extra memory it requires.
For the 1.2 release we will focus on the JIT stability first, less on improving non-strictly JIT areas. The JIT should be good at many things as shown by previous blog postings. We want the JIT compiler in the release to work well on Intel 32 bits on Linux, with Mac OS X and Windows being secondary targets. Which compilation targets work will depend a bit on contributions.
In order to finalize the release we intend to have a concentrated effort ("virtual sprint") from the 22nd to the 29th of January. Coordination will happen as usual through the #pypy irc channel on freenode. Samuele Pedroni will take the role of release manager as he already did in the past.
Update: the sprint has been postponed to a later date. The next PyPy sprint will probably still be in Leysin, Switzerland, for the seventh time.
If you have ever wanted to use CPython extension modules on PyPy, we want to announce that there is a solution that should be compatible with quite a few of the available modules. It is neither new nor written by us, but it nevertheless works great with PyPy.
The trick is to use RPyC, a transparent, symmetric remote procedure call library written in Python. The idea is to start a CPython process that hosts the PyQt libraries and connect to it via TCP to send RPC commands to it.
I tried to run PyQt applications using it on PyPy and could get quite a bit of the functionality of these working. Remaining problems include regular segfaults of CPython because of PyQt-induced memory corruption and bugs because classes like StandardButtons behave incorrectly when it comes to arithmetical operations.
RPyC needed some changes to support remote unbound __init__ methods, shallow call by value for list and dict types (PyQt4 methods want real lists and dicts as parameters), and callbacks to methods (all remote method objects are wrapped into small lambda functions to make them easier for PyQt4 to call).
If you want to try RPyC to run the PyQt application of your choice, you just need to follow these steps. Please report your experience here in the blog comments or on our mailing list.
- Download RPyC from the RPyC download page.
- Download this patch and apply it to RPyC by running patch -p1 < rpyc-3.0.7-pyqt4-compat.patch in the RPyC directory.
- Install RPyC by running python setup.py install as root.
- Run the file rpyc/servers/classic_server.py using CPython.
- Execute your PyQt application on PyPy.
PyPy will automatically connect to CPython and use its PyQt libraries.
Note that this scheme works with nearly every extension library. Look at pypy/lib/sip.py on how to add new libraries (you need to create such a file for every proxied extension module).
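As a rough illustration of the proxying idea (not the contents of pypy/lib/sip.py, which may look quite different), a proxy file could do little more than open a classic RPyC connection and hand back the remote module. The helper name open_remote_module and the injectable connect parameter are made up for this sketch, and it assumes the classic server listens on RPyC's default port 18812 on the same machine:

```python
def open_remote_module(name, host="localhost", port=18812, connect=None):
    """Return (connection, proxy) for module `name` living in a CPython process.

    Hypothetical helper for illustration only. `connect` can be injected so
    the sketch can be exercised without a running server; by default it is
    rpyc.classic.connect.
    """
    if connect is None:
        import rpyc  # imported lazily: only needed for a real connection
        connect = rpyc.classic.connect
    conn = connect(host, port)
    # conn.modules is RPyC's namespace of modules imported on the remote side;
    # attribute access on the returned proxy turns into RPC calls.
    return conn, conn.modules[name]


# Usage on PyPy, with classic_server.py already running under CPython:
#   conn, QtGui = open_remote_module("PyQt4.QtGui")
#   app = QtGui.QApplication([])
```

Keeping the connection object alive alongside the proxy matters, since the proxy stops working once the connection is closed.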
Have fun with PyQt
Recently, thanks to the surprisingly helpful Unhelpful, also known as Andrew Mahone, we have a decent, if slightly arbitrary, set of performance graphs. It contains a couple of benchmarks already seen on this blog as well as some taken from The Great Computer Language Benchmarks Game. These benchmarks don't even try to represent "real applications" as they're mostly small algorithmic benchmarks. Interpreters used:
- PyPy trunk, revision 69331 with --translation-backendopt-storesink, which is now on by default
- Unladen swallow trunk, r900
- CPython 2.6.2 release
Here are the graphs; the benchmarks and the runner script are available.
And zoomed in for all benchmarks except binary-trees and fannkuch.
As we can see, PyPy is generally somewhere between the same speed as CPython and 50x faster (f1int). The places where we're the same speed as CPython are places where we know we have problems - for example generators are not sped up by the JIT and they require some work (although not as much by far as generators & Psyco :-). The glaring inefficiency is in the regex-dna benchmark. This one clearly demonstrates that our regular expression engine is really, really bad and urgently requires attention.
The cool thing here is that although these benchmarks might not represent typical Python applications, they're not uninteresting. They show that algorithmic code does not need to be far slower in Python than in C, so using PyPy one need not worry about algorithmic code being dramatically slow. As many readers would agree, that kills yet another usage of C in our lives :-)
Cheers,
While the Düsseldorf sprint is winding down, we put our minds to the task of retelling our accomplishments. The sprint was mostly about improving the JIT and we managed to stick to that task (as much as we managed to stick to anything). The sprint was mostly filled with doing many small things.
Carl Friedrich and Samuele started the sprint trying to tame the JIT's inlining. Until now, the JIT would try to inline everything in a loop (except other loops) which is what most tracing JITs actually do. This works great if the resulting trace is of reasonable length, but if not it would result in excessive memory consumption and code cache problems in the CPU. So far we just had a limit on the trace size, and we would abort tracing when the limit was reached. This would happen again and again for the same loop, which is not useful at all. The new approach introduced is to be more clever when tracing is aborted by marking the function with the largest contribution to the trace size as non-inlinable. The next time this loop is traced, it usually then gives a reasonably sized trace.
This creates a problem, because now some functions that don't contain loops are not inlined, which means no assembler code is ever generated for them. To remedy this problem we also make it possible to trace functions from their start (as opposed to just tracing loops). We do that only for functions that cannot be inlined (either because they contain loops or because they were marked as non-inlinable as described above).
The result of this is that the Python version of the telco decimal benchmark runs to completion without having to arbitrarily increase the trace length limit. It's also about 40% faster than running it on CPython. This is one of the first non-tiny programs that we speed up.
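The abort-and-retry heuristic described above can be sketched as a toy model. This is not PyPy's implementation: the function names, the per-function operation counts, and the assumption that a non-inlined call costs exactly one trace operation are all invented for illustration.

```python
from collections import Counter


def trace_once(loop_calls, op_counts, non_inlinable, limit):
    """Trace one loop iteration; return (length, None) or (None, culprit)."""
    contributions = Counter()  # operations added to the trace per inlined function
    length = 0
    for func in loop_calls:
        if func in non_inlinable:
            length += 1  # assumption: a residual call is a single operation
        else:
            length += op_counts[func]
            contributions[func] += op_counts[func]
        if length > limit:
            # Abort: blame the inlined function that contributed the most.
            culprit = contributions.most_common(1)[0][0] if contributions else None
            return None, culprit
    return length, None


def trace_with_retries(loop_calls, op_counts, limit):
    """Retrace until the trace fits, marking one culprit per abort."""
    non_inlinable = set()
    while True:
        length, culprit = trace_once(loop_calls, op_counts, non_inlinable, limit)
        if length is not None:
            return length, non_inlinable
        if culprit is None:
            raise RuntimeError("loop exceeds the limit even without inlining")
        non_inlinable.add(culprit)


# Hypothetical loop body: four calls, one of them to a very large function.
calls = ["add", "quantize", "add", "format"]
sizes = {"add": 5, "quantize": 40, "format": 8}
length, marked = trace_with_retries(calls, sizes, limit=30)
print(length, marked)
```

In this model each abort marks one more function, so retracing stops after at most as many attempts as there are distinct functions in the loop, instead of aborting again and again on the same trace.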
Reducing GC Pressure
Armin and Anto used some GC instrumentation to find places in pypy-c-jit that allocate a lot of memory. This is an endlessly surprising exercise, as usually we don't care too much about allocations of short-lived objects when writing RPython, as our GCs usually deal well with those. They found a few places where they could remove allocations, most importantly by making one of the classes that make up traces smaller.
Optimizing Chains of Guards
Carl Friedrich and Samuele started a simple optimization on the trace level that removes superfluous guards. A common pattern in a trace is to have stronger and stronger guards about the same object. As an example, often there is first a guard that an object is not None, later followed by a guard that it is exactly of a given class and then even later that it is a precise instance of that class. This is inefficient, as we can just check the most precise thing in the place of the first guard, saving us guards (which take memory, as they need resume data). Maciek, Armin and Anto later improved on that by introducing a new guard that checks for non-nullity and a specific class in one guard, which allows us to collapse more chains.
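A minimal sketch of the idea, assuming a trace is a list of tuples and that guards on the same variable can be ordered by strength. The representation and the STRENGTH table are invented for this example, and the sketch ignores the conditions a real optimizer must check before moving a guard earlier:

```python
# Assumed ordering: a higher number means a stronger (more precise) guard.
STRENGTH = {"guard_nonnull": 0, "guard_class": 1, "guard_value": 2}


def collapse_guards(trace):
    """Keep one guard per variable: the strongest, at the first guard's position."""
    strongest = {}  # variable -> strongest guard seen anywhere in the trace
    for op in trace:
        name = op[0]
        if name in STRENGTH:
            var = op[1]
            best = strongest.get(var)
            if best is None or STRENGTH[name] > STRENGTH[best[0]]:
                strongest[var] = op

    result = []
    emitted = set()  # variables whose guard has already been emitted
    for op in trace:
        if op[0] in STRENGTH:
            var = op[1]
            if var not in emitted:
                result.append(strongest[var])
                emitted.add(var)
            # Later, weaker-or-equal guards on the same variable are dropped.
        else:
            result.append(op)
    return result


trace = [
    ("guard_nonnull", "p1"),
    ("getfield", "p1", "x"),
    ("guard_class", "p1", "W_IntObject"),
    ("int_add", "i2", "i3"),
    ("guard_value", "p1", "ConstPtr(a)"),
    ("guard_class", "p2", "W_StrObject"),
]
optimized = collapse_guards(trace)
```

Here the three guards on p1 shrink to a single guard_value at the position of the original guard_nonnull, so only one set of resume data is needed for that object.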
Improving JIT and Exceptions
Armin and Maciek went on a multi-day quest to make the JIT and Python-level exceptions like each other more. So far, raising and catching exceptions would make the JIT generate code that has a certain amusement value, but is not really fast in any way. To improve the situation, they had to dig into the exception support in the Python interpreter, where they found various inefficiencies. They also had to rewrite the exceptions module to be in RPython (as opposed to just pure Python + an old hack). Another problem is that tracebacks give you access to interpreter frames. This forces the JIT to deoptimize things, as the JIT keeps some of the frame's content in CPU registers or on the CPU stack, which reflective access to frames prevents. Currently we try to improve the simple cases where the traceback is never actually accessed. This work is not completely finished, but some cases are already significantly faster.
Moving PyPy to use py.test 1.1
Holger worked on porting PyPy to use the newly released py.test 1.1. PyPy still uses some very old support code in its testing infrastructure, which makes this task a bit annoying. He also gave the other PyPy developers a demo of some of the newer py.test features and we discussed which of them we want to start using to make our tests shorter and clearer. One of the things we want to do eventually is to have fewer skipped tests than now.
Using a Simple Effect Analysis for the JIT
One of the optimizations the JIT does is caching fields that are read out of structures on the heap. This cache needs to be invalidated at some points, for example when such a field is written to (as we don't track aliasing much). Another case is a call in the assembler, as the target function could arbitrarily change the heap. This of course is imprecise, since most functions don't actually change the whole heap, and we have an analysis that finds out which types of structs and arrays a function can mutate. During the sprint Carl Friedrich and Samuele integrated this analysis with the JIT, to help it invalidate caches less aggressively. Later Anto and Carl Friedrich also ported this support to the CLI version of the JIT.
Samuele (with some assistance from Carl Friedrich) set up a buildbot slave on a Mac Mini at the University. This should let us stabilize PyPy on Mac OS X. So far we still have a number of failing tests, but now we are in a situation to sanely approach fixing them.
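To make the difference concrete, here is a small model of such a field cache. It is only a sketch: the class, the dictionary standing in for the heap, and the use of field names as the granularity of the effect information are assumptions made for this example, not the JIT's actual data structures.

```python
class FieldCache:
    """Toy cache of heap field reads, keyed by (object, field)."""

    def __init__(self, heap):
        self.heap = heap      # dict mapping (obj, field) -> value
        self.cache = {}
        self.heap_reads = 0   # counts reads that actually had to go to the heap

    def read(self, obj, field):
        key = (obj, field)
        if key not in self.cache:
            self.cache[key] = self.heap[key]
            self.heap_reads += 1
        return self.cache[key]

    def write(self, obj, field, value):
        self.heap[(obj, field)] = value
        # Without alias tracking, any object might be the one written to,
        # so forget this field for every object before recording the new value.
        for key in [k for k in self.cache if k[1] == field]:
            del self.cache[key]
        self.cache[(obj, field)] = value

    def call(self, written_fields=None):
        """Invalidate for a call; written_fields=None means effects are unknown."""
        if written_fields is None:
            self.cache.clear()  # the callee could have changed anything
        else:
            for key in [k for k in self.cache if k[1] in written_fields]:
                del self.cache[key]


cache = FieldCache({("p1", "x"): 1, ("p1", "y"): 2})
cache.read("p1", "x")
cache.read("p1", "y")
cache.call(written_fields={"y"})  # analysis says the callee only writes 'y'
cache.read("p1", "x")             # still cached
cache.read("p1", "y")             # must be re-read from the heap
reads_with_analysis = cache.heap_reads
```

With the effect information only the y entry is dropped across the call; calling cache.call() with no argument models the old behaviour, where every cached field has to be read again.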
Anto improved the CLI backend to support the infrastructure for producing the profiling graphs Armin introduced.
The guinea-pigs that were put into Carl Friedrich's care have been fed (which was the most important sprint task anyway).
Samuele & Carl Friedrich
The Düsseldorf sprint starts today. Only Samuele and I are here so far, but that should change over the course of the day. We will mostly work on the JIT during this sprint, trying to make it a lot more practical. For that we need to decrease its memory requirements some more and to make it use less aggressive inlining. We will post more as the sprint progresses.
It's maybe a bit late to announce, but there will be a PyPy talk at the RuPy conference this weekend in Poznan. More precisely, I'll be talking mostly about PyPy's JIT and how to use it. Unfortunately the talk is on Saturday, at 8:30 in the morning.
EDIT: Talk is online, together with examples
Cheers,
This week I worked on improving the system we use for logging. Well, it was not really a "system" but rather a pile of hacks to measure timings and counts in custom ways and display them. So now, we have a system :-)
The system in question was integrated in the code for the GC and the JIT, which are two independent components as far as the source is concerned. However, we can now display a unified view. Here is for example pypy-c-jit running pystone for (only) 5000 iterations:
The top long bar represents time. The bottom shows two summaries of the total time taken by the various components, and also plays the role of a legend to understand the colors at the top. Shades of red are the GC, shades of green are the JIT.
Here is another picture, this time on pypy-c-jit running 10 iterations of richards:
We have to look more closely at various examples, but a few things immediately show up. One thing is that the GC is put under large pressure by the jit-tracing, jit-optimize and (to a lesser extent) the jit-backend components. So large in fact that the GC takes at least 60-70% of the time there. We will have to do something about it at some point. The other thing is that on richards (and it's likely generally the case), the jit-blackhole component takes a lot of time. "Blackholing" is the operation of recovering from a guard failure in the generated assembler, and falling back to the interpreter. So this is also something we will need to improve.
That's it! The images were generated with the following commands:
PYPYLOG=/tmp/log pypy-c-jit richards.py
python pypy/tool/logparser.py draw-time /tmp/log --mainwidth=8000 --output=filename.png
EDIT: nowadays the command-line has changed to:
python rpython/tool/logparser.py draw-time /tmp/log --mainwidth=8000 filename.png