In the last days I finally understood how to do virtualizables. Now the frame overhead is gone. This was done with the help of discussion with Samuele, porting ideas from PyPy's first JIT attempt.
This is of course work in progress, but it works in PyPy (modulo a few XXXs, but no bugs so far). The performance of the resulting code is quite good: even with Boehm (the GC that is easy to compile to but gives a slowish pypy-c), a long-running loop typically runs 50% faster than CPython. That's "baseline" speed, moreover: we will get better speed-ups by applying optimizations on the generated code. Doing so is in progress, but it suddenly became easier because that optimization phase no longer has to consider virtualizables -- they are now handled earlier.
Update:Virtualizables is basically a way to avoid frame overhead. The frame object is allocated and has a pointer, but the JIT is free to unpack it's fields (for example python level locals) and store them somewhere else (stack or registers). Each external (out of jit) access to frame managed by jit, needs to go via special accessors that can ask jit where those variables are.
As usual, progress is going slower then predicted, but nevertheless, we're working hard to make some progress.
We recently managed to make our nice GCs cooperate with our JIT. This is one point from our detailed plan. As of now, we have a JIT with GCs and no optimizations. It already speeds up some things, while slowing down others. The main reason for this is that the JIT generates assembler which is kind of ok, but it does not do the same level of optimizations gcc would do.
So the current status of the JIT is that it can produce assembler out of executed python code (or any interpreter written in RPython actually), but the results are not high quality enough since we're missing optimizations.
The current plan, as of now, looks as follows:
- Improve the handling of GCs in JIT with inlining of malloc-fast paths, that should speed up things by a constant, not too big factor.
- Write a simplified python interpreter, which will be a base for experiments and to make sure that our JIT does correct things with regard to optimizations. That would work as mid-level integration test.
- Think about ways to inline loop-less python functions into their parent's loop.
- Get rid of frame overhead (by virtualizables)
- Measure, write benchmarks, publish
Both of the papers that people from the PyPy team submitted to ICOOOLPS have been accepted. They are:
(the pdfs are obviously the submitted versions, not the final ones).
This year ICOOOLPS (Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems) is being held on July the 6th at ECOOP 2009 in Genova, Italy. Other than these two papers, Anto and Carl Friedrich will also present a PyPy tutorial, on July the 7th.
So, according to our jit plan we're mostly done with point 1, that is to provide a JIT that compiles python code to assembler in the most horrible manner possible but doesn't break. That meant mostly 4 weeks of glaring at GDB and megabytess of assembler generated by C code generated from python code. The figure of 4 weeks proves that our approach is by far superior to the one of psyco, since Armin says it's "only 4 weeks" :-)
Right now, pypy compiled with JIT can run the whole CPython test suite without crashing, which means we're done with obvious bugs and the only ones waiting for us are really horrible. (Or they really don't exist. At least they should never be about obscure Python corner cases: they can only be in the 10'000 lines of relatively clear code that is our JIT generator.)
But... the fun thing is that we can actually concentrate on optimizations! So the next step is to provide a JIT that is correct *and* actually speeds up python. Stay tuned for more :-)Cheers,
fijal, armin & benjamin
UPDATE: for those of you blessed with no knowledge of C, gdb stands for GNU debugger, a classic debugger for C. (It's also much more powerful than python debugger, pdb, which is kind of surprising).
First a disclaimer. This post is more about plans for future than current status. We usually try to write about things that we have done, because it's much much easier to promise things than to actually make it happen, but I think it's important enough to have some sort of roadmap.
In recent months we came to the point where the 5th generation of JIT prototype was working as nice or even a bit nicer than 1st one back in 2007. Someone might ask "so why did you spend all this time without going forward?". And indeed, we spend a lot of time moving sideways, but as posted, we also spent a lot of time doing some other things, which are important as well. The main advantage of current JIT incarnation is much much simpler than the first one. Even I can comprehend it, which is much of an improvement :-)
So, the prototype is working and gives very nice speedups in range of 20-30x over CPython. We're pretty confident this prototype will work and will produce fast python interpreter eventually. So we decided that now we'll work towards changing prototype into something stable and solid. This might sound easy, but in fact it's not. Having stable assembler backend and optimizations that keep semantics is not as easy as it might sound.
The current roadmap, as I see it, looks like as following:
- Provide a JIT that does not speedup things, but produce assembler without optimizations turned on, that is correct and able to run CPython's library tests on a nightly basis.
- Introduce simple optimizations, that should make above JIT a bit faster than CPython. With optimizations disabled JIT is producing incredibly dumb assembler, which is slower than correspoding C code, even with removal of interpretation overhead (which is not very surprising).
- Backport optimizations from JIT prototype, one by one, keeping an eye on how they perform and making sure they don't break anything.
- Create new optimizations, like speeding up attribute access.
This way, we can hopefully provide a working JIT, which gives fast python interpreter, which is a bit harder than just a nice prototype.
Tell us what you think about this plan.Cheers,
fijal & others.
The Leysin sprint is nearing its end, as usual here is an attempt at a summary
of what we did.
Large parts of the sprint were dedicated to fixing bugs. Since the easy bugs seem to have been fixed long ago, those were mostly very annoying and hard bugs. This work was supported by our buildbots, which we tried to get free of test-failures. This was worked on by nearly all participants of the sprint (Samuele, Armin, Anto, Niko, Anders, Christian, Carl Friedrich). One particularly annoying bug was the differences in the tracing events that PyPy produces (fixed by Anders, Samuele and Christian). Some details about larger tasks are in the sections below.
The work culminated in the beta released on Sunday.
A large number of problems came from our stackless features, which do some advanced things and thus seem to contain advanced bugs. Samuele and Carl Friedrich spent some time fixing tasklet pickling and unpickling. This was achieved by supporting the (un)pickling of builtin code objects. In addition they fixed some bugs in the finalization of tasklets. This needs some care because the __del__ of a tasklet cannot run at arbitrary points in time, but only at safe points. This problem was a bit subtle to get right, and popped up nearly every morning of the sprint in form of a test failure.
Armin and Niko added a way to restrict the stack depth of the RPython-level stack. This can useful when using stackless, because if this is not there it is possible that you fill your whole heap with stack frames in the case of an infinite recursion. Then they went on to make stackless not segfault when threads are used at the same time, or if a callback from C library code is in progress. Instead you get a RuntimeError now, which is not good but better than a segfault.
The LLVM backend had its own set of problems. For a long time it produced the fastest form of PyPy's Python interpreter, by first using the LLVM backend, applying the LLVM optimizations to the result, then using LLVM's C backend to produce C code, then apply GCC to the result :-). However, it is not clear that it is still useful to directly produce LLVM bitcode, since LLVM has rather good C frontends nowadays, with llvm-gcc and clang. It is likely that we will use LLVM in the future in our JIT (but that's another story, based on different code).
Therefore we decided to remove these two backends from SVN, which Samuele and Carl Friedrich did. They are not dead, only resting until somebody who is interested in maintaining them steps up.
One goal of the release is good Windows-support. Anders and Samuele set up a new windows buildbot which revealed a number of failures. Those were attacked by Anders, Samuele and Christian as well as by Amaury (who was not at the sprint, but thankfully did a lot of Windows work in the last months).
Christian with some help by Samuele tried to get translation working again under Mac OS X. This was a large mess, because of different behaviours of some POSIX functionality in Leopard. It is still possible to get the old behaviour back, but whether that was enabled or not depended on a number of factors such as which Python is used. Eventually they managed to successfully navigate that maze and produce something that almost works (there is still a problem remaining about OpenSSL).
The Friday of the sprint was declared to be a documentation day, where (nearly) no coding was allowed. This resulted in a newly structured and improved getting started document (done by Carl Friedrich, Samuele and some help of Niko) and a new document describing differences to CPython (Armin, Carl Friedrich) as well as various improvements to existing documents (everybody else). Armin undertook the Sisyphean task of listing all talks, paper and related stuff of the PyPy project.
Java Backend Work
Niko and Anto worked on the JVM backend for a while. First they had to fix translation of the Python interpreter to Java. Then they tried to improve the performance of the Python interpreter when translated to Java. Mostly they did a lot of profiling to find performance bottlenecks. They managed to improve performance by 40% by overriding fillInStackTrace of the generated exception classes. Apart from that they found no simple-to-fix performance problems.
Armin gave a presentation about the current state of the JIT to the sprinters as well as Adrian Kuhn, Toon Verwaest and Camillo Bruni of the University of Bern who came to visit for one day. There was a bit of work on the JIT going on too; Armin and Anto tried to get closer to having a working JIT on top of the CLI.
Today we are releasing a beta of the upcoming PyPy 1.1 release. There are some Windows and OS X issues left that we would like to address between now and the final release but apart from this things should be working. We would appreciate feedback.
The PyPy development team.
PyPy 1.1: Compatibility & Consolidation
Welcome to the PyPy 1.1 release - the first release after the end of EU funding. This release focuses on making PyPy's Python interpreter more compatible with CPython (currently CPython 2.5) and on making the interpreter more stable and bug-free.
PyPy's Getting Started lives at:
Highlights of This Release
More of CPython's standard library extension modules are supported, among them ctypes, sqlite3, csv, and many more. Most of these extension modules are fully supported under Windows as well.
Through a large number of tweaks, performance has been improved by 10%-50% since the 1.0 release. The Python interpreter is now between 0.8-2x (and in some corner case 3-4x) slower than CPython. A large part of these speed-ups come from our new generational garbage collectors.
Our Python interpreter now supports distutils as well as easy_install for pure-Python modules.
We have tested PyPy with a number of third-party libraries. PyPy can run now: Django, Pylons, BitTorrent, Twisted, SymPy, Pyglet, Nevow, Pinax:
https://morepypy.blogspot.com/2008/08/pypy-runs-unmodified-django-10-beta.html https://morepypy.blogspot.com/2008/07/pypys-python-runs-pinax-django.html https://morepypy.blogspot.com/2008/06/running-nevow-on-top-of-pypy.html
A buildbot was set up to run the various tests that PyPy is using nightly on Windows and Linux machines:
Sandboxing support: It is possible to translate the Python interpreter in a special way so that the result is fully sandboxed.
The clr module was greatly improved. This module is used to interface with .NET libraries when translating the Python interpreter to the CLI.
Stackless improvements: PyPy's stackless module is now more complete. We added channel preferences which change details of the scheduling semantics. In addition, the pickling of tasklets has been improved to work in more cases.
Classic classes are enabled by default now. In addition, they have been greatly optimized and debugged:
PyPy's Python interpreter can be translated to Java bytecode now to produce a pypy-jvm. At the moment there is no integration with Java libraries yet, so this is not really useful.
We added cross-compilation machinery to our translation toolchain to make it possible to cross-compile our Python interpreter to Nokia's Maemo platform:
Some effort was spent to make the Python interpreter more memory-efficient. This includes the implementation of a mark-compact GC which uses less memory than other GCs during collection. Additionally there were various optimizations that make Python objects smaller, e.g. class instances are often only 50% of the size of CPython.
The support for the trace hook in the Python interpreter was improved to be able to trace the execution of builtin functions and methods. With this, we implemented the _lsprof module, which is the core of the cProfile module.
A number of rarely used features of PyPy were removed since the previous release because they were unmaintained and/or buggy. Those are: The LLVM and the JS backends, the aspect-oriented programming features, the logic object space, the extension compiler and the first incarnation of the JIT generator. The new JIT generator is in active development, but not included in the release.
What is PyPy?
Technically, PyPy is both a Python interpreter implementation and an advanced compiler, or more precisely a framework for implementing dynamic languages and generating virtual machines for them.
The framework allows for alternative frontends and for alternative backends, currently C, Java and .NET. For our main target "C", we can "mix in" different garbage collectors and threading models, including micro-threads aka "Stackless". The inherent complexity that arises from this ambitious approach is mostly kept away from the Python interpreter implementation, our main frontend.
Socially, PyPy is a collaborative effort of many individuals working together in a distributed and sprint-driven way since 2003. PyPy would not have gotten as far as it has without the coding, feedback and general support from numerous people.
the PyPy release team, [in alphabetical order]
Amaury Forgeot d'Arc, Anders Hammerquist, Antonio Cuni, Armin Rigo, Carl Friedrich Bolz, Christian Tismer, Holger Krekel, Maciek Fijalkowski, Samuele Pedroni
and many others: https://codespeak.net/pypy/dist/pypy/doc/contributor.html
The Leysin Sprint started today. The weather is great and the view is wonderful, as usual. Technically we are working on the remaining test failures of the nightly test runs and are generally trying to fix various long-postponed bugs. I will try to give more detailed reports as the sprint progresses.