If you translate a pypy-c with --allworkingmodules and start it, you will probably not notice anything strange about its prompt, except when typing multiline statements. You can move the cursor up and continue editing previous lines. And the history is multiline-statements-aware as well. Great experience! Ah, and completion using tab is nice too.
Truth be told, there is nothing new here: it was all done by Michael Hudson's pyrepl many years ago. We had already included pyrepl in PyPy some time ago. What is new is a pure Python readline.py which exposes the most important parts of the API of the standard readline module by wrapping pyrepl under the hood, without needing the GNU readline library at all. The PyPy prompt is based on this, benefitting automagically from pyrepl's multiline editing capabilities, with minor tweaks so that the prompt looks much more like CPython's than a regular pyrepl prompt does.
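To give an idea of what the standard readline API looks like from the user's side, here is a minimal sketch of tab completion. It assumes that set_completer and parse_and_bind are among the parts of the API that the pure Python readline.py exposes (the post does not list them); the make_completer helper and the word list are made up for the example.

```python
def make_completer(words):
    """Build a readline-style completer: called as completer(text, state)."""
    candidates = sorted(words)

    def completer(text, state):
        # readline calls this with state = 0, 1, 2, ... until None is returned
        matches = [w for w in candidates if w.startswith(text)]
        if state < len(matches):
            return matches[state]
        return None

    return completer


completer = make_completer(["print", "property", "pass"])

try:
    import readline  # may be GNU readline or a pure Python replacement
except ImportError:
    readline = None

if readline is not None:
    readline.set_completer(completer)
    readline.parse_and_bind("tab: complete")
```

The completer itself is plain Python and does not care which readline implementation ends up calling it.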
You can also try and use this multiline prompt with CPython: check out pyrepl at
https://codespeak.net/svn/pyrepl/trunk/pyrepl and run the new
While discussing what to post as an April Fool's joke yesterday, we had a couple of other ideas, listed below. Most of them were rejected because they are too incredible, others because they are too close to our wish list.
- quantum computer backend
- Perl6 interpreter in RPython
- Ruby backend to allow running "python on rails"
- mandatory static typing at app-level, because it's the only way to increase performance
- rewrite PyPy in Haskell, because we discovered that dynamic typing is just not suitable for a project of this size
- a C front-end, so that we can interpret the C source of Python C extensions and JIT it. This would work by writing an interpreter for LLVM bytecode in RPython.
- an elisp backend
- a TeX backend (use PyPy for your advanced typesetting needs)
- an SQL JIT backend, pushing remote procedures into the DB engine
As you surely know, Python 3.0 is coming; recently, they released Python 3.0 alpha 3, and the final version is expected around September.
As suggested by the migration guide (in PEP 3000), we started by applying 2to3 to our standard interpreter, which is written in RPython (though we should call it RPython 2.4 now, as opposed to RPython 3.0 -- see below).
Converting was not seamless, but most of the resulting bugs were due to the new dict views, str/unicode changes and the missing "reduce" built-in. After forking and refactoring both our interpreter and the 2to3 script, the Python interpreter runs on Python 3.0 alpha 3!
The next step was to run 2to3 over the whole translation toolchain, i.e. the part of PyPy which takes care of analyzing the interpreter in order to produce efficient executables. After the good results we got with the standard interpreter, we were confident that running 2to3 over it would be relatively easy. Unfortunately, it was not :-(.
After letting 2to3 run for days and days uninterrupted, we decided to kill it: we assume that the toolchain is simply too complex to be converted in a reasonable amount of time.
So, we needed to think of something else; THE great idea we had was to turn everything upside-down: if we can't port PyPy to Py3k, we can always port Py3k to PyPy!
Under the hood, the 2to3 conversion tool operates as a graph transformer: it takes the graph of your program (in the form of a Python 2.x source file) and returns a transformed graph of the same program (in the form of a Python 3.0 source file). Since the entire translation toolchain of PyPy is based on graph transformations, we could reuse it to modify the behaviour of the 2to3 tool. We wrote a general graph-inverter algorithm which, as the name suggests, takes a graph transformation and builds the inverse transformation; then we applied the graph inverter to 2to3, getting something that we called 3to2. It is important to underline that 3to2 was built by automatically analysing 2to3 and reversing its operation with only the help of a few manual hints. For this reason, and because we are not keeping generated files under version control, we do not need to maintain this new tool in the Subversion repository.
Once we built 3to2, it was relatively easy to pipe its result to our interpreter, getting something that can run Python 3.0 programs.
Performance-wise, this approach has the problem of being slower at import time, because it needs to run (automatically) 3to2 every time the source is modified; in the future, we plan to apply our JIT techniques also to this part of the interpreter, trying to mitigate the slowdown until it is not noticeable anymore to the final user.
In the next weeks, we will work on the transformation (and probably publish the technique as a research paper, with a title like "Automatic Program Reversion on Intermediate Languages").
UPDATE: In case anybody didn't guess or didn't spot the acronym: The above was an April Fool's joke. Nearly nothing of it is true.
This is mostly a bugfix release, with a couple of new features sneaked in. Most important changes:
- some new functionality (authentication, export, locking) in py.path's Subversion APIs
- numerous small fixes in py.test's rsession (experimental pluggable session) and generative test features
- some fixes in the py.test core
UPDATE: the py-lib is now easy-installable with:
As in previous years, PyPy will again participate in Google's Summer of Code program under the umbrella of the Python Software Foundation. Unfortunately we were a bit disorganized this year, so our project ideas are only being put up now. The list of project ideas of PyPy can be found here.
Any interested student should write to our mailing list or just come to the #pypy channel on irc.freenode.net to discuss things.
As a part of implementing ctypes, we decided to make coding with ctypes better on its own (regardless of which Python interpreter you use). The concrete problem we're trying to solve is to make ctypes code more platform-independent than it is. Say you want to create a ctypes type for size_t: ctypes itself provides no mechanism for doing that, so you need to use a concrete integer type (c_int, c_long, c_short etc.). Your code either becomes platform dependent if you pick one of them, or is littered with conditionals for all sorts of platforms. We created a small library, called ctypes_configure (which is actually a variation of something we use somewhere in the PyPy source tree), which tries to solve some platform dependencies by compiling and running small chunks of C code through a C compiler. It's sort of like configure in the Linux world, except for Python using ctypes.
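To make the problem concrete, here is a minimal sketch of the hand-written alternative: picking a concrete ctypes integer type by its size at runtime. This is not the ctypes_configure API; the helper unsigned_type_of_size is hypothetical, and the idea that size_t is as wide as a pointer is an assumption made only for illustration.

```python
import ctypes

# Candidate unsigned integer types, smallest first.
_UNSIGNED_TYPES = [
    ctypes.c_ubyte,
    ctypes.c_ushort,
    ctypes.c_uint,
    ctypes.c_ulong,
    ctypes.c_ulonglong,
]


def unsigned_type_of_size(nbytes):
    """Return the first ctypes unsigned integer type that is nbytes wide."""
    for ctype in _UNSIGNED_TYPES:
        if ctypes.sizeof(ctype) == nbytes:
            return ctype
    raise ValueError("no unsigned integer type of %d bytes" % nbytes)


# Assumption for illustration only: size_t is as wide as a pointer.
# This is common but not guaranteed by C.
size_t_guess = unsigned_type_of_size(ctypes.sizeof(ctypes.c_void_p))
```

A guess like this can be wrong on unusual platforms, which is the motivation for asking a real C compiler instead.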
To install the library, you can just type easy_install ctypes_configure. The code is in an svn repository on codespeak and there is even some documentation and sample code. Also, even though the code lives in the pypy repository, it depends only on pylib, not on the whole of pypy.
The library is in its early infancy (but we think it is already rather useful). In the future we could add extra features. For example, it might be possible to check whether the argtypes that are attached to the external functions are consistent with what is in the C headers, so that the following code wouldn't segfault but give a nice error:
    libc = ctypes.CDLL("libc.so")
    time = libc.time
    time.argtypes = [ctypes.c_double, ctypes.c_double]
    time(0.0, 0.0)
Also, we plan to add a way to install a package that uses ctypes_configure in such a way that the installed library doesn't need to call the C compiler any more later.
Bittorrent now runs on PyPy! I tried the no-GUI BitTornado version (btdownloadheadless.py). It behaves correctly and I fixed the last few obvious places which made noticeable pauses. (However we know that there are I/O performance issues left: we make too many internal copies of the data, e.g. in a
We are interested in people trying out other real-world applications that, like the GUI-less Bittorrent, don't have many external dependencies on C extension modules. Please report all the issues to us!
The current magic command line for creating a pypy-c executable with as many of CPython's modules as possible is:
    cd pypy/translator/goal
    ./translate.py --thread targetpypystandalone.py --allworkingmodules --withmod-_rawffi --faassen
(This gives you a thread-aware pypy-c, which requires the Boehm gc library. The _rawffi module gives you ctypes support but is only tested for Linux at the moment.)
Good news everyone. A tuned PyPy compiled to C is nowadays as fast as CPython on the richards benchmark and slightly faster on the gcbench benchmark.
IMPORTANT: These are very carefully chosen benchmarks where we expect pypy to be fast! PyPy is still considerably slower than CPython on other benchmarks and on real-world applications (but we're working on it). The point of this post is just that for the first time (not counting JIT experiments) we are faster than CPython on *one* example :-)
The exact times as measured on my notebook (which is a Core Duo machine) are here:
Compiled pypy with options:
    ./translate.py --gcrootfinder=asmgcc --gc=generation targetpypystandalone.py --allworkingmodules --withmod-_rawffi --faassen
(allworkingmodules and withmod-_rawffi are very likely irrelevant to those benchmarks)
CPython version 2.5.1, release.
- richards 800ms pypy-c vs 809ms cpython (1% difference)
- gcbench 53700ms pypy-c vs 60215ms cpython (11% difference)
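The percentages can be reproduced with a couple of lines of arithmetic. The sketch below assumes the difference is expressed relative to the CPython time, which is the reading that matches the rounded figures in the list; the function name percent_faster is made up for the example.

```python
def percent_faster(pypy_ms, cpython_ms):
    """Time saved by pypy-c, as a percentage of the CPython time."""
    return 100.0 * (cpython_ms - pypy_ms) / cpython_ms


richards = percent_faster(800, 809)
gcbench = percent_faster(53700, 60215)

print("richards: %.1f%%" % richards)  # prints "richards: 1.1%"
print("gcbench:  %.1f%%" % gcbench)   # prints "gcbench:  10.8%"
```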
About richards, there is a catch. We use a method cache optimization, and have an optimization which helps to avoid creating bound methods each time a method is called. This speeds up the benchmark by about 20%. Although a method cache was even implemented for CPython, it didn't make its way into the core because some C modules directly modify the dictionary of new-style classes. In PyPy, the greater level of abstraction means that this operation is just illegal.
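The general idea of a method cache, and why changes made behind its back are a problem, can be shown with a toy version at application level. This is only a sketch under our own assumptions: the MethodCache class and its methods are hypothetical and do not reflect how PyPy's interpreter-level cache is actually written.

```python
class MethodCache(object):
    """Toy cache mapping (class, attribute name) to the looked-up attribute."""

    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, cls, name):
        key = (cls, name)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._lookup_in_mro(cls, name)
        self._cache[key] = result
        return result

    @staticmethod
    def _lookup_in_mro(cls, name):
        # The slow path: walk the method resolution order.
        for base in cls.__mro__:
            if name in base.__dict__:
                return base.__dict__[name]
        raise AttributeError(name)

    def invalidate(self):
        # Must be called whenever a class is modified.
        self._cache.clear()


class A(object):
    def greet(self):
        return "old"


def new_greet(self):
    return "new"


cache = MethodCache()
cache.lookup(A, "greet")           # miss: walks the MRO
cache.lookup(A, "greet")           # hit: served from the cache

A.greet = new_greet                # class changed without telling the cache
stale = cache.lookup(A, "greet")   # still returns the old function
cache.invalidate()
fresh = cache.lookup(A, "greet")   # sees the new function
```

The stale lookup is the situation that code modifying a class dictionary directly would create, since the cache never gets a chance to invalidate itself.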
As part of our efforts to make PyPy's Python interpreter usable, we put quite some effort into interfacing with external libraries. We were able, in quite a short amount of time (I think beginning really from the Leysin sprint, or slightly earlier), to provide a prototype of the ctypes library. It is written in completely normal Python, at applevel, based on a very thin wrapper around the libffi library. This makes development a lot easier, but it makes the resulting ctypes implementation rather slow. The implementation is not complete yet and it will still need quite some effort to make it feature-complete (ctypes has lots of details and special cases and do-what-I-mean magic). Yet another point will be to make it faster, but that's for much later.
The implementation is good enough to run those parts of Pyglet that don't depend on PIL (which PyPy doesn't have). Here are a few pictures of running Pyglet demos on top of compiled pypy-c. To compile a version of PyPy that supports ctypes, use this highly sophisticated command line
./translate.py --gc=generation ./targetpypystandalone.py --allworkingmodules --withmod-_rawffi
Note: this works on Linux only right now.
The list of missing small ctypes features is quite extensive, but I consider the current implementation to be usable for most common cases. I would love to hear about libraries written in pure Python (using ctypes), to run them on top of PyPy and use them as test cases. If someone knows of such a library, please provide a link.
Continuing the last blog post about GC semantics in Python.
Another consequence of reference counting is that resurrection is easy to detect. A dead object can resurrect itself if its finalizer stores it into a globally reachable position, like this:
    class C(object):
        def __init__(self, num):
            self.num = num
        def __del__(self):
            global c
            if c is None:
                c = self
    c = C(1)
    while c is not None:
        c = None
        print "again"
This is an infinite loop in CPython: Every time c is set to None in the loop, the __del__ method resets it to the C instance again (note that this is terribly bad programming style, of course. In case anybody was wondering :-)). CPython can detect resurrection by checking whether the reference count after the call to __del__ has gotten bigger.
There exist even worse examples of perpetual resurrection, in particular in combination with the cycle GC. If you want to see a particularly horrible one, see this discussion started by Armin Rigo. In the ensuing thread Tim Peters proposes to follow Java's example and call the finalizer of every object at most once.
In PyPy the resurrection problem is slightly more complex, since we have GCs that run collection from time to time and don't really get to know at which precise time an object dies. If the GC discovers during a collection that an object is dead, it will call the finalizer after the collection is finished. If the object is then dead at the next collection, the GC does not know whether the object was resurrected by the finalizer and then died in the meantime or whether it was not resurrected. Therefore it seemed sanest to follow Tim's solution and to never call the finalizer of an object a second time, which has many other benefits as well.
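The "at most once" rule can be observed with a small counting experiment. This is a sketch, not part of the original post: the Phoenix class and the holder dictionary are made up for the example. It is written so that it runs on Python 3 as well; CPython 3.4 and later also call __del__ only once per object, so on such an interpreter the counter is expected to stay at 1.

```python
import gc

# Globally reachable slot that the finalizer uses to resurrect the object.
holder = {"obj": None}


class Phoenix(object):
    finalizer_calls = 0

    def __del__(self):
        Phoenix.finalizer_calls += 1
        holder["obj"] = self   # resurrection: store self somewhere reachable


obj = Phoenix()
del obj                        # the object dies for the first time
gc.collect()                   # needed on GCs without reference counting
resurrected = holder["obj"] is not None

holder["obj"] = None           # the object dies a second time
gc.collect()

print("finalizer calls:", Phoenix.finalizer_calls)
```

With a finalizer that may run on every death, the same program would count two calls instead.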