Python's garbage collection semantics is very much historically grown and implementation-driven. Samuele Pedroni therefore likes to call it the "'there is no such thing as too much chocolate'-approach to GC semantics" :-). In this two-part post series I am going to talk about the semantics of finalization (__del__ methods) in CPython and PyPy.
The current behaviour is mostly all a consequence of the fact that CPython uses reference counting for garbage collection. The first consequence is that if several objects die at the same time, their finalizers are called in a so-called topological order, which is a feature that some GCs have that CPython offers by chance. This ensures, that in a __del__ method, all the attributes of the object didn't get their __del__ called yet. A simple example:
class B(object): def __init__(self, logfile): self.logfile = logfile def __del__(self): self.logfile.write("done doing stuff") b = B(file("logfile.txt", "w"))
If the instance of B dies now, both it and the logfile are dead. They will get their __del__``s called and it's important that the file's ``__del__ gets called second, because otherwise the __del__ of B would try to write to a closed file.
The correct ordering happens completely automatically if you use reference counting: Setting b to None will decref the old value of b. This reduces the reference count of this instance to 0, so the finalizer will be called. After the __del__ has finished, this object will be freed and all the objects it points to decrefed as well, which decreases the reference count of the file to 0 and call its `` __del__`` as well, which closes the file.
The behaviour of PyPy's semispace and generational GCs wasn't very nice so far: it just called the finalizers in an essentially random order. Last week Armin came up with a somewhat complicated algorithm that solves this by emulating CPython's finalization order, which we subsequently implemented. So PyPy does what you expect now! The Boehm GC does a topological ordering by default, so it wasn't a problem there.
A small twist on the above is when there is a cycle of objects involving finalizers: In this case a topological ordering is not possible, so that CPython refuses to guess the finalization order and puts such cycles into gc.garbage. This would be very hard for PyPy to do, since our GC implementation is essentially independent from the Python interpreter. The same GCs work for our other interpreters after all too. Therefore we decided to break such a cycle at an arbitrary place, which doesn't sound too insane. The insane thing is for a Python program to create a cycle of objects with finalizers and depend on the order in which the finalizers are called. Don't do that :-) (After all, CPython wouldn't even call the finalizers in this case.)
Hello! I will have the pleasure of presenting PyPy on various conferences in the near future. They're (in chronological order):
- Studencki Festiwal Informatyczny in Krakow, POLAND 6-8 March 2008. I think this might be only interesting for polish people (website, in polish)
- Pycon Chicago, IL, USA. 14-17 March 2008. There should be also a PyPy sprint afterwards, including newbie-friendly tutorial, everybody is welcome to join us! (Provided that I'll get the US visa, which seems to be non-trivial issue for a polish citizen)
- RuPy, Poznan, POLAND 13-14 April 2008 (website). This is small, but very friendly Ruby and Python conference. Last year was amazing, I can strongly recommend to go there (Poznan is only 2h by train from Berlin also has its own airport).
In the last days we improved platform-support of PyPy's Python interpreter. Jean-Paul Calderone has been tirelessly working for some time now on setting up a buildbot for translating and testing PyPy. So far the basic mechanisms are working and the buildbot is running on various machines, including some that Michael Schneider (bigdog) lets us use, one of them being a Windows machine, the other one with a 64bit Linux (lots of thanks to those two, you are awesome!).
What is still missing is a nice way to visualize the test results to quickly see which tests have started failing on which platforms. There is a prototype already, which still needs some tweaking.
The availability of these machines has triggered some much-needed bug-fixing in PyPy to make our Python interpreter work better on Windows and on 64 bit Linux. Maciek and Michael Schneider worked on this quite a bit last week, with the result that PyPy supports many more extension modules now on Windows and 64 bit Linux. Since we now have the buildbot the hope is that the support also won't disappear soon :-).
So now the excuse "I can't contribute to PyPy because it needs all those special PyPy-keys" isn't working anymore :-).
(yes, C as in language, not c as in speed of light). I looked recently at the great computer language shootout, for some benchmarks and to make some speed comparisons. I use this benchmark, modified it to be rpythonic-enough and compared speeds. The code is here (the only change from the Python version was to create a class instead of tuple, so actually this version is more OO). Also the benchmark is very likely flawed because it favours better GCs :).
So, here we go:
|Language:||Time of run (for N=14):|
|Python version running on Python 2.5.1, distribution||25.5s|
|Python version running on PyPy with generational GC||45.5|
|Python with psyco||20s|
|RPython translated to C using PyPy's generational GC||0.42s|
|compiling the Haskell version with GHC 6.6.1||1.6s|
|compiling the C version with gcc 4.1.2 -O3 -fomit-frame-pointer||0.6s|
Also worth noticing is that when using psyco with the original version (with tuples) it is very fast (2s).
So, PyPy's Python interpreter is 80% slower than CPython on this (not too horrible), but RPython is 40% faster than gcc here. Cool. The result is mostly due to our GC, which also proves that manual memory-management can be slower than garbage collection in some situations. Please note that this result does not mean that RPython is meant for you. It requires a completely different mindset than the one used to program in Python. Don't say you weren't warned! :-)
After having spent the last few days on understanding PyPy's JIT, today I went back hacking the clr module. As a result, it is now possible to import and use external assemblies from pypy-cli, including Windows Forms
Here is a screenshot of the result you get by typing the following at the pypy-cli interactive prompt:
>>>> import clr >>>> clr.AddReferenceByPartialName("System.Windows.Forms") >>>> clr.AddReferenceByPartialName("System.Drawing") >>>> from System.Windows.Forms import Application, Form, Label >>>> from System.Drawing import Point >>>> >>>> frm = Form() >>>> frm.Text = "The first pypy-cli Windows Forms app ever" >>>> lbl = Label() >>>> lbl.Text = "Hello World!" >>>> lbl.AutoSize = True >>>> lbl.Location = Point(100, 100) >>>> frm.Controls.Add(lbl) >>>> Application.Run(frm)
Unfortunately at the moment you can't do much more than this, because we still miss support for delegates and so it's not possibile to handle events. Still, it's a step in the right direction :-).
A while ago Amit Regmi, a student from Canada, started working on the clr module improvements branch as a university project.
During the sprint Carl Friedrich, Paul and me worked more on it and brought it to a mergeable state.
- full support to generic classes;
- a new importer hook, allowing things like from System import Math and so on;
- .NET classes that implements IEnumerator are treated as Python iterators; e.g. it's is possile to iterate over them with a for loop.
This is an example of a pypy-cli session:
>>>> from System import Math >>>> Math.Abs(-42) 42 >>>> from System.Collections.Generic import List >>>> mylist = List[int]() >>>> mylist.Add(42) >>>> mylist.Add(43) >>>> mylist.Add("foo") Traceback (most recent call last): File "<console>", line 1, in <interactive> TypeError: No overloads for Add could match >>>> mylist 42 >>>> for item in mylist: print item 42 43
This is still to be considered an alpha version; there are few known bugs and probably a lot of unknown ones :-), so don't expect it to work in every occasion. Still, it's a considerable step towards real world :-).
Over the years PyPy has (ab?)used various external software for different purposes, and we've discovered bugs in nearly all of them, mostly by pushing them to their limits. For example, many compilers are not happy with 200MB of source in one file. The Microsoft C compiler has a limit of 65536 lines of code per file and the CLI was raising "System.InvalidProgramException: Method pypy.runtime.Constants:.cctor () is too complex.", where too complex probably means "too long". Just for fun, today we collected all projects we could think of in which we found bugs:
So one could say that PyPy is really just the most expensive debugging tool ever :-).
The Leysin sprint has started since yesterday morning in the usual location. The view is spectacular (see photo) the weather mostly sunny. The following people are sprinting:
- Maciej Fijalkowski
- Armin Rigo
- Toby Watson
- Paul deGrandis
- Antonio Cuni
- Carl Friedrich Bolz
We started working on various features and performance improvements for the high level backends (JVM and .NET) and on implementing ctypes for PyPy. Later this week we plan to spend a few days on the JIT, because Anto and I both need to get into it for our respective university projects.
PyPy contains a framework for writing custom Garbage Collectors, and a few simple GCs have been written in this framework. A common issue with all these GCs is how to find all the stack roots, i.e. all the pointers to live GC-managed objects currently stored in local variables, in all the callers of the current function. The current solution is to maintain a custom shadow stack of roots, where all functions push and pop copies of their local variables of type "GC pointer". Clearly this is an overhead. Can we remove it?
LLVM has recently grown some support for this. By emitting markers in the LLVM source and with the help of a bit of custom C++ code, we can generate stack maps for the functions compiled by LLVM. Then, with 100% non-portable code in our framework GC's root finding algorithm, we can walk the machine stack and locate where in each stack frame LLVM stores the GC pointers. (Yes, I mean non-portable: LLVM offers no help for doing that. Maybe it will at some point, though I didn't manage to explain why this is an issue to people working on this in LLVM so far...). I've tried that approach in the llvmgcroot branch. Over the manually-managed shadow stack, this gives speed improvements which are, very roughly, on the order of 5%.
Note that this prevents some optimizations in LLVM, because it forces it to allocate all local variables of type "GC pointer" in the stack; it cannot keep them in registers and it must assume that they can be changed more or less at any time (as moving GCs do). Can we do better?
Actually, yes. We can even do better in the C backend, using a GCC hack. GCC has this nice extension:
asm("bla", constrains);This is meant to generate assembler instructions directly from C. Internally, GCC considers the whole asm() as a single regular instruction of its intermediate language; the constrains are expressed in the same way as the constrains for all the prebuilt intermediate language instructions. They express things like input and output operands of the instruction, whether they can live in memory or in registers, whether the whole instruction has side-effects, etc. The nice thing about asm() is that it doesn't kill any optimization whatsoever in GCC - it's your job to make sure that you use the correct constrains.
So what I've tried in the asmgcroot branch is to use asm() as markers. In this branch, the C backend produces code like this after each function call, for each local variable containing a live GC pointer:
asm("/* GCROOT %0 */" : "=g"(localvar) : "0"(localvar) : "memory");
This causes GCC to emit the following line in the assembler file it generates:
/* GCROOT register-or-memory-containing-localvar */
I won't go in the details of the asm() line above - the constrains are just enough to make sure that GCC doesn't optimize too much, but don't prevent most optimizations from occurring. For example, the localvar can be in a register.
The assembler will just ignore the line above; it is a comment. But what we can do is write our own tool parsing the assembler files. This tool locates the /* GCROOT */ comments and follows where the register or memory location in the comment comes from (to do this it must follow the control flow and data flow of the function). This allows it to build a stack map: for each call instruction it knows exactly which registers and frame stack locations contain a live GC pointer. The stack map is then emitted in an extra assembler file that we link with the rest. As with LLVM above, the stack map is then used at run-time by non-portable code written in our GC's stack root tracker.
Yes, that's rather insane. But at least, we don't need to modify the assembler file - just read it. If GCC is too clever in its optimizations, the custom parser will get lost and complain cleanly; but I think that it is relatively safe in the sense that GCC optimizations should not be able to make the custom parser produce wrong results.
The branch is not merged because it's probably too insane to merge (not to mention, it's probably not portable to non-GCC compilers, and it is completely platform-specific). Still, it gives good results, better that the pure LLVM approach - on the order of 10% to 25% speed-ups for pypy-c.