Some notes from my recent trip (from 23rd of January to 17th of February) to the US, where I presented PyPy at various scientifically oriented places. In summary, there seems to be quite a bit of interest in PyPy within the research community; details below.
From the 24th to the 29th of January I was in Austin, Texas at the POPL conference, where I gave a talk at one of the workshops, PEPM (Partial Evaluation and Program Manipulation). The title of our paper is "Allocation Removal by Partial Evaluation in a Tracing JIT"; the abstract is:
The performance of many dynamic language implementations suffers from high allocation rates and runtime type checks. This makes dynamic languages less applicable to purely algorithmic problems, despite their growing popularity. In this paper we present a simple compiler optimization based on online partial evaluation to remove object allocations and runtime type checks in the context of a tracing JIT. We evaluate the optimization using a Python VM and find that it gives good results for all our (real-life) benchmarks.
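To give a rough feeling for what the abstract is about, here is a small hand-written Python illustration. It is not the algorithm from the paper, and the names `BoxedInt`, `add`, `sum_boxed` and `sum_unboxed` are made up for this sketch. The first loop allocates a wrapper object and performs type checks on every iteration, which is roughly what a straightforward dynamic language implementation does; the second loop shows the kind of code one would hope to be left with once the short-lived allocations and the checks have been removed.

```python
class BoxedInt:
    """Hypothetical heap-allocated wrapper, standing in for a VM's integer object."""

    def __init__(self, value):
        self.value = value


def add(a, b):
    # Runtime type checks, followed by a fresh allocation for the result.
    if not isinstance(a, BoxedInt) or not isinstance(b, BoxedInt):
        raise TypeError("expected BoxedInt")
    return BoxedInt(a.value + b.value)


def sum_boxed(n):
    # Every iteration allocates two short-lived boxes and re-checks their types.
    total = BoxedInt(0)
    for i in range(n):
        total = add(total, BoxedInt(i))
    return total.value


def sum_unboxed(n):
    # The same loop with the intermediate boxes and type checks removed by hand.
    total = 0
    for i in range(n):
        total += i
    return total
```

Both functions compute the same result; the difference is only in how many objects are created and how many checks are executed along the way.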
The talk (slides) seemed to be well-received and there was a good discussion afterwards. PEPM in general was a very enjoyable workshop with many interesting talks on partial evaluation (which I am very interested in) and a great keynote by Olivier Danvy about "A Walk in the Semantic Park".
POPL itself was a bit outside of the area I am most knowledgeable in, with most of the talks being on formal topics. Some of the talks that stuck in my mind:
- "The Design of Kodu: A Tiny Visual Programming Language for Children on the Xbox 360", the keynote by Matthew MacLaurin from Microsoft Research. I didn't know about Kodu before, and was very impressed by it.
- "Automating String Processing in Spreadsheets using Input-Output Examples" (paper) by Sumit Gulwani (also from MS Research) describes a plugin to Excel that can automate many common string processing tasks by giving a couple of examples, which are then abstracted into a generic string manipulation. Very cool.
- "Dynamic Inference of Static Types for Ruby" (paper) by Michael Furr, Jong-hoon (David) An, Jeffrey S. Foster and Michael Hicks describes an approach to type inference that works by observing the actual types seen during unit-testing. Similar things have been done a few times before; however, the paper actually gives a correctness result.
- "The Essence of Compiling with Traces" (paper) by Shu-Yu Guo and Jens Palsberg describes a formalization of a simple imperative language and proves that executing it using trace compilation will do exactly the same thing as using an interpreter. It also looks at what conditions an optimization on traces must fulfill to still produce valid results.
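The basic idea of observing types while tests run (as in the Ruby paper above) can be sketched in a few lines of Python. This is only a toy illustration of the observation step, not the approach of the paper, which goes much further and proves a correctness result. The names `OBSERVED`, `observe_types` and `repeat` are invented for this sketch.

```python
import functools
from collections import defaultdict

# Maps a function name to the set of (argument types, result type) seen so far.
OBSERVED = defaultdict(set)


def observe_types(func):
    """Record the concrete types a function is called with and returns."""

    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        arg_types = tuple(type(a).__name__ for a in args)
        OBSERVED[func.__name__].add((arg_types, type(result).__name__))
        return result

    return wrapper


@observe_types
def repeat(sequence, times):
    return sequence * times


# Stand-ins for a unit test run: each call contributes one observed signature.
assert repeat("ab", 2) == "abab"
assert repeat([1], 3) == [1, 1, 1]
```

After the "tests" have run, `OBSERVED["repeat"]` contains one entry for the string call and one for the list call, which is the raw material a type inference step could then generalize from.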
After the main conference, I took part in the STOP (Scripts to Programs) workshop. It had a great keynote, "Scripting in a Concurrent World" by John Field, about the Thorn language, and a few other interesting talks.
At Microsoft I gave a talk about "PyPy's Approach to Implementing Dynamic Languages Using a Tracing JIT Compiler", the slides of which can be found here. The talk was filmed and is online. People seemed to be impressed with the "product qualities" of PyPy, e.g. the buildbot infrastructure and speed tracking website.
I spent the rest of the time in discussions with various researchers in the RiSE group, particularly with Nikolai Tillmann. We talked a lot about similarities and differences between SPUR and PyPy and tried to understand our respective projects better. SPUR is a really great project and I learned a lot in the discussions, for example about the optimizations and heuristics their trace compiler uses.
Another very cool project done by the RiSE group that I learned more about is PEX. PEX is a unit test generator for C# that tries to produce unit tests for so-far untested execution paths within methods. There is an online puzzle version of it, if you want to get an impression of the technology (including a very impressive C# IDE in the browser).
For the last part of the trip I stayed in New York City for two weeks, mostly as a vacation. However, I also visited IBM Watson Research Center for two days, to which I had been invited by David Edelsohn.
On the first day I gave the same presentation I had given at Microsoft (with some improvements to the slides); again, it was quite well received. The rest of the time I spent in (very fruitful) discussions with various people and teams, among them the Liquid Metal team and the Thorn team.
On the second day I met with members of the FIORANO group, who are working on dynamic compilation for dynamic languages and Java. They explored various ways to speed up Python, both by improving the CPython interpreter and with JIT compilation techniques.
Another of their projects is to add a trace compiler to IBM's J9 JVM, about which the paper "A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler" is going to appear at CGO. I discussed tracing JITs with Peng Wu, one of the authors of that paper. Peng tries to systematically look at the various heuristics found in the different VMs that use tracing JITs. This is a very different perspective from the one I usually have, which focuses on how to improve PyPy's specific heuristics. Therefore, that discussion helped me think about the issues more generally.
Another goal of the group is to try to find benchmarks that are representative of typical Python workloads, which is something that has been done very carefully for Java, e.g. when developing the DaCapo benchmark suite. The benchmarks that the Python community uses have not been selected in such a careful and measured way, so I think that trying to be more systematic there is a very worthwhile endeavour.