As usual, progress is going slower then predicted, but nevertheless, we're working hard to make some progress.
We recently managed to make our nice GCs cooperate with our JIT. This is one point from our detailed plan. As of now, we have a JIT with GCs and no optimizations. It already speeds up some things, while slowing down others. The main reason for this is that the JIT generates assembler which is kind of ok, but it does not do the same level of optimizations gcc would do.
So the current status of the JIT is that it can produce assembler out of executed python code (or any interpreter written in RPython actually), but the results are not high quality enough since we're missing optimizations.
The current plan, as of now, looks as follows:
- Improve the handling of GCs in JIT with inlining of malloc-fast paths, that should speed up things by a constant, not too big factor.
- Write a simplified python interpreter, which will be a base for experiments and to make sure that our JIT does correct things with regard to optimizations. That would work as mid-level integration test.
- Think about ways to inline loop-less python functions into their parent's loop.
- Get rid of frame overhead (by virtualizables)
- Measure, write benchmarks, publish