JIT in Prolog
Some news from the JIT front. Progress on the JIT has been low-profile in the past few months. No big results to announce yet, but we have played with some new ideas, and they are now documented as a draft research paper: Towards Just-In-Time Compilation and Specialisation of Prolog.
Prolog? Yes. To understand this slightly unusual choice of programming language, here is first some background about our JIT.
PyPy contains not a JIT but a JIT generator, which means that we only write an interpreter for a language (say, the complete Python language), and we get a JIT "for free". More precisely, it's not for free: we had to write the JIT generator, of course, as well as some amount of subtle generic support code. The JIT generator preprocesses the (complete Python) interpreter that we wrote and links the result with the generic support code; the result is a (complete Python) JIT.
The way that this works so far gives us a generated JIT that is very similar to Psyco in the way it works. But Psyco has issues (and so the current PyPy JITs have the same issues): it can sometimes produce too much machine code, e.g. by failing to notice that two versions of the machine code are close enough that they should really be one; and it can also sometimes fail in the opposite way, by making a single sub-efficient version of the machine code instead of several efficient specialized versions.
A few months ago we have chosen to experiment with improving this instead of finishing and polishing what we had so far. The choice was mostly because we were (and still are) busy finishing and polishing everything else in PyPy, so it was more fun to keep at least the JIT on the experimental side. Besides, PyPy is now getting to a rather good and complete state, and it is quite usable without the JIT already.
Anyway, enough excuses. Why is this about Prolog?
In PyPy, both the (complete Python) interpreter and the JIT support code are in RPython. Now RPython is not an extremely complicated language, but still, it is far from the top on a minimalism scale. In general, this is a good in practice (or at least I think so): it gives a reasonable balance because it is convenient to write interpreters in RPython, while not being so bloated that it makes our translation toolchain horribly complicated (e.g. writing garbage collectors for RPython - or even JIT generators - is reasonable). Still, it is not the best choice for early research-level experimentation.
So what we did instead recently is hand-write, in Prolog, a JIT that looks similar to what we would like to achieve for RPython with our JIT generator. This gave much quicker turnaround times than we were used to when we played around directly with RPython. We wrote tiny example interpreters in Prolog (of course not a complete Python interpreter). Self-inspection is trivial in Prolog, and generating Prolog code at runtime is very easy too. Moreover, many other issues are also easier in Prolog: for example, all data structures are immutable "terms". Other languages than Prolog would have worked, too, but it happens to be one that we (Carl Friderich, Michael Leuschel and myself) are familiar with -- not to mention that it's basically a nice small dynamic language.
Of course, all this is closely related to what we want to do in PyPy. The fundamental issues are the same. Indeed, in PyPy, the major goals of the JIT are to remove, first, the overhead of allocating objects all the time (e.g. integers), and second, the overhead of dynamic dispatch (e.g. finding out that it's integers we are adding). The equivalent goals in Prolog are, first, to avoid creating short-lived terms, and second, to remove the overhead of dispatch (typically, the dispatching to multiple clauses). If you are familiar with Prolog you can find more details about this in the paper. So far we already played with many possible solutions in the Prolog JIT, and the paper describes the most mature one; we have more experimentation in mind. The main point here is that these are mostly language-independent techniques (anything that works both in Prolog and in RPython has to be language-independent, right? :-)
In summary, besides the nice goal of speeding up Prolog, we are trying to focus our Prolog JIT on the issues and goals that have equivalents in the PyPy JIT generator. So in the end we are pretty convinced that it will give us something that we can backport to PyPy -- good ideas about what works and what doesn't, as well as some concrete algorithms.
What is the reason you would back-port the Prolog implementation to RPython, and not make Prolog itself the standard language for implementing the JIT?
THat sounds like the great subject of a thesis for Carl. :-)
shalabh: because (hopefully) porting back to rpython is saner than porting all of our interpreter (including modules) to prolog.
A bit unsual approach =)
Hope it'll help...
What about making PyPy useful?
One thing I've never quite understood: how will the JIT-generation transform interact with more traditional optimization schemes?
Concrete example: say in a function I want to perform some algebraic reductions of math operations which will change a lot of the instructions. Since the JIT generation turns the interpreter into a JIT, presumably I have to write the optimization at the interpreter level.
I can see how that could work for the simplest kind of optimizations (special cases should be specialized at runtime after they go green, if I understand the rainbow colour scheme.)
I don't see yet how the more complex optimizations I'd write on static, fixed-type code will look in this context. IIUC at interpreter level I can only access the JIT's observations via tests like "if type(a) == FloatType" which should be filled after they're known-- but that's inside the function itself, and I don't see how to access that information from anything outside.
dsm: This is a two-level approach, corresponding to two levels of optimisations that are useful for dynamic languages like Python: the "high level" is the unboxing and dispatching removing that I describe in the post (which by itself can give something like a factor 50-100 speed-up in the best cases). Traditional "low level" optimisations can be performed on top of that, by optimising the generated code that comes out of the "high level" (and this could give another 2-4x speed-up, i.e. the same difference as between "gcc" and "gcc -O3").
In this Prolog experiment we are only focusing on how to get the high level optimisations.
The references in the paper are not properly numbered -- any idea if it could be fixed?
Michel: Thanks for noticing, it should be fixed.
Could you possibly profit from generating a JIT compiler for Lua (www.lua.org) and compare it to Mike Pall's Lua-Jit (https://luajit.org/)?
While the paper was too difficult for me to understand fully, it was still an interesting read and I appreciate you posting it.
FYI: There is a project called Pyke which adds Prolog-like inferencing to Python. This integrates with Python allowing you to include Python code snippets in your rules.
Don't know if this would be useful, but you can check it out at https://pyke.sourceforge.net.
Shalabh: It's also important to note 3 big benefits of implementing a language in the language itself, or a subset thereof ("turtles all the way down").
(1) Debugging and testing tools for programs written in the language then (hopefully) also work for debugging and testing the language implementation with minimal (or no) modification. This also HUGELY lowers the bar for ordinary users of the language to find and fix implementation bugs. This isn't a fault of Prolog, but 99.99% of Python users won't touch a Prolog debugger with a 10-foot pole.
(2) The largest pool of people most interested in improving the language is presumably the expert heavy users of the language. Forcing them to learn a new language and/or implement the language in a language outside their expertise is a large disadvatage.
(3) The difference between language builtins and user code is reduced. Also, it forces certain powerful constructs to (at times) be exposed in the language when they might otherwise only be exposed in the implementation language. Also, with "turtles all the way down", performance improvements in the language itself also often apply to the language builtins, which increases the benefit of improvements, which is important in the cost/benefit analysis for undertaking the performance improvements in the first place. Having "turtles all the way down" make some optimizations worthwhile that otherwise would be too much trouble to implement.