Using CPython extension modules with PyPy natively, or: PyPy can load .pyd files with CPyExt!
PyPy is now able to load and run CPython extension modules (i.e. .pyd and .so files) natively by using the new CPyExt subsystem. Unlike the solution presented in another blog post (where extension modules like numpy etc. were run on CPython and proxied through TCP), this solution does not require a running CPython anymore. We do not achieve full binary compatibility yet (like Ironclad), but recompiling the extension is generally enough.
The only prerequisite is that the necessary functions of the C API of CPython are already implemented in PyPy. If you are a user or an author of a module and miss certain functions in PyPy, we invite you to implement them. Up until now, a lot of people (including a lot of new committers) have stepped up and implemented a few functions to get their favorite module running. See the end of this post for a list of names.
Regarding speed, we tried the following: even though there is a bit of overhead when running these modules, we could run the regular expression engine of CPython (_sre.so) and execute the spambayes benchmark of the Unladen Swallow benchmark suite (cf. speed.pypy.org) and experience a speedup: It became two times faster on pypy-c than with the built-in regular expression engine of PyPy. From Amdahl's Law it follows that the _sre.so must run several times faster than the built-in engine.
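The Amdahl's Law step can be made concrete with a small sketch. The fraction of runtime that the benchmark spends in regular expressions is not stated in this post, so the 75% used here is purely an assumed value, and the function name is made up for illustration.

```python
def required_component_speedup(overall_speedup, fraction):
    """Solve Amdahl's Law for the speedup of a single component.

    Amdahl's Law: overall = 1 / ((1 - fraction) + fraction / component),
    where `fraction` is the share of the original runtime spent in the
    component that got faster.
    """
    remaining = 1.0 / overall_speedup - (1.0 - fraction)
    if remaining <= 0:
        raise ValueError("this overall speedup is unreachable with this fraction")
    return fraction / remaining


# ASSUMPTION: 75% of the original runtime was regex matching.
# The overall speedup of 2x is the figure reported above.
print(required_component_speedup(2.0, 0.75))  # 3.0
```

The smaller the assumed regex share, the larger the speedup the regex engine itself must deliver; with a share of 50% or less, a 2x overall speedup would not be reachable at all.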
Currently pursued modules include PIL and others. Distutils support is nearly ready. If you would like to participate or want information on how to use this new feature, come and join our IRC channel #pypy on freenode.
Amaury Forgeot d'Arc and Alexander Schremmer
Further CPyExt Contributors:
- Alex Gaynor
- Benjamin Peterson
- Jean-Paul Calderone
- Maciej Fijalkowski
- Jan de Mooij
- Lucian Branescu Mihaila
- Andreas Stührk
- Zooko Wilcox-O Hearn
PyPy on Google open source blog
Bea Düring, from the PyPy team, wrote a post for the Google open source blog covering PyPy's 1.2 release. It's also the first public mention of the fact that Google provided financial support for PyPy's 2.5 compatibility. Thanks!
Interesting read, thank you. By the way, are there any plans to push for 3.x compatibility?
Introducing nightly builds and ubuntu PPA
We're pleased to announce two things that we were constantly asked for: nightly builds and an Ubuntu PPA for the 1.2 release, made by Bartosz Skowron. There are no nightly build Ubuntu packages (yet).
Nightly builds are what they are - pure pypy executables with the JIT compiled in (for Linux only for now). They require either a pypy checkout or a release download. The main difference is that by default they display more debugging information than release builds, and of course they contain recent bugfixes and improvements :-)
Cheers
Niiice =) Using PyPy becomes easier.
Could please disable jit on amd64 or perhaps build 32-bit deb for amd64 machines?
@nek0ton building 32bit JIT for 64bit is hard since you need 32bit libraries. We just don't build nightly 64bit (nor release contained it).
@fijal Why so? 32bit libraries are available on ubuntu (with ia32 suffix), kernel is built with 32bit support option. Don't see any problem here.
I understand why not to build 64bit release - JIT is the goal.
P.S. Maybe an unavailable amd64 build would force someone to dig in and fix that issue? =) Are there any guides available to do it?
the reason is precisely what you described - you need custom libraries linked with special suffix or place which is probably distribution dependent.
What would it take to make a 64 bit native everything (amd64)?
Btw. I noticed the supported modules list seems to be incomplete at https://pypy.org/compat.html
At least os, subprocess seem to be there even if not listed, probably more?
The general answer is that both subprocess and os are written in Python (and not C), so we don't list them. However, I wonder how we can list things so as not to confuse people who don't know that. Any ideas? (Listing all possible modules is a bit too much.)
If the supported modules is over 50% of all, how about just listing modules that still require work? I suspect many people are unaware that PyPy is getting feature complete, usable for real work.
Blog coverage of speed.pypy.org
If you want to read a detailed analysis about why speed.pypy.org is cool, head over to Saveen Reddy's blog at the MSDN.
First of all, congratulations on the great work. I can say I am a newbie in the Python world but I follow this project with interest. I tried the release with the JIT compiler, also with the parallel python module, and the speed gain is noticeable. I also compared the performance with psyco on 3 or 4 benchmarks and it seems that the execution time is usually more or less the same. Do you think there will be the possibility of another massive speed improvement in future releases, or is the level of max performance not so far off? How much faster could it be in the future?
According to the Computer Language Benchmarks Game, there are three benchmarks that perform way slower in Pypy against Python 3 ( see here: https://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=pypy&lang2=python3 ).
I know that regex-dna performs slower because regex haven't been optimized yet, but what's the reason for the other two? Do they use regex too?
@Luis pidigits is about using gmpy for cpython vs longs for pypy. It's a bit apples vs oranges. That said, CPython's longs are still faster than pypy's so we definitely can improve. This area needs some love :)
Reverse complement is string benchmark and I did not look but it might be that the speed of str.translate is suboptimal.
Heroes of the 1.2 Release
Now that the release is done I wanted to list and to thank some people that were essential in the process of getting it out of the door, particularly because the work of some of them is not very visible usually.
Armin Rigo and Maciej Fijałkowski tirelessly worked on most aspects of the release, be it fixing the last known bugs and performance problems, packaging or general wizardry.
Amaury Forgeot d'Arc made sure that PyPy 1.2 actually supports Windows as a platform properly and compiled the Windows binaries.
Miquel Torres designed and implemented our new speed overview page, https://speed.pypy.org which is a great tool for us to spot performance regressions and to showcase our improvements to the general public.
tav designed the new user-oriented web page, https://pypy.org which is a lot nicer for people that only want to use PyPy as a Python implementation (and not be confused by how PyPy is actually made).
Holger Krekel fixed our main development server codespeak.net, even while being on vacation and not really having online connectivity. Without that, we couldn't actually have released anything.
Bartosz Skowron worked a lot on making Ubuntu packages for PyPy, which is really cool. Even though he didn't quite finish in time for the release, we will hopefully get them soon.
Thanks to all you guys!
Many thanks to all of you for the hard work, PyPy is shaping up very nicely. :-)
Heh, I would finish the Ubuntu package if i didn't have restricted Internet access (only port 80 is working in the hotel where i'm staying now). please wait till Monday :)
Introducing the PyPy 1.2 release
We are pleased to announce PyPy's 1.2 release. This version 1.2 is a major milestone and it is the first release to ship a Just-in-Time compiler that is known to be faster than CPython (and unladen swallow) on some real-world applications (or the best benchmarks we could get for them). The main theme for the 1.2 release is speed.
The JIT is stable and we don't observe crashes. Nevertheless, we recommend that you treat it as beta software and as a way to try out the JIT to see how it works for you.
- The JIT compiler.
- Various interpreter optimizations that improve performance as well as help save memory. Read our various blog posts about achievements.
- Introducing a new PyPy website at pypy.org made by tav and improved by the PyPy team.
- Introducing speed.pypy.org made by Miquel Torres, a new service that monitors our performance nightly.
- There will be Ubuntu packages on PyPy's PPA made by Bartosz Skowron; however, various troubles prevented us from having them as of now.
Known JIT problems (or why you should consider this beta software) are:
- The only supported platform is 32bit x86 for now; we're looking for help with other platforms.
- It is still memory-hungry. There is no limit on the amount of RAM that the assembler can consume; it is thus possible (although unlikely) that the assembler ends up using unreasonable amounts of memory.
If you want to try PyPy, go to the download page on our excellent new site and find the binary for your platform. If the binary does not work (e.g. on Linux, because of different versions of external .so dependencies), or if your platform is not supported, you can try building from the source.
The PyPy release team,
Armin Rigo, Maciej Fijalkowski and Amaury Forgeot d'Arc
Antonio Cuni, Carl Friedrich Bolz, Holger Krekel, Samuele Pedroni and many others.
The front page of the new PyPy site should include some of these caveats about it being beta software; it gives the wrong impression about PyPy's current status.
Congratulations! I've been looking forward to this.
Question: does PyPy have an API for creating native modules?
why is spambayes so slow? does it use regular expressions?
Why is there a problem with nbody and itertools ?
pypy temporarily in the benchmarks game.
@horace: yes, regexes are probably the problem.
@Isaac: combinations is a 2.6 feature, which we don't support.
Would anyone care to contribute a modified working nbody program to the benchmarks game? ;-)
@Isaac: we have nbody_modified in our benchmarks, source code here.
I just tried the windows binary.
Oh damn, it is really FAST!!!
3x performance gain...
C:\work\bzr-proj>pypy script.py -t i2d -f longdata.txt
m = 1 total = 128
m = 2 total = 16384
m = 3 total = 2097152
Require M stage: 3
Time taken 00:00:05 (907ms)
C:\work\bzr-proj>python script.py -t i2d -f longdata.txt
m = 1 total = 128
m = 2 total = 16384
m = 3 total = 2097152
Require M stage: 3
Time taken 00:00:15 (093ms)
Forgot about the memory usage: python consumes ~4MB and pypy consumes ~24MB. Pypy needs 6x more memory, but I don't care about this in my script since the performance gain is significant.
I really want to know how pypy compares to luajit; I think luajit should be much faster. I am in the process of converting my script to lua but that is painful, since my knowledge of lua doesn't match my knowledge of python.
@shin if you have a comparison to LuaJIT, I would be extremely interested to hear the results! I agree that LuaJIT will likely be faster though.
State of PyPy talk from Pycon
The last PyPy video from pycon has been uploaded. It's a very short (less than 10 minutes) "keynote" talk about the state of PyPy.
Some time ago, we introduced our nightly performance graphs. This was a quick hack to allow us to see performance regressions. Thanks to Miquel Torres, we can now introduce https://speed.pypy.org, which is a Django-powered web app sporting a more polished visualisation of our nightly performance runs.
While this website is not finished yet, it's already far better than our previous approach :-)
Details about announcement on pypy-dev are found here.
If you're interested in having something similar for other benchmark runs, contact Miquel (tobami at gmail).
Quoting Miquel: "I would also like to note, that if other performance-oriented opensource projects are interested, I would be willing to see if we can set-up such a Speed Center for them. There are already people interested in contributing to make it into a framework to be plugged into buildbots, software forges and the like. Stay tuned!"
Excellent! We really ought to deploy this for unladen, too. Unfortunately, I don't think I'll have the time to get that going. :(
In my mind PyPy with its JIT will/should eventually get us close to matching or beating Java performance for the non-dynamic subset of python. Would that be a fair statement? If so, is there some benchmark that allows us to compare that? Would that be useful?
I would love to see this become a Python implementation shootout, a single place where we could compare the speeds of CPython/PyPy/Unladen/Jython/IronPython
This is great! It's excellent to see the fruits of the pypy jit work so clearly.
I'd also like to see this in place for other Python implementations.
Ok. So I've seen this feature request often enough. These benchmarks are not good for tracking memory usage - they'll simply measure the amount the interpreter allocates at the beginning. If you provide better ones, we'll do it.
With the JIT would a script that does not use the dynamic aspects of python be able to match the speed of Java?
@Reid: maybe I can help you out setting it up. You could actually even begin saving results to speed.pypy.org right away with minimal configuration changes (though I understand you may prefer to have your own site and DB).
The first features are catering to trunk development, which was the most urgent thing.
But my plan all along was to implement a third tab for comparing implementations (among other things. See mailing list announcement for details).
So your wish should come to pass :-)
Neat! I still like the original graphs though, it's nice to see the history for all the benchmarks together.
I think the 'average' is pretty meaningless - it implies that a simple average of all the benchmarks will correspond to the typical real-world speed up you will get using pypy with your existing python code, which I don't think is true.
a view showing all timeline graphs at once is also planned.
About the average, of course you can not take from it that pypy-c-jit is nearly 3 times as fast as cpython. Because it depends on the particular choice of benchmarks, which right now is not at all representative of actual real-world usage.
Regardless, it is there so that a developer gets an overall feeling for how a given revision change has affected performance across all benchmarks.
We can't avoid the risk of people reaching wrong conclusions, but that is always the case with statistics, averages and benchmarks ;-)
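To illustrate how much a single "average" depends on the choice of benchmarks, here is a small sketch. The speedup ratios are invented for illustration and are not taken from speed.pypy.org, and the geometric mean is shown only as a point of comparison; nothing here says which averaging the site actually uses.

```python
import math


def arithmetic_mean(values):
    """Plain average of the values."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# HYPOTHETICAL speedup ratios (interpreter under test vs. baseline).
speedups = [0.5, 1.0, 2.0, 8.0]

print(arithmetic_mean(speedups))      # dominated by the 8x outlier
print(geometric_mean(speedups))       # less sensitive to the outlier
print(arithmetic_mean(speedups[:3]))  # same suite without the outlier
```

With these made-up numbers the arithmetic mean is close to 2.9, the geometric mean is close to 1.7, and dropping the one outlier benchmark brings the arithmetic mean down to roughly 1.2, which is why the figure is better read as a trend indicator than as a real-world speedup.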
@sarvi: reaching the speed of Java is a really non-trivial goal, because Sun's JVM has really been highly optimized over many years. I guess it will take us a long time (if at all) to reach such levels of performance.
I understand JVM is highly optimized.
And overtime and once yall have more momentum industry funding I am sure your VM will get just as optimized. I am sure Google will pick you guys up soon. I have no doubt about it. Unladen Swallow seems a waste of time once yall get more credibility.
Even then I do expect dynamic scripting capabilities to perform slower than Java.
I am just hoping that eventually the non-dynamic parts of python will perform on par with Java.
And we can all program in just Python and C. :-))
Great work! BTW, could it be possible to also have a quick link to the source code of the benchmarks in the website?
yeah, such things are missing right now.
An about page, and possibly an explanation (with links to the code) of each benchmark are probably going to be implemented. Currently there are only tooltip explanations for some.
Another silly question:
AFAIK, the benchmark improvements seen lately are due to the way you measure averages, by excluding warmup time. Seeing that warmup takes time that may be critical in some situations, I wonder if it's possible to somehow "save" the generated jitted code so it can be reused after the first time it's generated.
This way, it would be possible to distribute programs already "warmed up", kind of a compiled version of them. Sorry if this doesn't make sense at all... for a clueless ignorant like me, it does!
Hey. It's a valid option, but it's hard at the very least (if not next to impossible). There is work planned on reducing warmup time, so it shouldn't matter that much.
It would be nice if the timeline had the date on it (only where the date changes, and the beginning + end).
I recently did some benchmarking of twisted on top of PyPy. For the very impatient: PyPy is up to 285% faster than CPython. For more patient people, there is a full explanation of what I did and how I performed the measurements, so they can judge for themselves.
The benchmarks are living in twisted-benchmarks and were mostly written by Jean-Paul Calderone. Even though he called them "initial exploratory investigation into a potential direction for future development resulting in performance oriented metrics guiding the process of optimization and avoidance of complexity regressions", they're still much much better than average benchmarks found out there.
The methodology was to run each benchmark for quite some time (about 1 minute), measuring the number of requests every 5s. Then I looked at the dump of data and subtracted the time it took for JIT-capable interpreters to warm up (up to 15s), averaging everything after that. Averages of requests per second are in the table below (the higher the better):
|benchmark|CPython|Unladen Swallow|PyPy|
|names|10930|11940 (9% faster)|15429 (40% faster)|
|pb|1705|2280 (34% faster)|3029 (78% faster)|
|iterations|75569|94554 (25% faster)|291066 (285% faster)|
|accept|2176|2166 (same speed)|2290 (5% faster)|
|web|879|854 (3% slower)|1040 (18% faster)|
|tcp|105M|119M (7% faster)|60M (46% slower)|
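The averaging described above can be sketched roughly as follows. This is not the script that was actually used; the function names are made up, and the sample counts are hypothetical. Only the 5s interval, the 15s warm-up cutoff and the two "iterations" figures come from this post.

```python
def steady_state_rate(request_counts, interval_s=5.0, warmup_s=15.0):
    """Average requests per second after discarding warm-up samples.

    `request_counts` holds the number of requests completed in each
    consecutive interval of `interval_s` seconds.
    """
    skip = int(warmup_s // interval_s)  # number of leading samples to drop
    kept = request_counts[skip:]
    if not kept:
        raise ValueError("no samples left after discarding warm-up")
    return sum(kept) / (len(kept) * interval_s)


def percent_faster(rate, baseline):
    """Relative difference of `rate` against `baseline`, in percent."""
    return (rate / baseline - 1.0) * 100.0


# HYPOTHETICAL per-interval counts: slow at first while the JIT warms up.
samples = [100, 300, 450, 500, 500, 500]
print(steady_state_rate(samples))  # 100.0

# "iterations" figures from the table: PyPy vs. CPython.
print(round(percent_faster(291066, 75569)))  # 285
```

Dropping the first three samples is what makes a JIT-capable interpreter look better than a whole-run average would, so the cutoff is worth stating alongside any number produced this way.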
To reproduce, run each benchmark with:
benchname.py -n 12 -d 5
WARNING: running tcp-based benchmarks that open a new connection for each request (web & accept) can exhaust some kernel structures. Limit n or wait until the next run if you see drops in requests per second.
The first obvious thing is that various benchmarks are more or less amenable to speedups by JIT compilation, with accept and tcp getting the smallest speedups, if any. This is understandable, since JIT is mostly about reducing interpretation and frame overhead, which is probably not large when it comes to accepting connections. However, if you actually loop around, doing something, JIT can give you a lot of speedup.
The other obvious thing is that PyPy is the fastest python interpreter here, almost across the board (Jython and IronPython won't run twisted), except for raw tcp throughput. However, speedups can vary and I expect this to improve after the release, as there are points where PyPy can be improved. Regarding raw tcp throughput - this can be a problem for some applications and we're looking forward to improving this particular bit.
The main reason to use twisted for this comparison is a lot of support from twisted team and JP Calderone in particular, especially when it comes to providing benchmarks. If some open source project wants to be looked at by PyPy team, please provide a reasonable set of benchmarks and infrastructure.
If, however, you're a closed source project fighting with performance problems of Python, we offer contracting to investigate how PyPy, and not only PyPy, can speed up your project.
- names - simple DNS server
- web - simple http hello world server
- pb - perspective broker, RPC mechanism for twisted
- iterations - empty twisted loop
- accept - number of tcp connections accepted per second
- tcp - raw socket transfer throughput
- CPython 2.6.2 - as packaged by ubuntu
- Unladen swallow svn trunk, revision 1109
- PyPy svn trunk, revision 71439
Twisted version used: svn trunk, revision 28580
Machine: unfortunately 32bit virtual-machine under qemu, running ubuntu karmic, on top of Quad core intel Q9550 with 6M cache. Courtesy of Michael Schneider.
Would be nice to see at least rough approximation of amount of RAM used by each implementation. :-)
Great as always.
I'm looking forward to use PyPy in production with the next stable release in march. =)
Is it possible to run the same tests with CPython+Psyco?
That would be really interesting to see!
No, psyco has limitations on frames that break zope.interface which twisted depends on.
I agree with Yuri, it would be of interest to record memory stats for each benchmark run.
Awesome results Maciej.
Question: what's it gonna take for pypy to supplant Cpython?
You're faster and I'm guessing you have nowhere near the manpower of Cpython. Plus, you're written in Python so future work will be much easier. Seems like a no brainer to embrace pypy.
Question: After having read many comments and posts from pypy's developers lately, I got the impression (I might be wrong though), that you are betting all on tracing for getting speedups, (that the slow interpreter will eventually be compensated by the magic of tracing).
However, other projects that rely on tracing seem to favor a dual approach, which is a traditional method-at-a-time jit (which can evenly speed up all kinds of code) plus tracing for getting the most out of highly numerical code (luajit 2.0, Mozilla's JaegerMonkey, for example).
Is this accurate or am I wrong? Do you think that the current tracing strategy will eventually get speedups for those benchmarks that are currently on par with or way below cpython? Or will you have to add a more traditional approach for the baseline?
That's a very interesting question. I will try to answer a couple of your points, but feel free to move to the pypy-dev mailing list if you want to continue the discussion.
We indeed bet on tracing (or jitting in general) to compensate for slower interpretation than CPython. However, our tracing is far more general than spidermonkeys - for example we can trace a whole function from start and not require an actual loop. We hope to generalize tracing so it can eventually trace all constructs.
The main difference between ahead-of-time and tracing is that tracing requires an actual run, while ahead-of-time tries to predict what will happen. Results are generally in favor of tracing, although the variation will be larger (tracing does statistically correct branch prediction, not necessarily always the correct one).
Regarding benchmarks, most of the benchmarks on which we're slower than CPython showcase that our tracing is slow (they don't contain warmup). For some of those we'll just include warmup (like twisted.web, which is a web server, so this makes sense in my opinion); for others we'll try to make tracing faster. And again, the speed of tracing is not a property of tracing itself, but rather PyPy's limitation right now.
Some other benchmarks are slow because we don't JIT regular expressions (spambayes). This should be fixed, but it's again unrelated to tracing.
To summarize: I don't expect us trying dual approach (one jit is enough fun, believe me), but instead generalizing tracing and making it more efficient. How this will go, we'll see, I hope pretty well.
other than Maciek's points, which I subscribe, it should be said
that, since each language has a different semantics, the
efficiency of a traditional "method-at-a-time" JIT can vary
dramatically. In particular, the dynamism of Python is so deep
that a traditional JIT cannot win much: Jython and IronPython do
exactly that, but for most use cases are slower than CPython. If
you are interested, Chapter 2 of my PhD thesis explores these
As for the warm-up, would it be possible to save some of the tracing decisions in some file (.pyt?) to help on next startup?
Saving the results is hard, but not impossible. There are other possibilities (like keeping process around) though.
Pycon 2010 report
Greetings to everybody from Pycon 2010 Atlanta. Right now I'm sitting in a sprint room with people sprinting on various projects, like CPython, twisted etc. The conference was really great, and I've seen some good talks, although I've been too exhausted from my own talks to go to too many. Probably I should stay away from proposing that many talks to the next pycon :-)
The highlight of the sprints was that we got a common mercurial repository at python.org for python benchmarks. We might be able to come up with "the python benchmark suite" which will mostly consist of simple benchmarks using large python libraries, rather than microbenchmarks. The repository was started by the Unladen Swallow people and we already have common commit access among PyPy, CPython, Unladen Swallow, Jython and IronPython. We don't yet have a common place to run benchmarks, but we should be able to fix that soon.
Regarding the talks, there are online videos for How to write cross-interpreter python programs and Speed of PyPy talks, among other talks from Pycon. There should be a video for my short keynote shortly.
The talks were well received as there is interest in PyPy's progress.
Hi, I just wanted to say that there's something wrong with the PLOT ONE graphic. The speedups are expressed by horizontal lines (each one is 2x). The third line shows 8x instead of 6x.
Holy crap, this is huge! Is it available in the PPA already? I guess this would put all benchmarks past CPython speed (except for outliers like the euler14 thing).
Great news! What is the status of numpy/scipy support?
@Anonymous I don't think anyone has started trying to test numpy or scipy yet; however, fundamentally it's just a matter of implementing missing functions. For me, starting on numpy is my next goal, after PIL.
This is very good news. JIT compiled Python can never fully replace extension modules (existing ones, or the need for new ones), so extension support should be a high priority for the PyPy project. I hope you can eventually get rid of that overhead.
wow, just coming back from vacation and have to say: great news and great work, guys! Historically speaking, this is the third approach to the "ext" module issue and if the promise works out as it seems to do, probably the last as far as leveraging cpy ext modules are concerned! I wonder - does it still make sense to have "native" extension modules, the ones we currently have as "mixed" modules?
Let me ask for a bit more detail. I depend on a module (https://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html), that is currently unsupported, as far as I know. I'd really like to port it to pypy. Where to start?
Is it possible that the module runs without modifications? Can I check this simply by building a pypy-trunk, and write "import cmaxent"?
@Anonymous: No it's not in the PPA. We provide only the latest release (1.2 in this case) and weekly builds for trunk (which haven't been announced on the blog yet). CPython extension modules live in their own branch. The branch will be merged into the trunk sooner or later.
PS. The weekly builds are available here at https://launchpad.net/~pypy
To test your module, you need to compile and load it. For compilation, you can use a compiled pypy binary and run setup.py build_ext with your setup file. For hints about manual compilation and module loading, visit our IRC channel.
MixedModules allow you to implement modules in RPython (using the PyPy API) and Python at the same time. CPyExt is for modules written in C using the CPython API. So both solutions are for different needs.
what about embedding pypy? will this work too in the future?
the reason i ask is blender. there were some security concerns among blender developers recently. blender uses embedded cpython for scripting. normal scripts (like exporters) which have to be invoked by the user aren't that much of a problem but blender also supports python expressions for animation parameters. without a sandbox downloading and opening .blend files from unknown sources is kind of risky since a malicious python expression theoretically could wipe your harddisk.
pypy with its support for a sandbox could be a very good replacement for cpython in blender (also because of its speed) but if it isn't compatible with the cpython api then a swap probably would be way too much effort.
what about embedding pypy?
That should work as easily as extending.
@alexander True, mixed modules are for rpython-implemented modules and need to be translated together with the pypy interpreter and could make use of the JIT. My question was aimed more at which kind of extension module mechanism makes sense for which use cases / goals.
IOW, some discussion and web page regarding rpy-ext/ctypes/cpy-ext would make sense, i guess. Or is it somewhere already?
some discussion and web page regarding rpy-ext/ctypes/cpy-ext would make sense
Yes, someone could write down guidelines. Using the C API runs your module fast in case of CPython. A bit slower on ironpython and PyPy.
Using ctypes gives your module access to these three interpreters as well, but it will run slower. One advantage here is that you do not need to write C to create a wrapper around a library. If your objective is speed and lower memory usage, then ctypes does not work either.
Mixed modules make your module work only on PyPy and provide a decent speed and a mixture of a decent (Python) and a bit harder to grasp (RPython) programming language. This only makes sense as a platform if your users are also using PyPy.
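For comparison, the ctypes route mentioned above looks roughly like this. It is a minimal sketch that assumes a POSIX system where the C standard library can be located; `strlen` is used only because it is available practically everywhere.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If find_library() returns None, CDLL(None)
# falls back to the symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"pypy"))  # 4
```

No C code has to be written or compiled here, which is the portability advantage; the cost is that every call goes through the ctypes conversion layer, which is where the slowdown comes from.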
Super awesome! Can't wait to get home and try it out.
It's a few months later, and I'm wondering what progress has been made. Early comments mentioned that nobody had tried numpy or scipy yet -- has that changed?
Also, does this make the multiprocessing library available? Or, is pp (parallel processing) available?
I'm very excited about PyPy because of the JIT. But for my work I also need some form of utilizing multiple CPUs. Right now I'm using unladen swallow with the multiprocessing module.
Yup, I'd love to hear about the progress on this.
Any chance this will be released sometime?
It was already released, just check out the current PyPy release.