26 comments

Image processing is notoriously a CPU intensive task. To do it in realtime, you need to implement your algorithm in a fast language, hence trying to do it in Python is foolish: Python is clearly not fast enough for this task. Is it? :-)
Actually, it turns out that the PyPy JIT compiler produces code which is fast enough to do realtime video processing using two simple algorithms implemented by Håkan Ardö.
sobel.py implements a classical way of locating edges in images, the Sobel operator. It is an approximation of the magnitude of the image gradient. The processing time is spend on two convolutions between the image and 3x3-kernels.
magnify.py implements a pixel coordinate transformation that rearranges the pixels in the image to form a magnifying effect in the center. It consists of a single loop over the pixels in the output image copying pixels from the input image.
You can try by yourself by downloading the appropriate demo:

pypy-image-demo.tar.bz2: this archive contains only the source code, use this is you have PyPy already installed

pypy-image-demo-full.tar.bz2: this archive contains both the source code and prebuilt PyPy binaries for linux 32 and 64 bits

To run the demo, you need to have mplayer installed on your system. The demo has been tested only on linux, it might (or not) work also on other systems:

$ pypy pypy-image-demo/sobel.py

$ pypy pypy-image-demo/magnify.py

By default, the two demos uses an example AVI file. To have more fun, you can use your webcam by passing the appropriate mplayer parameters to the scripts, e.g:

$ pypy demo/sobel.py tv://

By default magnify.py uses nearest-neighbor interpolation. By adding the option -b, bilinear interpolation will be used instead, which gives smoother result:

$ pypy demo/magnify.py -b

There is only a single implementation of the algorithm in magnify.py. The two different interpolation methods are implemented by subclassing the class used to represent images and embed the interpolation within the pixel access method. PyPy is able to achieve good performance with this kind of abstractions because it can inline the pixel access method and specialize the implementation of the algorithm. In C++ that kind of pixel access method would be virtual and you'll need to use templates to get the same effect without incurring in runtime overhead.

The video above shows PyPy and CPython running sobel.py side by side (PyPy taking input from the webcam, CPython from the test file). Alternatively, to have a feeling on how much PyPy is faster than CPython, try to run the demo with the latter. These are the the average fps (frames per second) that I get on my machine (Ubuntu 64 bit, Intel i7 920, 4GB RAM) when processing the default test.avi video and using the prebuilt PyPy binary found in the full tarball alinked above. For sobel.py:

PyPy: ~47.23 fps

CPython: ~0.08 fps

For magnify.py:

PyPy: ~26.92 fps

CPython: ~1.78 fps

This means that on sobel.py, PyPy is 590 times faster. On magnify.py the difference is much less evident and the speedup is "only" 15x.
It must be noted that this is an extreme example of what PyPy can do. In particular, you cannot expect (yet :-)) PyPy to be fast enough to run an arbitrary video processing algorithm in real time, but the demo still proves that PyPy has the potential to get there.

Anonymous wrote on 2011-07-07 17:47:

Pypy is awesome!

Anonymous wrote on 2011-07-07 18:19:

I have a n00b problem: On Mac OS X 10.5.8, the precompiled pypy binary crashes with this message:
dyld: Library not loaded: /usr/lib/libssl.0.9.8.dylib

What's up with this? Thanks, and sorry for being offtopic.

metapundit.net wrote on 2011-07-07 19:17:

I saw this demo recently when Dan Roberts presented at Baypiggies. We broke into spontaneous applause when the pypy runtime ran at a watchable speed after cpython ran at less than 1 frame/second. Very impressive!

Anonymous wrote on 2011-07-07 21:07:

Anonymous, can you read?

"prebuilt PyPy binaries for linux 32 and 64 bits"
"The demo has been tested only on linux, it might (or not) work also on other systems"

Mac OS X is not Linux.

schmichael wrote on 2011-07-07 21:23:

Perhaps add a comment to sobel.py explaining what "pypyjit.set_param(trace_limit=200000)" does?

Luis wrote on 2011-07-07 22:27:

The only chamge I'd like to see in this project is its name... Trying to gather news from twitter for example, makes me search amongst thousands of comments in japanese (pypy means "boobies" in japanese), other incomprehensible comments in malay and hundreds of music fans of Look-Ka PYPY (WTF??)

Anonymous wrote on 2011-07-07 22:58:

Other Anonymous: Yes, I can read. I should have given a bit more context, but I was offtopic anyway. My goal was not running the demo, my goal was running pypy. I used the OS X binary from pypy.org. For those who are really good at reading, this was probably clear from the fact that my binary only crashed at library loading time.

Antonio Cuni wrote on 2011-07-07 23:03:

@Anonymous: most probably, the prebuilt PyPy for Mac Os X was built on a system different (older?) than yours.

For a quick workaround, you can try to do "ln -s /usr/lib/libssl-XXX.dylib /usr/lib/libssl.0.9.8.dylib". This should at least make it working, but of course it might break in case you actually use libssl.

The proper fix is to recompile PyPy by yourself.

Antonio Cuni wrote on 2011-07-07 23:08:

@schmichael

to avoid the potential problem of infinite tracing, the JIT bails out if it traces "too much", depending on the trace_limit.
In this case, the default trace_limit is not enough to fully optimize the whole algorithm, hence we need to help the JIT by telling it to trace a bit more than usual.

I agree that having to mess up with the internal parameters of the JIT is suboptimal. I plan to address this issue in the next weeks.

relet wrote on 2011-07-07 23:43:

How does it perform against python-opencv?

Anonymous wrote on 2011-07-07 23:47:

Antonio: Thanks for the quick reply. Unfortunately pypy can't be misled with the symlink hack: "Reason: Incompatible library version: pypy requires version 0.9.8 or later, but libssl.0.9.8.dylib provides version 0.9.7"

It seem like the prebuilt was created on a 10.6, and it does not work on vanilla 10.5 systems. Not a big deal, but is good to know.

Anonymous wrote on 2011-07-08 04:44:

Thanks for posting this. pypy is great. I'm trying to figure out how to write modules in RPython. I was sad that I missed the Baypiggies presentation.

René Dudfield wrote on 2011-07-08 07:35:

Hello,

it's lovely that pypy can do this. This result is amazing, wonderful, and is very kittens. pypy is fast at running python code (*happy dance*).

But.

It also makes kittens cry when you compare to CPython in such a way.

The reality is that CPython users would do this using a library like numpy, opencv, pygame, scipy, pyopengl, freej (the list of real time video processing python libraries is very large, so I won't list them all here).

Of course python can do this task well, and has for more than 10 years.

This code does not take advantage of vectorization through efficient SIMD, multiple cores or graphics hardware, and isn't careful with reusing memory - so is not within an order of magnitude of the speed of CPython code with libraries doing real time video processing.

Anyone within the field would ask about using these features.

Another question they would ask is about pauses. How does the JIT affect pauses in animation? What are the rules for when the JIT warms up, and how can you tell when the code will start running fast? How does the GC affect pauses? If there is a way to turn off the GC, or reuse memory in some way such that the GC won't cause the program to fail(Remember that in realtime a pause is a program fail). Does the GC pool memory of similar size objects automatically? Does the GC work well with 256MB-1GB-16GB sized objects? In a 16GB system, can you use 15GB of objects, and then delete those objects to then use another 15GB of different objects? Or will the program swap, or fragment memory causing pauses?

Please don't make kittens cry. Be realistic with CPython comparisons.

At the moment the python implementation is not as elegant as a vector style implementation. A numpy/matlab/CUDA/OpenCL approach looks really nice for this type of code. One speed up might be to reuse memory, or act in place where possible. For example, not copying the image... unless the GC magically takes care of that for you.

Jacob Hallén wrote on 2011-07-08 08:21:

@illume:More or less everyone knows that you can speed up your code by writing or using an extension library. Unfortunately this introduces a dependency on the library (for instance libssl mentioned in the comment thread) and it usually increases the complexity of your code.

Using PyPy you can solve computationally intensive problems in plain Python. Writing in Python saves development time. This is what the comparison is all about.

René Dudfield wrote on 2011-07-08 12:23:

hi @jacob: below is code which runs either multi core, vectorised SIMD, and on a GPU if you like. You'll notice that it is way shorter and more elegant than the 'pure python' code.

def sobelEdgeDetect(im=DImage, p=Position):
....wX = outerproduct([1,2,1],[-1,0,1])
....wY = transpose(wX)

....Gx = convolve(wX,im,p)
....Gy = convolve(wY,im,p)

....return sqrt(Gx**2 + Gy**2)

If pypy is 5x slower than C, and SIMD is 5x faster than C... and using multiple cores is 8x faster than a single core you can see this python code is (5 * 5 * 8) 200x faster than the pypy code. This is just comparing CPU based code. Obviously GPU code for real time image processing is very fast compared to CPU based code.

Things like numpy, pyopengl etc come packaged with various OSes - but chosing those dependencies compared to depending on pypy is a separate issue I guess (but many cpython packaged libraries are packaged for more platforms than pypy).

Of course using tested, and debugged existing code written in python will save you development time: for example using sobel written with the scipy library:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.filters.sobel.html

The fact is CPython is fast enough, more elegant, and will save you time for realtime image processing - unless you ignore the reality that people use CPython libraries for these tasks.

Finally the given code does not prove that the frames are all processed in realtime. They give an average time over all of the frames. Realtime video requires that you meet your target speed for every frame. It would need to be extended to measure each frame to make sure that each frame is within the required time budget.

Antonio Cuni wrote on 2011-07-08 12:31:

@illume: I think you completely missed the point of the blog post. This is not about "you should use pypy to do video processing", it's about "pypy runs pure python code very fast".

René Dudfield wrote on 2011-07-08 12:58:

@Antonio Cuni, I'm saying the post reads like cpython can not do "realtime image processing in python" and that pypy can.

tismer wrote on 2011-07-08 14:21:

@illume:
This example shows pure python code and compares its execution time in cpython and pypy. Nothing else. Writing graphics code in pure python that runs not dreadfully slow was to my knowledge never before shown.
If enough people understand the potential of this technique and put their time into it, we will hopefully come closer to your (5 * 5 * 8) acceleration in pypy, too.
I will for sure work on this.

Eventh wrote on 2011-07-08 14:41:

SIMD instructions and multi core support is something PyPy has potential to support, given time and funding.

Anonymous wrote on 2011-07-08 21:20:

The typical optimization path here would be implementing the necessary numpy array operations for the algorithms described. I wonder how a proper numpy implementation would compare.

Armin Rigo wrote on 2011-07-09 13:38:

I think you are still missing the point of the post. It was not "use pure Python to write your video processing algos". That's of course nonsense, given the amount and quality of existing C extension modules to do that.

The point is that when you want to experiment with writing a new algorithm of any kind, it is now possible to do it in pure Python instead of, say, C code. If later your project needs to move past the experimentation phase, you will have to decide if you want to keep that Python code, rewrite it in C, or (if applicable) use SIMD instructions from Python or from C, or whatever.

The real point of this demo is to show that PyPy makes Python fast enough as an early experimentation platform for almost any kind of algorithm. If you can write in Python instead of in C, you'll save 50% of your time (random estimate); and then for the 5% of projects that go past the experimentation phase and where Python is not enough (other random estimate), spend more time learning other techniques and using them. The result is still in your favor, and it's only going to be more so as PyPy continues to improve.

Yaacov wrote on 2011-10-18 23:31:

I was hoping to experiment with this amazing demo on my Windows-based computers. Any advice for how I would start making the required changes?

Jacob

Anonymous wrote on 2012-07-24 13:38:

dead links

Maciej Fijalkowski wrote on 2012-07-24 13:41:

Unfortunately the server died :( I'm not sure where exactly are packaged demos, but they can be run from:

https://foss.heptapod.net/pypy/extradoc/-/blob/branch/default/extradoc/talk/iwtc11/benchmarks/image

Unknown wrote on 2012-10-04 22:08:

The python code for this seems to be now here:
https://foss.heptapod.net/pypy/extradoc/-/blob/branch/default/talk/dls2012/demo

Unknown wrote on 2012-10-04 22:09:

The scripts can be found here:

https://foss.heptapod.net/pypy/extradoc/-/blob/branch/default/153804ce4fc3/talk/dls2012/demo

Global Interpreter Lock, or how to kill it

Armin Rigo

2011-06-29 17:50

46 comments

People that listened to my (Armin Rigo) lightning talk at EuroPython know that suddenly, we have a plan to remove the Global Interpreter Lock --- the infamous GIL, the thing in CPython that prevents multiple threads from actually running in your Python code in parallel.

That's not actually new, because Jython has been doing it all along. Jython works by very carefully adding locks to all the mutable built-in types, and by relying on the underlying Java platform to be efficient about them (so that the result is faster than, say, very carefully adding similar locks in CPython). By "very carefully", I mean really really carefully; for example, 'dict1.update(dict2)' needs to lock both dict1 and dict2, but if you do it naively, then a parallel 'dict2.update(dict1)' might cause a deadlock.

All of PyPy, CPython and IronPython have a GIL. But for PyPy we are considering a quite different approach than Jython's, based on Software Transactional Memory. This is a recent development in computer science, and it gives a nicer solution than locking. Here is a short introduction to it.

Say you want to atomically pop an item from 'list1' and append it to 'list2':

def f(list1, list2):
    x = list1.pop()
    list2.append(x)

This is not safe in multithreaded cases (even with the GIL). Say that you call f(l1, l2) in thread 1 and f(l2, l1) in thread 2. What you want is that it has no effect at all (x is moved from one list to the other, then back). But what can occur is that instead the top of the two lists are swapped, depending on timing issues.

One way to fix it is with a global lock:

def f(list1, list2):
    global_lock.acquire()
    x = list1.pop()
    list2.append(x)
    global_lock.release()

A finer way to fix it is with locks that come with the lists:

def f(list1, list2):
    acquire_all_locks(list1.lock, list2.lock)
    x = list1.pop()
    list2.append(x)
    release_all_locks(list1.lock, list2.lock)

The second solution is a model for Jython's, while the first is a model for CPython's. Indeed, in CPython's interpreter, we acquire the GIL, then we do one bytecode (or actually a number of them, like 100), then we release the GIL; and then we proceed to the next bunch of 100.

Software Transactional Memory (STM) gives a third solution:

def f(list1, list2):
    while True:
        t = transaction()
        x = list1.pop(t)
        list2.append(t, x)
        if t.commit():
            break

In this solution, we make a transaction object and use it in all reads and writes we do to the lists. There are actually several different models, but let's focus on one of them. During a transaction, we don't actually change the global memory at all. Instead, we use the thread-local transaction object. We store in it which objects we read from, which objects we write to, and what values we write. It is only when the transaction reaches its end that we attempt to "commit" it. Committing might fail if other commits have occurred in between, creating inconsistencies; in that case, the transaction aborts and must restart from the beginning.

In the same way as the previous two solutions are models for CPython and Jython, the STM solution looks like it could be a model for PyPy in the future. In such a PyPy, the interpreter would start a transaction, do one or several bytecodes, and then end the transaction; and repeat. This is very similar to what is going on in CPython with the GIL. In particular, it means that it gives programmers all the same guarantees as the GIL does. The only difference is that it can actually run multiple threads in parallel, as long as their code does not interfere with each other. (In particular, if you need not just the GIL but actual locks in your existing multi-threaded program, then this will not magically remove the need for them. You might get an additional built-in module that exposes STM to your Python programs, if you prefer it over locks, but that's another question.)

Why not apply that idea to CPython? Because we would need to change everything everywhere. In the example above, you may have noted that I no longer call 'list1.pop()', but 'list1.pop(t)'; this is a way to tell that the implementation of all the methods needs to be changed, in order to do their work "transactionally". This means that instead of really changing the global memory in which the list is stored, it must instead record the change in the transation object. If our interpreter is written in C, as CPython is, then we need to write it explicitly everywhere. If it is written instead in a higher-level language, as PyPy is, then we can add this behavior as as set of translation rules, and apply them automatically wherever it is necessary. Moreover, it can be a translation-time option: you can either get the current "pypy" with a GIL, or a version with STM, which would be slower due to the extra bookkeeping. (How much slower? I have no clue, but as a wild guess, maybe between 2 and 5 times slower. That is fine if you have enough cores, as long as it scales nicely :-)

A final note: as STM research is very recent (it started around 2003), there are a number of variants around, and it's not clear yet which one is better in which cases. As far as I can tell, the approach described in "A Comprehensive Strategy for Contention Management in Software Transactional Memory" seems to be one possible state-of-the-art; it also seems to be "good enough for all cases".

So, when will it be done? I cannot say yet. It is still at the idea stage, but I think that it can work. How long would it take us to write it? Again no clue, but we are looking at many months rather than many days. This is the sort of thing that I would like to be able to work on full time after the Eurostars funding runs out on September 1. We are currently looking at ways to use crowdfunding to raise money so that I can do exactly that. Expect a blog post about that very soon. But this looks like a perfect candidate for crowdfunding -- there are at least thousands of you who would be willing to pay 10s of Euros to Kill the GIL. Now we only have to make this happen.

Michael Foord wrote on 2011-06-29 17:54:

If you concurrently run two transactions that interfere with each other - and they both restart on failure - isn't there a possibility that neither would ever complete? How would you mitigate against that? (Fallback to a global lock after a certain number of transaction failures perhaps?)

Anonymous wrote on 2011-06-29 18:13:

There's a thing that is not clear to me: how do you detect failures during commits?

jdhardy wrote on 2011-06-29 18:16:

IronPython doesn't have a GIL - it's the same as Jython.

Michael Foord wrote on 2011-06-29 18:17:

Plus transactions have to be scoped around code that is side-effect free (or you can guarantee containing the side-effects within the transaction). Why STM research was done in Haskell I guess. Anyway, it sounds like a hard problem. That's why Armin is interested I guess... :-)

Antonio Cuni wrote on 2011-06-29 18:23:

@michael: if two transactions conflict, you rollback only one of those, and from the external the effect is the same as having one locked by the GIL

About side effects: the plan is to close a transaction before a side effect operation and reopen a new one after it: this is what happens already with the GIL, which is released e.g. before I/O calls.

At least, this is how I understand it, and since I'm not Armin I might be wrong :-)

Michael Foord wrote on 2011-06-29 18:26:

@antonio
Ah, that makes sense. Thanks. :-)

Anonymous wrote on 2011-06-29 18:30:

This sounds like a great idea...

What happens when transaction interleaves together and fail? Both threads will still continue trying so to me this appears to be somewhat as efficient as locks. (Note I know nothing in this topic and would definitely like to learn more).

Sebastian Noack wrote on 2011-06-29 19:14:

I don't think that the primary reason STM is slower than the GIL, is the extra bookkeeping, but the fact that things need to be repeated. However, I could imagine, that STM still might yield better response times than acquiring locks, in some cases.

Tuomas Jorma Juhani Räsänen wrote on 2011-06-29 20:27:

STM is not ot that "recent" though:

Nir Shavit and Dan Touitou. Software transactional memory. In PODC '95: Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing, pages 204-213, New York, NY, USA, 1995. ACM.

xyproto wrote on 2011-06-29 20:34:

I can imagine the reason this is efficient is because code often work on different parts of memory in different threads.

ChrisW wrote on 2011-06-29 22:17:

Hmm, ZODB has this kind of optimistic transaction committing, it results in having to deal with ConflictErrors and slowness from retrying requests when they conflict amongst other pain. If that's the price for losing the GIL, I'll stick with the GIL, thanks...

gertjan wrote on 2011-06-29 22:48:

Well when it comes to removing the GIL I have always had my hopes on pypy, and I'd be very happy to contribute some coin to make it happen. I'll be looking out for that crowdfunding post.

Zemantic dreams wrote on 2011-06-29 23:00:

Ok, so where can we give a small contribution?

Andraz Tori, Zemanta

Richard wrote on 2011-06-30 00:32:

Have you read about Microsoft's abandoned attempt to bring STM to .NET? Have you considered the problems they had?

Jon Morgan wrote on 2011-06-30 05:56:

Interesting idea, but some questions:
1. What do C extensions do? (extensions designed for CPython that are using GIL methods). Would they still be able to be used, or would they have to be rewritten for PyPy?

2. What happens if repeatable operations are interleaved with operations that are not repeatable? (e.g. logging values to a file - we wouldn't want it to happen twice if there was a conflict, unless of course you are using that logging to trace what is happening...).

Ben wrote on 2011-06-30 10:30:

@Michael Foord: In state-of-the-art lazy[1] STM systems, the probability of two transactions continually causing each other to restart is minuscule. A transaction only causes another one to restart when it tries to commit. So when somebody restarts, it means that someone else has successfully committed.

[1] In "Lazy" STMs, transactions only get exclusive access to the things they're trying to write to for a very short window of time at the end. This means they have to record writes in a transaction log, as Armin described, because there might be many pending writes for the same object. An alternative design is "eager" STM, where transactions write directly and have to "undo" their writes if they get aborted. Eager systems look good on paper, but in my opinion they're not worth it. With eager STM, the runtime system has to be very carefully designed to avoid livelock (when the system hangs because some transactions constantly abort each other). Lazy STM is almost impossible to livelock in practice, because even if some transactions are highly conflicting at least one of them (almost always) has to commit.

Ben wrote on 2011-06-30 10:52:

Also, my honours project was implementing most of an STM system, and I've been a long time fan of (and sometime tinkerer with) PyPy, so I would be very interested in where this goes.

And I know this is extremely premature, but if there were enough money coming in for this project and the PyPy team were willing to include outside developers, I would absolutely love to put serious work into this.

Armin Rigo wrote on 2011-06-30 11:28:

@Richard: reading the web page you point out, Microsoft's STM attempt (like most others I'm aware of) seems to work at a different level: basically as a library for application programmers. I can go through all 4 points and show why they are not relevant in our context:

* any visible I/O (e.g. writing to a file or a log) is going to end the transaction and start the next one, just like the GIL is released and re-acquired around most calls to the C library's write() function

* the 2nd issue is moot, because STM will be an internal detail in PyPy, not a user-visible feature

* the 3nd issue he describes is about "update-in-place" STM, which I believe is not the best solution: we want instead to keep a local log of the changes, and apply them only at commit-time (as described e.g. in the paper I pointed out)

* the final issue is the lack of real successes with STM. Well, we can't do anything about that ahead of time :-)

Anonymous wrote on 2011-06-30 11:29:

One note on the lock-based example you gave, that locks list1 and then list2: It isn't free of deadlocks!

Having two threads call the function simultaneously with swapped args may cause a deadlock. See the bank account problem.

Armin Rigo wrote on 2011-06-30 11:49:

@Anonymous: yes, I know it can deadlock. I have hidden the problem into some theoretical function acquire_all_locks(), which should somehow make sure that all locks are atomically acquired, in any order (which I think is possible by first sorting the locks according to their address in memory). I didn't want to put too much emphasis on the negative side of locks :-)

Armin Rigo wrote on 2011-06-30 11:51:

@Jon Morgan:

1. We would most probably still
have a GIL for the CPython C
extensions. Only one can run at a
time, but any number of PyPy
threads can run at the same time.
(This is because the CPython C
extensions never access PyPy's own
objects directly --- they cannot,
because PyPy's own objects can
move, and the C code is not
prepared for that.)

2. Logging to a file is done with a
call to a function like write().
In CPython and so far in PyPy, the
call to write() is preceded by
"release GIL" and followed by
"re-acquire GIL". In the STM PyPy,
it would be preceded by "end the
current transaction" and "start the
next transaction". This gives the
same behavior. But we may have to
think a bit harder about writes
that are buffered, because it seems
that if all threads write into the
same buffer then it will cause many
transaction conflicts.

Note however that we are talking
here about very short-lived
transactions. Even if you have 20
threads all writing to the same log
file, each thread is going to run
much more than 20 bytecodes between
any two writes to the log file.
You only get conflicts if two of
these threads are running the
write() call at the same time, and
such a conflict only causes one of
the threads to roll back and retry
the write(), not more.

Armin Rigo wrote on 2011-06-30 11:54:

@tuomasjjrasanen: yes, actually the first paper is from the 80's. But I think that it's only from around 2003 or 2004 that research seriously started, in the sense that papers were produced regularly, from several teams.

Kevin Granade wrote on 2011-06-30 14:47:

To address the anonymous question near the start of the comments, one way to detect commit collision is to copy a global generation counter at the start of your transaction, and then compare your stored copy to the current generation counter at commit time (after taking a lock), and if no one else has incremented the generation counter, you do so and complete your operation.

So transaction does:
self.generation = global.generation

And commit does:
if lock(global.lock):
if self.generation == global.generation:
global.generation += 1
return True
unlock(global.lock)
return False

Jan Ziak (atomsymbol) wrote on 2011-06-30 16:47:

I am not sure what to make out of the solution (=STM) to GIL you proposed in the article. You are essentially suggesting to slow down all Python programs in PyPy by a factor of, say, 4 and hope to recover the loss for a very small percentage of programs on an 8-core machine.

That can't be right. Please tell me I am dreaming ... :)

Michael Foord wrote on 2011-06-30 19:29:

So if there is only one thread transactions will be disabled?

I wonder how "fine grained" transactions will be: if you have parallel operations working concurrently on a large array do you think you will be able to allow threads to simultaneously modify different areas of the array?

Ben wrote on 2011-06-30 21:22:

@⚛: That's kind of how parallelization goes. There are overheads, and the only way to make up for them is to hope you have enough parallel speedup. STM (and any approach to this problem based on fine-grained locking) would work best if only a small known set of objects are shared between threads, and only those are synchronized, which unfortunately cannot be the case for a general GIL-removal proposal.

However I think PyPy's JIT could potentially help a little here. The escape analysis PyPy already does can also prove "this value cannot be accessed by another thread" and used to avoid logging some values, since they cannot conflict with parallel transactions. There are probably some more STM-specific optimizations the JIT could do as well.

Ben wrote on 2011-06-30 21:27:

@Michael Foord: STM definitely can be made as fine-grained as you like. Some existing STM systems operate at the level of machine words. Given that this one will be operating at the interpreter level, I would guess that code working on different sections of the same object (or array) would able to run in parallel, but I guess it depends on how the tradeoffs play out.

Armin Rigo wrote on 2011-06-30 22:12:

@⚛: to complete Ben's answer: yes, you are correct, but that's why the translation step "insert STM logic" is never going to be mandatory. You will get either a regular pypy-c-gil or a pypy-c-stm, as two different executables, and you will choose the one most suited for your particular program. I still expect pypy-c-gil to be the most used one, with pypy-c-stm an alternative that is only useful for people with massively multi-threaded programs.

EmilK wrote on 2011-07-01 10:55:

It would be cool, if the python programmer could mark "uncritical" sections, such that the stm book keeping is disabled for those sections where the programmer knows that there is no concurrency.

Jacob Hallén wrote on 2011-07-01 14:17:

@EmilK: I think that would be very uncool. You would allow the developer to introduce bugs that would be extremely hard to locate. Parallel programs are quite difficult to get right to start with, and anyone who does not have complete understanding of what constitutes a critical section will be very likely to make an error.

Skandalfo wrote on 2011-07-02 20:18:

There's an intermediate option between the GIL and the careful locking done by Jython, that I had a look at some time ago for making Python more thread friendly.

Just exchanging the GIL for a global readers-writer lock would allow Python to use way more concurrency. You would run all Python code under a reader lock for operations that were read-only on objects. For modifying built in mutable objects, or for things like the one involving both lists in the Jython example, or when calling into C modules, you would have to acquire the writer version of the lock.

Python threads would relinquish the reader lock each N opcodes, just like it's done now for the GIL, and I guess the acquisition of the writer lock should be given priority over the reader ones.

This approach should be simpler to implement than using the transactional memory approach, and it should be possible to bake it into CPython too. I think I remember having read some discussion about this somewhere, but it didn't seem to come to anything...

Armin Rigo wrote on 2011-07-06 14:26:

@Skandalfo: this cannot work with CPython, because of reference counting -- every bytecode modifies reference counts, so needs the "write" lock. But it could be a possible idea to consider in PyPy.

WhiteLynx wrote on 2011-07-06 19:42:

I love this idea.

Just musing on an implementation detail here, but isn't the "lazy" STM implementation's transaction system effectively just an in-memory implementation of copy-on-write semantics? It might be interesting to take a look at other things that have used COW for inspiration. (ZFS and btrfs come to mind) I like the idea that committing a transaction for a given object would just involve changing the object's address in memory to the modified copy.

Also, I'd be interested to see the read/write lock system get implemented, because it seems like it might be a better choice for programs that only use a couple threads.

Anonymous wrote on 2011-07-06 21:30:

What is wrong with Jython's lock model? Java is a pretty efficient language, no? And there is also no need to acquire locks for objects that you can prove won't cause conflicts...

Skandalfo wrote on 2011-07-06 21:47:

@Armin Rigo: If the problem for the RW-lock approach in CPython is just about reference count updates and checks, perhaps those could be done via atomic primitives, as supported on most modern architectures. This is what boost::shared_ptr does, IIRC, for the pointers to be thread-safe by default.

Armin Rigo wrote on 2011-07-09 13:18:

@Skandalfo: right, indeed. I don't know exactly the cost of such atomic operations. Maybe it's fine, but I fear that doing tons of increfs/decrefs all the time (as needed for refcounts in CPython's simple interpreter) has an important cost.

Tuure Laurinolli wrote on 2011-07-11 20:10:

@Armin Rigo

You'd need similar atomic instructions for an STM implementation too - although perhaps not as many? In any case they should be about as cheap as L1 cache writes unless there's contention, but then things are going to be slow in any case if you have contention. Of course you might have false sharing of objects etc. to muddle things up.

In any case, what sort of semantics would a GIL-free Python have in multi-threaded case, compared to current GIL-infested Python? Each opcode can assumed to execute atomically?

Anonymous wrote on 2011-07-17 12:32:

One thread have one interpreter.
Threads interactive like os native thread, use the os interactive method wrap by py.

I want to embed multi interpreter in my c code!

Please kill GIL!!!

Raymin wrote on 2011-07-17 12:48:

One thread have one interpreter.
Threads interactive like os native thread, use the os interactive method wrap by py.

I want to embed multi interpreter in my c code!

Please kill GIL!!!

Armin Rigo wrote on 2011-07-24 13:07:

@Tuure Laurinolli: yes, but PyPy has no refcounts. I was just discussing the pro/cons of the proposed locking solution on CPython (which is off-topic as far as this original blog post is concerned). I don't even want to think about STM for CPython :-)

For your second question, from the user's point of view, the semantics we would get with STM are automatically the same as with the GIL, which is why I like the approach.

Anonymous wrote on 2011-07-29 14:08:

Also, what about the performance if the lazy commit method used in the post? Every transaction will create additional memory? Is that really efficient, IMHO this model is aiming a very small number of use cases??

klaussfreire wrote on 2011-10-14 21:26:

I can see a use for STM in CPython, too, though. Even though it seems to be not applicable, it need not be true.

I worked on making the reference counting thread-friendly, in the sense that when you have multiple threads reading a big data structure, CPython's reference counting turns all the reads into writes, which is awful for performance.

I wrote a patch to pack all writes in the same memory page (ie, reference pools, external reference counting), and was working on a patch for STM reference count updates.

The thing with STM and reference counting, is that many operations cancel out at the end of the transaction. Like when you just read objects while performing computations, you acquire a reference, work, then release it.

In the end, STM here would remove the need to write to shared memory.

In the process of working on that patch, I can tell CPython can be made to use STM techniques. You have thread-local storage at the VM level already, macros handle almost all reference counting operations, it's all abstracted enough that it might be possible.

For reference counting, the only problem is that STM is way slower for single threaded applications. WAY slower. For multithreaded, it pays off considerably, but CPython guys are very strongly set in favouring single-threaded performance.

halfaleague wrote on 2011-10-28 03:55:

How can we fund this?

Maciej Fijalkowski wrote on 2011-10-28 07:31:

@halfaleague get in contact. pypy@sfconservancy.org is the right address for non-profit funding inquires.

Daniel Waterworth wrote on 2011-12-11 07:40:

I managed to write a Haskell STM implementation in a single morning. It may not be the most efficient implementation (I've found it to be about half the speed of the GHC implementation in the limited testing I've done), but it's really simple and only uses atomic CAS.

https://gist.github.com/1454995

shawn wrote on 2011-12-31 20:38:

have you looked at all at "Worlds" as a simpler interface to STM?

https://www.vpri.org/pdf/tr2011001_final_worlds.pdf

Report back from our survey

Alex

2011-06-08 06:18

11 comments

Hi all,

I'm here to report back the results of our survey. First, we're very pleased to report that a number of you guys are happilly running PyPy in production! Most (97%) of the respondants using PyPy are using it because it's faster, but a further 26% (respondants could choose multiple answers) are using it because of lower memory usage. Of users who aren't using PyPy, the most common reason was C extensions, followed by "Other".

From reading the extra comments section there are a few things we've learned:

Google docs needs a better UI for this stuff
A huge number of people want NumPy and SciPy, it was easily the most requested C extension (25% of respondants said somthing about NumPy). We've already blogged on the topic of our plans for NumPy.
Having packages in the various OS's repositories would be a big help in getting users up and running.

A huge thanks to everyone who responded! Finally, if you're using PyPy in production we'd love to get a testimonial from you, if you're willing to spare a few minutes to give us a quote or two please get in contact with us via our mailing list.

Thanks, Alex

Paul wrote on 2011-06-08 10:18:

I'm surprised more people didn't mention Python 3 support as a big breaker. I certainly did.

Jan Ziak (atomsymbol) wrote on 2011-06-08 14:16:

"... we're very pleased to report that a number of you guys are happilly running PyPy in production"

You decided to keep the actual number of users a secret? Why?

Maciej Fijalkowski wrote on 2011-06-08 14:20:

@⚛ I think Alex was simply too lazy to count :-) At some point there were 600 respondents and roughly 10% of them used pypy in production, which is pretty good IMO.

Jan Ziak (atomsymbol) wrote on 2011-06-08 18:05:

@Maciej Fijalkowski: Ok, thanks for the clarification.

Marko Tasic wrote on 2011-06-08 20:42:

I'm using pypy 1.5 with jit in production for highly reliable and responsive distributed and decentralized systems, and I'm happy with it.

Jan Ziak (atomsymbol) wrote on 2011-06-09 07:22:

@Marko Tasic: If I may ask a question. You wrote that you are using PyPy for highly reliable systems. I know what you mean, but it seems to me that certain features of Python are in contradiction with high reliability. For example, it is in practice impossible to know at compile-time whether you misspelled a variable or parameter in Python source code. My question would be: why are you using a language which has only rudimentary compile-time error detection to implement a high reliability system?

Maciej Fijalkowski wrote on 2011-06-09 07:58:

@⚛ Not even trying to argue with you, comments on this blog is not a proper place to discuss whether Python is good for high-reliability systems. Please take the discussion somewhere else

Thanks,
fijal

Jan Ziak (atomsymbol) wrote on 2011-06-09 09:38:

@Maciej Fijalkowski: I will of course do what you ask, but I would like you to point me to at least one blog comment that: (1) Is initially saying that Python/PyPy is *good* for task X, and (2) You or somebody else from the PyPy team wrote "Please take the discussion about X somewhere else".

Thanks
⚛

Maciej Fijalkowski wrote on 2011-06-09 09:41:

@⚛ The line might be blurry, but "I'm using PyPy for X" or "I'm not using PyPy for X, because ..." is on topic. While "Python can be used for X" or "Python can't be used for X, because ..." is not on topic. This is a fine line between language implementation (which is PyPy about) and language design (which PyPy is not about, python-dev/python-list/python-ideas mailing lists are about that).

Cheers,
fijal

Anonymous wrote on 2011-06-11 01:06:

What about a FFI to C or C++? Something like LuaJit's FFI, which is really good.

Anonymous wrote on 2011-06-15 10:10:

Lack of support for numpy and scipy are what keep me from using pypy. Am using python for analysis of ultra high throughput DNA sequencing data.

Would be very curious to see how much performance I could gain by using pypy.

PyPy Genova-Pegli Post-EuroPython Sprint June 27 - July 2 2011

Antonio Cuni

2011-05-23 20:45

2 comments

The next PyPy sprint will be in Genova-Pegli, Italy, the week after EuroPython (which is in Florence, about 3h away by train). This is a fully public sprint: newcomers and topics other than those proposed below are welcome.

Goals and topics of the sprint

Now that we have released 1.5, the sprint itself is going to be mainly working on fixing issues reported by various users. Possible topics include, but are not limited to:
- fixing issues in the bug tracker
- improve cpyext, the C-API compatibility layer, to support more extension modules
- finish/improve/merge jitypes2, the branch which makes ctypes JIT friendly
- general JIT improvements
- improve our tools, like the jitviewer or the buildbot infrastructure
- make your favorite module/application working on PyPy, if it doesn't yet
Of course this does not prevent people from showing up with a more precise interest in mind If there are newcomers, we will gladly give introduction talks.
Since we are almost on the beach, we can take one day off for summer relaxation and/or tourist visits nearby :-).

Exact times

The work days should be 27 June - 2 July 2011. People may arrive on the 26th already and/or leave on the 3rd.

Location & Accomodation

Both the sprint venue and the lodging will be at Albergo Puppo in Genova-Pegli, Italy. Pegli is a nice and peaceful little quarter of Genova, and the hotel is directly on the beach, making it a perfect place for those who want to enjoy the sea in the middle of the Italian summer, as a quick search on Google Images shows :-)

The place has a good ADSL Internet connexion with wireless installed. You can of course arrange your own lodging anywhere but I definitely recommend lodging there too.
Please confirm that you are coming so that we can adjust the reservations as appropriate. The prices are as follows, and they include breakfast and a parking place for the car, in case you need it:

single room: 70 €

double room: 95 €

triple room: 105 €

Please register by hg:

https://foss.heptapod.net/pypy/extradoc/-/blob/branch/default/extradoc/sprintinfo/genova-pegli-2011/people.txt

or on the pypy-dev mailing list if you do not yet have check-in rights:

https://mail.python.org/mailman/listinfo/pypy-dev

In case you want to share a room with someone else but you don't know who, please let us know (either by writing it directly in people.txt or by writing on the mailing list) and we will try to arrange it.

vak wrote on 2011-05-25 11:39:

Hi,

as for upcoming sprint...

The grid on https://speed.pypy.org/timeline/ is a totally great idea. However the benchmark tests listed represent no progress since a long time already.

Q1. Does it mean that the set is not representative any more and should be extended?

Q2. Is it possible to include some micro benchmarks, please? (Oh, please!)

vak wrote on 2011-06-14 14:31:

no answers -- it's a pity

PyPy Usage Survey

Alex

2011-05-16 17:27

24 comments

We've been working on PyPy for a long time. But readers of this blog will know that in the past year something has changed: we think PyPy is production ready. And it's not just us, this week LWN.net wrote an article about how PyPy sped up one of their scripts by a factor of three, noting that, "plans are to run gitdm under PyPy from here on out". All in all we think PyPy is pretty great, but not everyone is using it yet, and we want to know why. We want your feedback on why PyPy isn't ready to be your only Python yet, and how we can improve it to make that happen.

Therefore, we've put together a quick survey, whether you're using PyPy or not if you could take a few minutes to fill it out and let us know how we're doing we'd really appreciate it. You can find the form here.

Thanks, The PyPy team

Anonymous wrote on 2011-05-16 18:23:

We are very interested in using PyPy in production but our project is based on lxml library and both are incompatible. Do you suggest any fix for this? I'm not sure if PyPy would compensate the reduction if performance of a pure Python XML library.

Anonymous wrote on 2011-05-16 18:55:

Biggest blocker right now is gevent, which I believe would require pypy stackless and JIT to get along plus some work to make gevent use ctypes in place of cpython api.

Anonymous wrote on 2011-05-16 19:12:

I suggest that you reproduce this survey on StackOverflow (if it's acceptable there, maybe Programmers?) and Quora, maybe Convore too. Posting to comp.lang.python would also help.

Anonymous wrote on 2011-05-16 19:22:

Pypy needs to either be a dropin replacement for python or provide a significant (order of magnitude) difference in performance that moving to pypy won't be as big of a deal when you lose the support of so many 3rd party libraries.

Anonymous wrote on 2011-05-16 19:35:

1. Installation is long and non-intuitive. I'd like to see PyPy packaged up for all the major distros + Mac OSX via Fink, Homebrew, and MacPorts.

2. A comprehensive listing of modules that can and cannot be used with PyPy. I'm still not quite clear as to how PyPy interacts with the major web frameworks and WSGI (haven't researched it much either).

3. Drop-in replacement for Python 2.7. I want my scripts that I wrote in Python to run in PyPy with no complications.

Pavel wrote on 2011-05-16 19:46:

Could you provide the downloads with PGP signatures, please? We would like to use PyPy in production to run our payment processing system backend, but verified integrity and authenticity of its source code is strictly required.

Victor wrote on 2011-05-16 20:05:

2. A comprehensive listing of modules that can and cannot be used with PyPy. I'm still not quite clear as to how PyPy interacts with the major web frameworks and WSGI (haven't researched it much either).

This is available at the PyPy Compatibility Wiki (I should update it this week, lots of new information around).

Anonymous wrote on 2011-05-16 20:20:

We would use it across all our deployments (hundreds of thousands of LOCs) and gladly contribute and invest in pypy as soon as you guys implement python3 spec. Literally can't wait.

Daniel Kluev wrote on 2011-05-17 06:52:

I'd love to use PyPy in some of my projects, but they rely on lots of 3rd-party C/C++-based libraries.

1) lxml, thats an absolute must for most of my applications. Original ETree now lacks many features lxml has, so there is no ready pure-python replacement avail.
2) Some my own boost::python libraries. I didn't actually try to compile them on PyPy, but as I was told on IRC, support for b::p is still marginal.
3) PycURL, PyV8, PyQt, wxPython and so on.

Martin Gfeller wrote on 2011-05-17 09:14:

We would like to profit from the speedup, but it would be a major piece of work for us, as we're currently running Zope 2.13 (which we could replace, because we make only limited use of it and have our own hybrid database). However, before making an investment, we need to be sure that:

- PyPy won't go away like Psyco did. A kind of "mainstream endorsement" by PSF would be helpful

- numpy and scipy are available

- a decent ODBC package is available (we're using mxODBC) at the moment

- full support on Windows 32 and 64 bit

Best regards, Martin

Swisscom IT Services Finance

Maciej Fijalkowski wrote on 2011-05-17 09:18:

@martin

* numpy, scipy support is on the way

* 32bit windows is done, 64bit windows will happen, it's on the todo list

* PSF has just endorsed PyPy in front of 1000 people crowd on pycon giving us a 10000$ check (https://3.bp.blogspot.com/-yLUKuyRgjdg/TYfklB5Jg4I/AAAAAAAABKM/_5Rv2thqzA0/s1600/pycon_cheque.jpg).

That answers roughly half to 3/4 of your issues, no bad, we're getting there :)

Anonymous wrote on 2011-05-17 15:48:

I would like to repeat the numpy and scipy thing. I have to add matplotlib, which a lot of people use for plotting. Personally I also cannot live without h5py, which is awesome for storing and handling numerical data. I have no idea if it will work with pypy, because it does require numpy first.

I'm looking forward to pypy becoming faster, better supported, and more popular! I am convinced that it will.

wilk wrote on 2011-05-17 16:38:

I've a project wich use psyco with a factor 15 (computation of train path) ! yes really, this project is in production (unfortunately not open source) ! I just tried it with pypy 1.5, and it works with the same factor (congratulation to you). So i'm sure that we'll use pypy.

But like my other project, i don't change something wich already works. Most of them don't need speed improvement.

On one scrabble game i'd like to replace a scrabble solver in C (if someone wants to help, it's opensource ?)

I also hope to see a debian package in the next debian release...

Thanks for your work, i follow it !

Anonymous wrote on 2011-05-18 13:26:

On my server I'm running couple of Django based ecommerce systems. I hope to be running more of them soon (hopefully). There is also PostgreSQL. Still not using PyPy but I just can't wait to check if it will be faster and if so then how much. I don't know yet how to run Django app on production on PyPy but as soon I check and run couple of performance tests I will surely give some feedback.

raptor wrote on 2011-05-23 00:53:

Its all about compatibility with 3rd party libs, C libs or boost::python. Otherwise those who want to JIT their Python are just going to wait a bit longer for PEP 3146 so they can have a good LLVM based JIT in standard Python.

https://www.python.org/dev/peps/pep-3146/

Anonymous wrote on 2011-05-23 03:48:

The pypy group should make a full featured ide with a gui designer with built in packaging to .exe and linux .deb and .rpm that only runs the pypy vm. That would bring the desktop application programmers in by the droves.

Carl Friedrich Bolz-Tereick wrote on 2011-05-23 07:35:

@Hart: unladen swallow is dead:

https://qinsb.blogspot.com/2011/03/unladen-swallow-retrospective.html

Anonymous wrote on 2011-05-23 15:21:

Well, basically, it's NumPy, SciPy, Matplotlib and MayaVi. I'm also using Cython to optimize computation intensive code paths, but of course it would be nicer to stick to pure Python and let JIT do it's magic.

lazyweb wrote on 2011-05-23 18:44:

Arrgh, gevent does not work with pypy? There's my blocker.

Gaëtan de Menten wrote on 2011-05-30 12:59:

How long are you planning to keep this poll open? I hope you will blog about its results when it's closed...

Almir Karic wrote on 2011-06-02 02:49:

would love to see the results

Anonymous wrote on 2011-06-02 15:21:

I'm interesting in psycopg2and PIL libraries.

Caetano wrote on 2011-06-02 15:29:

The only thing that makes me not using pypy is the lack of supporting python bynaries .so, .pyd, etc.
I know that is a hard feature to implement because is needed to stub the CPython api.
but I think when its done will there is no reasons to not using pypy for anybody.

Anonymous wrote on 2011-08-04 21:21:

Numpy, scipy, matplotlib, and image are the stick ups for me.

Server migration in progress

Armin Rigo

2011-05-15 18:30

1 comment

Hi all,

We are in the process of migrating the hosting machine for PyPy, moving away from codespeak.net and towards a mixture of custom servers (e.g. for buildbot.pypy.org) and wide-scale services (e.g. for the docs, now at readthedocs.org).

When this is done, a proper announce will be posted here. In the meantime, we have already moved the mailing lists, now hosted on python.org. The subscribers' list have been copied, so if you didn't notice anything special for the past week, then everything works fine :-) This concerns pypy-dev, pypy-issue and pypy-commit. Two notes:

Some settings have not been copied, notably if you used to disable mail delivery. Sorry about that; you have to re-enter such settings.
Following the move, about 50 addresses have been dropped for being invalid. I'm unsure why they were not dropped earlier, but in case sending mail to you from python.org instead of codespeak.net fails, then you have been dropped from the mailing lists, and you need to subscribe again.

Henrik Vendelbo wrote on 2011-05-17 16:15:

I enjoy PyPy a lot, and would use it for production.

However I tend to have a lot of problems when I upgrade to the latest source as my PyPy modules/extensions break and I will have to reimplement them with the new internal APIs.

It would be great if there was a bit more stability around the structure of main and how to write a module.

Playing with Linear Programming on PyPy

Hakan Ardo

2011-05-11 12:27

8 comments

Fancy hi-level interfaces often come with a high runtime overhead making them slow. Here is an experiment with building such an interface using constructions that PyPy should be good at optimizing. The idea is to allow the JIT in PyPy to remove the overhead introduced by using a fancy high-level python interface on top of a low-level C interface. The application considered is Linear programming. It is a tool used to solve linear optimization problems. It can for example be used to find the nonnegative values x, y and z that gives the maximum value of

without violating the constraints

There exists general purpose solvers for these kind of problems that are very fast and can literally handle millions of variables. To use them however the problem has to be transformed into some specific matrix form, and the coefficients of all the matrices has to be passed to the solver using some API. This transformation is a tedious and error prone step that forces you to work with matrix indexes instead of readable variable names. Also it makes maintaining an implementation hard since any modification has to be transformed too.

The example above comes from the manual of the glpk library. That manual continues by describing how to convert this problem into the standard form of glpk (which involves introducing three new variables) and then gives the c-code needed to call the library. Relating that c-code to the problem above without the intermediate explanation of the manual is not easy. A common solution here is to build a hi-level interface that allows a more natural way of defining the matrices and/or allow the equations to be entered symbolically. Unfortunately, such interfaces often become slow. For the benchmark below for example, cvxopt requires 20 minutes to setup a problem that takes 9.43 seconds to solve (this seems a bit extreme, am I doing something wrong?).

The high-level interface I constructed on top of the glpk library is pplp and it allows the equations to be entered symbolically. The above problem can be solved using

    lp = LinearProgram()
    x, y, z = lp.IntVar(), lp.IntVar(), lp.IntVar()
    lp.objective = 10*x + 6*y + 4*z
    lp.add_constraint( x + y + z <= 100 )
    lp.add_constraint( 10*x + 4*y + 5*z <= 600 )
    lp.add_constraint( 2*x + 2*y + 6*z <= 300 )
    lp.add_constraint( x >= 0 )
    lp.add_constraint( y >= 0 )
    lp.add_constraint( z >= 0 )

    maxval = lp.maximize()
    print maxval
    print x.value, y.value, z.value

To benchmark the API I used it to solve a minimum-cost flow problem with 154072 nodes and 390334 arcs. The C library needs 9.43 s to solve this and the pplp interface adds another 5.89 s under PyPy and 28.17 s under CPython. A large amount of time is still spend setting up the problem, but it's a significant improvement over the 20 minutes required on CPython by cvxopt. It is probably not designed to be fast on this kind of benchmark. I have not been able to get cvxopt to work under PyPy. The benchmark used is available here

The Cannon Family wrote on 2011-05-11 23:27:

for the first equation do you not perhaps mean f(x,y,z) = 10x+6y+4z instead of z = 10x+6y+4z ?

Hakan Ardo wrote on 2011-05-12 07:29:

Yes, there is a typo there, I'll update the post. Thanx for noting.

Winston Ewert wrote on 2011-05-12 14:28:

That seems like a lot of overhead for the wrapper, what is up with that? I mean, I'd expect the wrapper to reasonably quickly pass it off to the C library.

Anonymous wrote on 2011-05-12 16:48:

you should try www.solverfoundation.com using ironpython too.

Hakan Ardo wrote on 2011-05-12 18:53:

Winston: It is indeed. What cvxopt spends 20 min on I don't know. One guess would be that it is passing the ~2 million coefficients involved to C one by one, possible with a bit of error checking for each of them. As for the 6 s used by pplp, it needs to convert the equations into the matrices glpk wants. That means shuffling the coefficients around a bit and some bookkeeping to keep track of which goes where.

Anonymous: OK, how would the above example look in that case?

Hakan Ardo wrote on 2011-05-14 12:24:

Thanx for noting, I've fixed the post (again).

Unknown wrote on 2011-05-30 18:48:

have you tried openopt[1]?

[1] openopt.org

Joachim Dahl wrote on 2011-08-05 09:37:

Are you distinguishing between the time it takes to setup the optimization problem and the time it takes to actually solve it?

GLPK is a simplex solver written in C, and CVXOPT is an interior point solver written in Python/C and is not particularly optimized for sparse problem. Nevertheless, you should check the you actually formulate a large sparse problem in CVXOPT, and not a dense one.

NumPy Follow up

Alex

2011-05-05 21:56

19 comments

Hi everyone. Since yesterday's blog post we got a ton of feedback, so we want to clarify a few things, as well as share some of the progress we've made, in only the 24 hours since the post.

Reusing the original NumPy

First, a lot of people have asked why we cannot just reuse the original NumPy through cpyext, our CPython C-API compatibility layer. We believe this is not the best approach, for a few reasons:

cpyext is slow, and always will be slow. It has to emulate far too many details of the CPython object model that don't exist on PyPy (e.g., reference counting). Since people are using NumPy primarily for speed this would mean that even if we could have a working NumPy, no one would want to use it. Also, as soon as the execution crosses the cpyext boundary, it becomes invisible to the JIT, which means the JIT has to assume the worst and deoptimize stuff away.

NumPy uses many obscure documented and undocumented details of the CPython C-API. Emulating these is often difficult or impossible (e.g. we can't fix accessing a struct field, as there's no function call for us to intercept).

It's not much fun. Frankly, working on cpyext, debugging the crashes, and everything else that goes with it is not terribly fun, especially when you know that the end result will be slow. We've demonstrated we can build a much faster NumPy, in a way that's more fun, and given that the people working on this are volunteers, it's important to keep us motivated.

Finally, we are not proposing to rewrite the entirety of NumPy or, god forbid, BLAST, or any of the low level stuff that operates on C-level arrays, only the parts that interface with Python code directly.

C bindings vs. CPython C-API

There are two issues on C code, one has a very nice story, and the other not so much. First is the case of arbitrary C-code that isn't Python related, things like libsqlite, libbz2, or any random C shared library on your system. PyPy will quite happily call into these, and bindings can be developed either at the RPython level (using rffi) or in pure Python, using ctypes. Writing bindings with ctypes has the advantage that they can run on every alternative Python implementation, such as Jython and IronPython. Moreover, once we merge the jittypes2 branch ctypes calls will even be smoking fast.

On the other hand there is the CPython C-extension API. This is a very specific API which CPython exposes, and PyPy tries to emulate. It will never be fast, because there is far too much overhead in all the emulation that needs to be done.

One of the reasons people write C extensions is for speed. Often, with PyPy you can just forget about C, write everything in pure python and let the JIT to do its magic.

In case the PyPy JIT alone isn't fast enough, or you just want to use existing C code then it might make sense to split your C-extension into 2 parts, one which doesn't touch the CPython C-API and thus can be loaded with ctypes and called from PyPy, and another which does the interfacing with Python for CPython (where it will be faster).

There are also libraries written in C to interface with existing C codebases, but for whom performance is not the largest goal, for these the right solution is to try using CPyExt, and if it works that's great, but if it fails the solution will be to rewrite using ctypes, where it will work on all Python VMs, not just CPython.

And finally there are rare cases where rewriting in RPython makes more sense, NumPy is one of the few examples of these because we need to be able to give the JIT hints on how to appropriately vectorize all of the operations on an array. In general writing in RPython is not necessary for almost any libraries, NumPy is something of a special case because it is so ubiquitous that every ounce of speed is valuable, and makes the way people use it leads to code structure where the JIT benefits enormously from extra hints and the ability to manipulate memory directly, which is not possible from Python.

Progress

On a more positive note, after we published the last post, several new people came and contributed improvements to the numpy-exp branch. We would like to thank all of them:

nightless_night contributed: An implementation of __len__, fixed bounds checks on __getitem__ and __setitem__.

brentp contributed: Subtraction and division on NumPy arrays.

MostAwesomeDude contributed: Multiplication on NumPy arrays.

hodgestar contributed: Binary operations between floats and NumPy arrays.

Those last two were technically an outstanding branch we finally merged, but hopefully you get the picture. In addition there was some exciting work done by regular PyPy contributors. I hope it's clear that there's a place to jump in for people with any level of PyPy familiarity. If you're interested in contributing please stop by #pypy on irc.freenode.net, the pypy-dev mailing list, or send us pull requests on bitbucket.

Alex

Anonymous wrote on 2011-05-05 23:14:

How does this suggestion to use ctypes to interface with external C modules square with the python-dev antipathy towards doing that?

"Given the choice of using either ctypes or an external package, I prefer the external package." Martin v. Löwis

"If it means using ctypes to interface with system C libraries, I'm -10 on it :)" Antoine Pitrou

Alex wrote on 2011-05-05 23:19:

I don't know what to say for them, besides they apparently don't hate it so much as to remove it from the stdlib :)

Michael Foord wrote on 2011-05-06 00:08:

Isn't there another fairly major drawback to implementing in RPython - that you can only use it if it is compiled (translated) at the same time as pypy. So effectively pypy *has* to be distributed with all the RPython extensions you will ever use, or you have to retranslate *everything* whenever you add a new extension.

Developing cross-platform, cross-architecture, stuff with ctypes can also be a lot more painful than writing extensions using the Python C API (and having the compiler make some decisions at compile time rather than having to do it all at runtime).

Robert Kern wrote on 2011-05-06 04:54:

Most of python-dev's "antipathy" towards using ctypes is focused on using ctypes for stdlib modules, not on general principles. For security, stability, and portability reasons, many platforms need to disable ctypes when they build Python. Consequently, there is a policy that no stdlib module can use ctypes. They are not recommending against using ctypes in general.

Anonymous wrote on 2011-05-06 05:19:

One major worry is how well you will end up tracking NumPy development. Will you evenutally add an "RPython glue" subdir to NumPy's distribution?

Anonymous wrote on 2011-05-06 05:59:

thanks for the follow-up. I won't argue with points 1 and 3, but I think 2 can be reasonably addressed: I don't think the usage of internal details is pervasive in the code, and most of it is for historical reasons. We cannot remove them altogether from the numpy headers for backward compatibility reasons, but we can replace most of it inside numpy itself.

I am still a bit confused though: from your description, it seems that you intend to fork numpy to replace some pieces from C to RPython, but if I look at the numpy-ext branch, I see a rewrite of numpy in rpython. Maybe you are talking about another code ?

Anonymous wrote on 2011-05-06 08:22:

I think that the most important part of numpy is array operations (indexing, +-*/, broadcasting, etc). So it would be good enough to implement only array class in RPython and call to numpy using ctypes/cpyext for all other stuff. I've read somewhere about the plans to impose separation between numpy and scipy so numpy holds only implementation of fast arrays and scipy will hold all non-trivial operations on them. IMHO such separation will be ideal for pypy too.

Wladimir wrote on 2011-05-06 08:42:

Thanks for the clear explanation. I really wondered why it was so hard to re-use the existing numpy.

Antoine P. wrote on 2011-05-06 15:02:

Thanks Robert for clarifying our position :)

Another issue with ctypes is that it doesn't work on all systems.

Yet another issue with ctypes is that it is currently unmaintained (which won't help fixing portability issues :-)).

Anonymous wrote on 2011-05-06 17:26:

I am sory for the silly question, but how do I install this module in an existing pypy instalation ?

Thanks for the great job !

Anonymous wrote on 2011-05-06 21:15:

OK I see ...

hg clone https://foss.heptapod.net/pypy/pypy/-/tree/branch/numpy-exp .....

Anonymous wrote on 2011-05-07 03:49:

I like the idea of reimplementing part of Numpy in pypy to leverage the JIT in pypy. The existence of numexpr demonstrates the deficiency of Numpy as a Python library. A JIT is much more appropriate for what effectively should be a DSL.

But I would recommend something grander, perhaps for the longer term. I think if pypy could produce do for Python what McVM and McJIT propose to do for Matlab, it would be game-changing for Python and pypy. It would make pypy not only competitive with Matlab in ways that Numpy and Scipy are not yet and may never be, but also with F#. The rapid uptake of F# in financial industry in particular, despite the availability of Matlab, showcases the need for a fast prototyping language that does not rely on calling Fortran code for speed. I know I am looking for such language; Numpy and Python simply don't offer enough power and flexibility. I hope I can choose pypy.

Anonymous wrote on 2011-05-11 00:31:

Any idea about an eta on merging the jitypes2 branch (and/or a little more info on what it does to speed ctypes up so much)?

Antonio Cuni wrote on 2011-05-11 07:33:

@anonymous: the jitypes2 branch is mostly ready, but we are hunting two bugs and won't be merged until we fix them.

The speedup comes from the fact that ctypes call are seen by the JIT, and directly compiled into a call to the corresponding C function. Thus, most of the overhead of ctypes itself is optimized away.

Unknown wrote on 2011-05-11 19:51:

I wonder if an RPython/cython backend might be possible. cython is already my favorite way to write CExtensions and it generates code for both python 2.x and 3.x. It would be great if it could be adapted for PyPy extensions.

Anonymous wrote on 2011-05-12 18:51:

Hi!

Thanks a lot for the previous post and the follow up! I really appreciate that you could find time to make a write up on the progress that you made so far on this extremely important feature.

This all sounds very cool, but also to me it seems that it's very important to work with NumPy / SciPy developers, so that the parts that have to be replaced would be isolated and maintained in parallel for RPython and C API, or rewritten in ctypes (not sure if this is even possible). This way this eternal catch-up trap that many seem to be afraid of will not happen.

Also, I wonder in how much money this would actually translate. Maybe Enthought could sponsor some development...

Regarding Cython... I also use it to write trivial extensions to implement computation kernels outside Python in C. It would be great if Cython were able to generate something that would work with PyPy as well...

Thanks!

Laura Creighton wrote on 2011-05-13 17:55:

CLM:We actually have a GSoC student proposal from Romain Guillebert to
investigate this idea.

Maciej Fijalkowski wrote on 2011-05-23 08:55:

@Anonymous the idea is that you should not use Cython at all and PyPy's JIT should handle the computational kernel just fine.

Anonymous wrote on 2011-07-26 11:18:

I don't know why do you decide to use ctypes - in numpy community it is considered as obsolete already for a long time (maybe several years), is not under active development, and now Cython is recommended by default tool for it:

https://docs.scipy.org/doc/numpy/user/misc.html?highlight=cython#interfacing-to-c

Also, I guess you could search for some volunteers to work on numpy-PYPY in numpy-user, scipy-user, scipy-dev mail lists.

I'm interested in operations like hstack, vstack, max, min, argmin, nanmax, nanargmin (along a given axis) etc - are they already available? Or when it will be done?

Numpy in PyPy - status and roadmap

Maciej Fijalkowski

2011-05-04 17:04

41 comments

Hello.

NumPy integration is one of the single most requested features for PyPy. This post tries to describe where we are, what we plan (or what we don't plan), and how you can help.

Short version for the impatient: we are doing experiments, which show that PyPy+numpy can be faster and better than CPython+numpy. We have a plan on how to move forward, but at the moment there is lack of dedicated people or money to tackle it.

The slightly longer version

Integrating numpy in PyPy has been my pet project on an on-and-off (mostly off) basis over the past two years. There were some experiments, then a long pause, and then some more experiments which are documented below.

The general idea is not to use the existing CPython module, but to reimplement numpy in RPython (i.e. the language PyPy is implemented in), thus letting our JIT achieve extra speedups. The really cool thing about this part is that numpy will automatically benefit of any general JIT improvements, without any need of extra tweaking.

At the moment, there is branch called numpy-exp which contains a translatable version of a very minimal version of numpy in the module called micronumpy. Example benchmarks show the following:

	add	iterate
CPython 2.6.5 with numpy 1.3.0	0.260s (1x)	4.2 (1x)
PyPy numpy-exp @ 3a9d77b789e1	0.120s (2.2x)	0.087 (48x)

The add benchmark spends most of the time inside the + operator on arrays (doing a + a + a + a + a), , which in CPython is implemented in C. As you can see from the table above, the PyPy version is already ~2 times faster. (Although numexpr is still faster than PyPy, but we're working on it).

The exact way array addition is implemented is worth another blog post, but in short it lazily evaluates the expression and computes it at the end, avoiding intermediate results. This approach scales much better than numexpr and can lead to speeding up all the operations that you can perform on matrices.

The next obvious step to get even more speedups would be to extend the JIT to use SSE operations on x86 CPUs, which should speed it up by about additional 2x, as well as using multiple threads to do operations.

iterate is also interesting, but for entirely different reasons. On CPython it spends most of the time inside a Python loop; the PyPy version is ~48 times faster, because the JIT can optimize across the python/numpy boundary, showing the potential of this approach, users are not grossly penalized for writing their loops in Python.

The drawback of this approach is that we need to reimplement numpy in RPython, which takes time. A very rough estimate is that it would be possible to implement an useful subset of it (for some definition of useful) in a period of time comprised between one and three man-months.

It also seems that the result will be faster for most cases and the same speed as original numpy for other cases. The only problem is finding the dedicated persons willing to spend quite some time on this and however, I am willing to both mentor such a person and encourage him or her.

The good starting point for helping would be to look at what's already implemented in micronumpy modules and try extending it. Adding a - operator or adding integers would be an interesting start. Drop by on #pypy on irc.freenode.net or get in contact with developers via some other channel (such as the pypy-dev mailing list) if you want to help.

Another option would be to sponsor NumPy development. In case you're interested, please get in touch with us or leave your email in comments.

Cheers,
fijal

Unknown wrote on 2011-05-04 17:30:

While the RPython approach does sound valuable long-term, do you know if anyone has experimented with cpyext and the CPython extension module as a near-term alternative?

matt harrison wrote on 2011-05-04 17:30:

Great post. (I'm another person who would like numpy on pypy).
What are the guidelines for when something should be implemented in RPython? For me personally there are a few instances I would trade some of the dynamicism of Python for speed in my own code.

Maciej Fijalkowski wrote on 2011-05-04 17:35:

@Nick the mixed approach (use cpyext and pieces in RPython) sounds maybe valuable short term, but it can burn people easily. RPython-only is way more elegant and gives you wins upfront. Since there is noone willing to invest time in short term approach, this sounds like a no-brainer.

@matt almost nothing should be implemented in RPython, except the interpreter itself. Writing Python should be fast enough. Numpy is a notable example where we want to leverage last bits and pieces of JIT and be really really fast. For example you can't really leverage SSE from Python layer.

Davide wrote on 2011-05-04 18:12:

Are you in touch with Numpy developers? Are they eager to "stop" using Python and move to RPython? I mean, if this work needs to be redone for each version of Numpy, we will be always lagging behind, and always spend lot of efforts. On the other hand, if Numpy devs will start using the RPython for and let die the pure-Python one, then, the porting effort would me much more meaningful, and I believe it will be easier to find a group of people interested in doing it (myself, maybe)

Davide wrote on 2011-05-04 18:13:

And what about SciPy?

Anonymous wrote on 2011-05-04 18:15:

I've got to say that this worries me more than it encourages me.

1) It doesn't sound like this path will lead to easier integration of scipy. If I'm wrong please let me know! But if I'm right, the reality is that most of the reason I care about numpy is because scipy depends on it, and I care about scipy.

2) What about the numpy refactoring effort, which is supposed to be making a better C interface for numpy, which works with IronPython as well as CPython (https://lists.ironpython.com/pipermail/users-ironpython.com/2010-December/014059.html)? Why not just encourage that effort, and leverage it for PyPy integration? Is there a reason it won't work for numpy even though it works for both IronPython and CPython? (

Maciej Fijalkowski wrote on 2011-05-04 18:27:

@Davide it's not Python vs RPython, it's C (which numpy is implemented in) vs RPython. No numpy users will be requires to use RPython for anything.

@Gary I believe you're wrong. The idea stays the same - you can call arbitrary C code that will manipulate raw memory and do what it wants to do. The idea is to implement only the interface part (which uses CPython C API) and not the C part, which will work anyway. So at the end, we hope to leverage that effort. Also we're not microsoft and we can't pay large sums of money to do it and having small subset of numpy that's really fast appeals much more to me than a large effort that only gives numpy for pypy (that's not faster than cpython's one).

Davide wrote on 2011-05-04 19:12:

@Maciej: It was clear to me that numpy users shouldn't change anything, but I thought you intended to change only the Python part of Numpy, not the C part.

Now, if you plan to change the whole C sections, that's a huge job. What are your plans for dependencies like the BLAS, LAPACK and the likes? Would you reimplement them in RPython as well?

And regardless of the answer, my question is still valid: do you see this project as a "catch-up porting" of Numpy, with the version for CPython going on by itself? Or do you see the RPython fork becoming the mainstream Numpy? And if it's the latter, how that would perform on CPython? I think these questions are the key of the matter.

Maciej Fijalkowski wrote on 2011-05-04 19:18:

see my reply above about BLAS/LAPACK etc. Regarding the C part, it's a big task, but I think not too big. Also it's relatively easy to come up with working piece that's not full, nontheless useful.

This won't work on CPython, period.

Anonymous wrote on 2011-05-04 19:25:

@Maciej -- sorry if I'm being dense, but are you saying that the approach you're outlining will allow for scipy to work with numpy?

Maciej Fijalkowski wrote on 2011-05-04 19:29:

@Gary an uneducated guess would be "yes". Probably with some work and scipy refactoring.

cool-RR wrote on 2011-05-04 19:34:

Thanks for writing this post Maciej! It's great to have some visibility on your plans about this issue.

Anonymous wrote on 2011-05-04 19:47:

OK. As I've argued before in various pypy groups, I think one of the groups that will most strongly benefit from pypy's speed is the scientific community -- but they need numpy and scipy. So now that I know that this plan will (hopefully) allow for both of those to be used from pypy, I'm encouraged by it.

Anonymous wrote on 2011-05-04 19:49:

@Maciej: The parts of Scipy written in Python are for the most part not large. The main work would be in reimplementing the C code that uses Numpy's C-API, and figuring out a way to interface with Fortran code.

Joseph wrote on 2011-05-04 20:21:

You say you lack sufficient resources to put in a large effort, but your answers to CPython extensions is "reimplement everything RPython". Would it not make more sense to improve cpyext so that you get good performance out of it (maybe even JIT compatible)? This seems like a better answer then re-writing every single CPython extension and trying to keep the RPython implementation in sync.

Peter Cock wrote on 2011-05-04 20:33:

Have you tried micronumpy under Jython? I'm assuming RPython, being just a subset of Python, should also work there, and might double as a way to get (some of) NumPy on Jython.

Maciej Fijalkowski wrote on 2011-05-04 20:34:

@Joseph cpyext will always be only a semi-permanent compatibility layer. Making numpy work with cpyext is both unrewarding (hard work with obscure bugs), but also significantly harder to make fast, in some places completely impossible. Yes, it doesn't make sense for all extensions, it doesn't even make sense for most. Numpy is however special, since speed is the reason of it's existence. Also, frankly, when it comes down to my free time "let's make this cool JITed code run 50x faster than CPython" beats "let's stare puzzled at this segfault".

Maciej Fijalkowski wrote on 2011-05-04 20:35:

@Joseph anyway, it's exactly for the same reason "why write another interpreter if you can just improve CPython". Because it's easier at the end.

Corbin Simpson wrote on 2011-05-04 21:45:

To everybody asking why we cannot just use cpyext: I already tried it. It's not gonna happen without hacking the crap out of numpy. Additionally, it's going to be slow: Numpy is not fast for most operations, because of double-unboxing. Only vector ops are fast. JITing the operations is going to be a big win.

For those of you not believing numpy is slow, look at numexpr (https://code.google.com/p/numexpr/) which implements many of the same ideas that we are planning on implementing.

Jonas B. wrote on 2011-05-04 21:45:

Extremely exciting! Perhaps this is a good time to document the internals of NumPy a bit better while your scour the source to reimplement in RPython.

Perhaps this is a good fit for a Kickstarter (or similar) project? I believe this requires very talented and dedicated developers and paying the professionally by raising money on the Internet should be possible. It's been done before.

Anonymous wrote on 2011-05-04 22:58:

Yes, having a couple of Kickstarter projects for PyPy would be nice. It seems the current view is "we'll wait for someone wanting a feature enough to fund it". Picking one or two known valuable features to put on Kickstarter would provide for a nice test: can you raise more money by asking for it in a targeted way?

Anonymous wrote on 2011-05-05 01:23:

Two comments:

One, you guys need to make up your minds with respect to how people are supposed to interface C code with PyPy, and make one well-supported way. The sooner, the better.

Two, as long as your numpy clone implements the (new-style) Python array interface, it should "just work" with Scipy, with everything else being a Scipy bug. (Correct me if I'm wrong.)

Andreas

Anonymous wrote on 2011-05-05 01:58:

Doesn't getting SciPy to work involve interfacing with a lot of Fortran code?

Unknown wrote on 2011-05-05 04:47:

To address some of the criticism you're receiving, it may be worth making another post clarifying the points made in the comments and elsewhere:

- numpy+cpyext has been tried and found wanting (and very hard to debug)
- no developers available that are interested in beating their heads against that particular wall
- pure C and Python components of numpy should remain largely the same
- only the Python bindings layer that uses the CPython C API needs to be reimplemented
- RPython has its own FFI which is PyPy's preferred way to interface to non-Python code (https://pypy.readthedocs.org/en/latest/rffi.html)
- cpyext is a useful tool for compatibility with relatively simple C extensions that don't stress the C API greatly, but numpy is not such an extension.

david wrote on 2011-05-05 09:42:

Hi maciej, I am david (we quickly met at pycon where I presented myself as a numpy guy).

I think part of the misunderstanding is around the meaning of "numpy in pypy". Rewriting an array class on top of pypy is certainly valuable, and I am in no position to tell other people what to do in their free time. But I don't think it can realistically mean people will be able to use this instead of numpy after 2-3 man months: how will interfacing with BLAS/LAPACK work ? How will interfacing with the vast amount of fortran code in scipy work ?

If cpyext is indeed a dead-end, it would valuable to know why. Personally, I would certainly be happy to fix parts of numpy that makes cpyext impractically, even if it meant it were twice slower than on cpython. Because I could still benefit from pypy *elsewhere*, without having to rewrite all the numpy/scipy/etc... code.

Maciej Fijalkowski wrote on 2011-05-05 09:53:

@david please look above at my responses. there will still be a piece of memory you can pass to LAPACK or BLAS or something. the RPython part is about the interface only and not C-only part. If you want to improve numpy, please separate C-only parts from interface parts as much as possible, using C from RPython is a no-brainer.

Dániel Varga wrote on 2011-05-05 10:34:

Maciej, let me second Nick's polite request for a more detailed post about the plan.

If even David, an actual numpy developer can misunderstand your description, what do you expect from the unwashed masses of scipy users like me? :) Fortunately it does not take too much effort to alleviate the worries. All you have to do is explain to everyone that the plan takes into account the giant amount of C and Fortran code in numpy/scipy, and takes into account the fact that forking numpy/scipy is infeasible.

Bluebird wrote on 2011-05-05 11:49:

Didn't you say in another post that the JIT is more efficient at optimizing Python code than RPython ?

cournape wrote on 2011-05-05 12:17:

@daniel: I don't think there is a misunderstanding as much as a different people wanting different things. I believe that Maciej and other pypy people are more interested in leveraging pypy and its JIT do to things which are indeed quite complicated in numpy today (avoid temporary, fast iterators in python, etc...). I have little doubt that pypy is a better platform than cpython to experiment this kind of things.

I am more surprised about the claim that numpy is so tight to cpyhon internals. It certainly depends on the C API, but mostly public API, documented as such.

Armin Rigo wrote on 2011-05-05 12:45:

@nick: thank you very much for giving all relevant pieces of information that are missing from the original post!

glyph wrote on 2011-05-05 19:32:

Hey Maciej! This sounds absolutely awesome. I hope you can find someone to do the necessary work. I think you might need to explain a little better in a separate post where that 48x speedup comes from, and why RPython is a necessary part of it. I think I understand why, but clearly some of the commenters don't :).

Anonymous wrote on 2011-06-21 22:50:

Well, if the answer of "How to make numpy available in pypy" is "do a complicated rewrite of numpy," then I'm pretty skeptical about the pypy project. I primarily use numpy, but also scipy sometimes and Image sometimes. As a user it's most important to me that code runs. Speed is not as critical. For example if I take stddev() of an array I first want that to run, and only secondarily want it efficient. If there's a library that I might want to use, and I can't expend a reasonable amount of effort to wrap it, or else someone else can do that, then I don't find pypy that encouraging at all. Since there are lots of libraries out there, and it has been convincingly argued that Python's primary utility is its library support.

Alex wrote on 2011-06-21 22:56:

@Anonymous: While you may not be concerned with performance, a great many people are. The only way to have arbitrary numpy stuff work in theory would be CPyExt, but as we've said that's frought with complications in that a) it won't work out of the box on something that uses as many corners of the CPython C-API as NumPy, and b) will always be slow. Given people's desire for speed with respect to NumPy we consider reimplementing it a reasonable course.

Anonymous wrote on 2011-06-22 00:33:

Alex -- I'm not saying speed is unimportant. What I'm saying is being able to easily make existing CPython extension modules compile against numpy is very important to people. If there is a 20% slowdown or a 10% speedup of the C extension in many cases that is no big deal. Most importantly it would put PyPy on rather equal standing with CPython. And then the JIT pure Python code might win out for efficiency, so PyPy might be a net win for many users.

On the other hand doing research into lazy evaluation and vectorizing and loop restructuring, can obviously make numpy faster, but is more of a tangent, than being helpful to the majority of users who just want to run their CPython extensions at roughly the same speed under PyPy. Until people can actually run their extensions easily (which I argue is the major value that Python has) I doubt there will be much adoption of PyPy.

Say I can already add lists of floats and take their standard deviation using numpy, using the C extension library. It isn't clear to me why this should be substantially less efficient under PyPy than under CPython.

We see the same issue with Python 3.0 adoption. Personally I think it makes bad language changes such as getting rid of string % operator which I use constantly, so I'd avoid it for that reason. But far more importantly it can't run a lot of the libraries I use, with comparable performance. So it's completely a no go to me for that reason.

So I am suggesting that optimizing a single library by rewriting it, seems a case of premature optimization when most libraries can't even run with PyPy.

Maciej Fijalkowski wrote on 2011-06-22 07:36:

It's a tough call, but for me most libraries run under PyPy. There are few that don't but I can usually work around that. Regarding numpy - noone wants slower numpy *no matter what*. Besides, it's not clear whether making numpy behave using CPyext would take less effort than writing it from scratch - the first reasonable subset can be expected *much* faster, when doing a rewrite.

Numpy really *is* special, for all my needs, I want a small subset that performs reasonably well, not a whole thing that performs poorly. It's a matter of taste, but it's also much more fun, which plays a lot in terms of people spending free time on it. Would you rather add functionality for X that you need or fix next obscure segfault?

Cheers,
fijal

Maciej Fijalkowski wrote on 2011-06-22 10:14:

@Anonymous Clarifying: We're hoping to reuse most parts of numpy (and scipy), especially those written in pure C. The "only" part requiring rewriting is the part that uses CPython C API, which is mostly the array interface.

Anonymous wrote on 2011-06-23 04:23:

Maciej -- I didn't realize large parts of these libraries could be reused. So maybe once the PyPy C extension facilities are working well enough that important 3rd party libraries can be compiled, I'll just switch to PyPy for performance. It sure does sound more fun to make numpy functions compile down to heavily optimized RPython and get big speed gains. But I still maintain that users would appreciate being able to get all arbitrary libraries to build in the first place, e.g. if scipy or library X depends on the numpy C interface, and that gets broken in the PyPy numpy implementation, then users won't be able to use their desired library at all. So I guess I'm just arguing that the most C extension modules that can work with numpy, the better. Since if we wanted fast but no libraries we'd be using C :-).

Davide wrote on 2011-06-23 16:19:

Maciej (et all),
it looks like this issue isn't clear yet to people. Let's see if I can help.

Numpy is made of 3 "piece" (doesn't matter if they are separate pieces or mingled together, they are there): a pure python part, a pure C part and a C-to-python "glue". All of them are very important to numpy, but the C-to-python glue is special, in that both python and C need to access the same data structures without any conversion or copy (otherwise it will be slow). I'm not sure what exactly numpy is doing for this "special glue" part, but that's the point where pypy suffer: of course pypy works just fine with pure python, and doesn't "care" at all about the C sections. So one option is to rewrite the C-to-python pieces of numpy. I'm sorry but it's still unclear to me if you want also to rewrite the C part or not (here you said kind-of-yes: https://morepypy.blogspot.com/2011/05/numpy-in-pypy-status-and-roadmap.html?showComment=1304533136864#c3499269873134208179 and here you said no: https://morepypy.blogspot.com/2011/05/numpy-in-pypy-status-and-roadmap.html?showComment=1308734098907#c2151221303214453177 so probably you should clarify better)

Now, if I understand it right, your plan is to fork numpy for this purpose (either rewrite the C-to-python glue only, or the C part also). I believe this will fail, and the reason is pretty simple: first, even before you start, you already say that you don't have people/money/time to commit to this project. Second, maintaining a fork is a huge, huge task. You might easily introduce bugs, break feature, etc - while people are expecting something that "just works" as drop-in replacement, so even a "almost success" from a technical point of view, can be a big failure for adopter, if it doesn't behave. Last, but not least, numpy is a moving target, and you'll always play catch up. Is this the game you want to play??

Now, I don't want to tell you what you have to do for fun, but if you want to have chances of success, you have to change the "politics" of your plan. I trust you that technically your plan is fine, but rather than implementing it within a numpy fork (or worst: rewrite), I suggest that you work with the numpy and/or CPython community, to see if you can write a wrapper around cpyext (or whatever they are using for C-to-Python glue). This wrapper (at compiler time) should either become cpyext (or whatever) if you are using CPython, or become "something else" if you are using pypy. If you persuade numpy people to use this wrapper you'll have the same numpy code base working as is in CPython and pypy. Sure you will not be exploiting the faster-than-C capabilities of pypy, but you can get there more smoothly: improving the speed one feature at time, while the rest of the framework is still working and thus useful, and thus increasing its user base, people interested in it (and some of them may become contributors).

Instead your plan sounds like: implement one feature at time, while the rest of the framework doesn't work and thus nobody uses it in production, let alone care about its speed. On top of which, you'll be trying to catch-up with numpy.

Maciej Fijalkowski wrote on 2011-06-23 18:12:

@Anonymous there are many things I disagree with and I'm not going to fork numpy.

The basis is - I claim there is more use for fast numpy which is incomplete than slow complete numpy.

I would refer you to yet another blog post (personal this time) explaining more why I do what I do: https://lostinjit.blogspot.com

Connelly Barnes wrote on 2011-08-24 04:13:

Here is a completely different approach taken by IronPython for Scipy+Numpy compatibility:

https://www.johndcook.com/blog/2009/03/19/ironclad-ironpytho/

It's basically a bidirectional FFI. Have a CPython and an IronPython both running, and wrap objects so that IronPython objects can be used by CPython and vice versa. This requires some platform specific binary level compatibility, in their case, DLL hacking, to allow the FFI to work in both directions.

It seems like that approach should be practical for getting all of large libraries such as Scipy or Numpy working in Pypy. Since it's already been demonstrated to work for IronPython.

The above roadmap proposes instead speeding up the core array object by coding it in RPython.

But I wonder if these two approaches could work together. For example Numpy could be configured to use ordinary CPython array objects, or PyPy compiled RPython array objects. Then the FFI just has to take care to wrap objects appropriately that are in the "other interpreter".

Thoughts?

Connelly Barnes wrote on 2011-10-13 00:30:

As a follow up to my previous comment, I noticed there is a bidirectional FFI for Python called RPyC that was previously discussed on the Pypy blog:

https://morepypy.blogspot.com/2009/11/using-cpython-extension-modules-with.html

I have no idea if it has been tried with Numpy yet.

PyPy 1.5 Released: Catching Up

Carl Friedrich Bolz-Tereick

2011-04-30 15:59

20 comments

We're pleased to announce the 1.5 release of PyPy. This release updates PyPy with the features of CPython 2.7.1, including the standard library. Thus all the features of CPython 2.6 and CPython 2.7 are now supported. It also contains additional performance improvements. You can download it here:

https://pypy.org/download.html

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7.1. It's fast (pypy 1.5 and cpython 2.6.2 performance comparison) due to its integrated tracing JIT compiler.

This release includes the features of CPython 2.6 and 2.7. It also includes a large number of small improvements to the tracing JIT compiler. It supports Intel machines running Linux 32/64 or Mac OS X. Windows is beta (it roughly works but a lot of small issues have not been fixed so far). Windows 64 is not yet supported.

Numerous speed achievements are described on our blog. Normalized speed charts comparing pypy 1.5 and pypy 1.4 as well as pypy 1.5 and cpython 2.6.2 are available on our benchmark website. The speed improvement over 1.4 seems to be around 25% on average.

More highlights

The largest change in PyPy's tracing JIT is adding support for loop invariant code motion, which was mostly done by Håkan Ardö. This feature improves the performance of tight loops doing numerical calculations.
The CPython extension module API has been improved and now supports many more extensions. For information on which one are supported, please refer to our compatibility wiki.
These changes make it possible to support Tkinter and IDLE.
The cProfile profiler is now working with the JIT. However, it skews the performance in unstudied ways. Therefore it is not yet usable to analyze subtle performance problems (the same is true for CPython of course).
There is an external fork which includes an RPython version of the postgresql. However, there are no prebuilt binaries for this.
Our developer documentation was moved to Sphinx and cleaned up.
and many small things :-)

Cheers,

Carl Friedrich Bolz, Laura Creighton, Antonio Cuni, Maciej Fijalkowski, Amaury Forgeot d'Arc, Alex Gaynor, Armin Rigo and the PyPy team

kost BebiX wrote on 2011-04-30 16:59:

Cool. Blog design became blue :-)

Anonymous wrote on 2011-04-30 17:37:

Unless there is something Intel specific - maybe calling it x86/x86-64 might be a good idea since this suggests that pypy does not work on amd / via chips.

Anonymous wrote on 2011-04-30 21:33:

do you have plans to add CPython 2.7.1 to speed.pypy.org?

Anonymous wrote on 2011-04-30 22:21:

Is it just me or does cProfile seem rather broken (at least on Windows)? I get random subtimings that are negative or in the billions.

>>>> cProfile.run("[abs(1) for n in xrange(10**6)]")
1000002 function calls in 1.000 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 -137.813 -137.813 1.000 1.000 :1()
1000000 138.813 0.000 138.813 0.000 {abs}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Prof
iler' objects}

Zooko wrote on 2011-04-30 22:34:

Where's the flattr button? I want to give you a euro tip again, just like I do every time you blog.

Also: way to go on releasing PyPy 1.5! This project is really growing up!

Armin Rigo wrote on 2011-05-01 11:10:

Anonymous: cProfile on Windows works for me. It might be details of your Windows version or whatever. Can you open it as a proper bug report? Thanks! https://codespeak.net/issue/pypy-dev/

Unknown wrote on 2011-05-01 11:24:

Awesome! Looking forward to PyPy on NaCl.

Antonio Cuni wrote on 2011-05-01 12:20:

@zooko: I don't know why the flattr button went away. I re-uploaded the template to blogger and now it seems to be there again, can you confirm?

etal wrote on 2011-05-01 13:40:

Great stuff. Do you think PyPy is ready to be re-packaged for Debian yet?

I'm looking at this:
https://bugs.debian.org/538858

I have a feeling the popcon would be quite a bit higher nowadays.

Gaëtan de Menten wrote on 2011-05-02 08:19:

Congratulations to the whole team. What's coming next now that this large milestone is completed?

Anonymous wrote on 2011-05-02 11:17:

Is it just me or does the download page still point to the 1.4.1 release?

Antonio Cuni wrote on 2011-05-02 11:23:

@Anonymous: what is the "download page" you are talking about? For me,
https://pypy.org/download.html

shows only links to PyPy 1.5. Maybe it's a browser cache issue?

Anonymous wrote on 2011-05-02 11:31:

This is insane.

I clicked on the link multiple times yesterday and today (after restarting firefox) and only now the page refreshed correctly.

Just shows you that anything can happen.

vak wrote on 2011-05-03 16:43:

btw, regarding https://bitbucket.org/pypy/compatibility/wiki/Home -- i am using pymongo driver under pypy without problems (not yet checked against the fresh pypy 1.5 though)

vak wrote on 2011-05-04 09:19:

minor thing -- version isn't updated?

Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:34)
[PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2

Anonymous wrote on 2011-05-05 12:29:

Great news, 25% speedup over PyPy 1.4 is just another great step forward. I'm looking forward for times when Python will be fastest dynamic object-oriented language and it will be more and more popular. I feel that these times are very close thanks to PyPy.

What about adding PyPy to The Computer Language Benchmarks Game?

Damian Cugley wrote on 2011-05-07 10:36:

I have not yet managed to build C extensions on Mac OS X with distribute/distutils/whatever because sysconfig.get_config_var returns None. Is there a quick way to fix this?

Damian Cugley wrote on 2011-05-07 10:38:

@anonymous The Computer Language Benchmarks Game only permits one implementation per language, and CPython 3.2 is the implementation they use for Python.

Anonymous wrote on 2011-05-07 14:09:

Would it be easy to implement mutable builtin classes (for example for adding new methods to int or str) in pypy?

Thomas Heller wrote on 2011-06-07 17:38:

I'm speechless :-)

This is the first time I use pypy and it works out of the box even with my fancy Windows GUI toolkit (written completely in ctypes) out of the box.

Great work, guys!

Realtime image processing in Python

Global Interpreter Lock, or how to kill it

Report back from our survey

PyPy Genova-Pegli Post-EuroPython Sprint June 27 - July 2 2011

Goals and topics of the sprint

Exact times

Location & Accomodation

PyPy Usage Survey

Server migration in progress

Playing with Linear Programming on PyPy

NumPy Follow up

Reusing the original NumPy

C bindings vs. CPython C-API

Progress

Numpy in PyPy - status and roadmap

The slightly longer version

PyPy 1.5 Released: Catching Up

What is PyPy?

More highlights

The PyPy blogposts

Recent Posts

Archives

Goals and topics of the sprint

Exact times

Location & Accomodation

Reusing the original NumPy

C bindings vs. CPython C-API

Progress

The slightly longer version

What is PyPy?

More highlights

The PyPy blogposts

Recent Posts

Archives

Tags