Skip to main content

PyPy and ijson - a guest blog post

This gem was posted in the ijson issue tracker after some discussion on #pypy, and Dav1dde kindly allowed us to repost it here:

"So, I was playing around with parsing huge JSON files (19GiB, testfile is ~520MiB) and wanted to try a sample code with PyPy, turns out, PyPy needed ~1:30-2:00 whereas CPython 2.7 needed ~13 seconds (the pure python implementation on both pythons was equivalent at ~8 minutes).

"Apparantly ctypes is really bad performance-wise, especially on PyPy. So I made a quick CFFI mockup:


CPython 2.7:
    python -m emfas.server size dumps/echoprint-dump-1.json
    11.89s user 0.36s system 98% cpu 12.390 total 

    python -m emfas.server size dumps/echoprint-dump-1.json
    117.19s user 2.36s system 99% cpu 1:59.95 total

After (CFFI):

CPython 2.7:
     python ../dumps/echoprint-dump-1.json
     8.63s user 0.28s system 99% cpu 8.945 total 

     python ../dumps/echoprint-dump-1.json
     4.04s user 0.34s system 99% cpu 4.392 total


Dav1dd goes into more detail in the issue itself, but we just want to emphasize a few significant points from this brief interchange:
  • His CFFI implementation is faster than the ctypes one even on CPython 2.7.
  • PyPy + CFFI is faster than CPython even when using C code to do the heavy parsing.
 The PyPy Team


Alendit wrote on 2015-06-18 08:38:

Maybe it's time to discuss inclusion of CFFI into stdandard library again?

Armin Rigo wrote on 2015-06-18 09:52:

If CPython decides to include it in its stdlib, I can make sure it is updated as needed. I don't have the energy to discuss its inclusion myself, so if it happens it will be "championed" by someone else. Nowadays, I personally think inclusion has as many drawbacks as advantages, even if CFFI 1.x shouldn't evolve a lot in the foreseeable future after the 1.0 step.

v3ss wrote on 2015-07-18 22:14:

The problem is converting existing libs to use cffi. Only very few percent of Libs are ready for python3.x and with this trend , not even 1% of libs will be converted to work with CFFI.
That makes PyPy adoption a lot slower.

Is there really no chance of improving ctypes?

Maciej Fijalkowski wrote on 2015-07-19 05:39:

you would think, but these days vast majority of popular C bindings come with cffi equivalents. In fact cffi is vastly more popular than ctypes ever was.