As you know, a lot of PyPy's recent development effort has gone into speeding up execution of Python programs. However, an additional good property of PyPy's Python interpreter is that most objects are represented in a much more compact way than in CPython. We would like to investigate some more advanced techniques to reduce the memory usage of Python programs further.
To do this, we need to investigate the memory behaviour of real programs with large heaps. For speed measurements there are standard benchmarks, but for memory improvements there is nothing comparable, and the memory behaviour of large programs is not that well understood. Therefore we are looking for programs that we can study and use as benchmarks.
Specifically, we are looking for Python programs with the following properties:
- large heaps of about 10MB-1GB
- a non-trivial runtime as well (in the range of a few seconds), so that we can judge the speed impact of optimizations
- ideally pure-Python programs that don't use extension modules so that they run under both CPython and PyPy (this is optional, but makes my life much easier).
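If you want to check quickly whether a program of yours falls into these ranges, something like the following rough sketch might help. It runs a command as a child process and reports wall-clock time and peak resident memory. Note the assumptions: it is Unix-only (it relies on the standard `resource` module), peak resident set size is only an approximation of heap size, and the helper names `measure` and `is_candidate` as well as the one-second lower bound are made up for illustration.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd (a list of arguments) as a child process.

    Returns (wall_seconds, peak_rss_mb). Peak RSS is only a rough
    stand-in for heap size and includes the interpreter itself.
    """
    start = time.time()
    subprocess.call(cmd)
    elapsed = time.time() - start
    # For RUSAGE_CHILDREN, ru_maxrss is the peak resident set size of the
    # largest child waited for so far, so measure one program per run.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS in bytes.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return elapsed, peak / divisor


def is_candidate(seconds, peak_mb):
    """Check the ranges from the list above.

    Assumption: "a few seconds" is taken to mean at least one second.
    """
    return 10 <= peak_mb <= 1024 and seconds >= 1.0
```

For example, `measure([sys.executable, "myprogram.py"])` (with `myprogram.py` standing in for your own script) returns the two numbers, which can then be passed to `is_candidate`. Running the same script under both CPython and PyPy gives a first impression of how the two compare.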
We are also rather interested in programs that do a lot of string/unicode processing.
We would be grateful for all ideas. Telling us about a program also has the advantage that we will work on optimizing PyPy for it :-).