Hacking on PyPy

In another great Pycon2012 keynote, David Beazely asked the question, “is PyPy easily hackable?” After a great talk, he answered with a decisive, “I still don’t know.” Having sprinted on Python I’d like to answer his question in a bit more detail.

I love David’s presentation style. He has a novel method of using phrases like, “blow your mind” and “this is really scary” repeatedly until they lose their meaning and you no longer feel mindblown or scared. A variety of factors, including Beazely’s thorough keynote address motivated me to join the PyPy team during the Pycon developer sprints.

I’d like to clear up one oversight in Dave’s otherwise impeachable talk. One of the PyPy devs, Holger Kregel explained to me that PyPy does not have over 1 million lines of code. I don’t have exact numbers, but for “historical reasons”, a non-python file containing Base64 encoded data was given a .py extension. When excluding this file from the line count, around half a million lines of actual Python code exist, and about a quarter of these are tests.

I was surprised how trivial it was to get started hacking on PyPy. I don’t really grok the many layers of the translation toolset and PyPy interpreter, but it’s pretty clear that the layers are well separated. I was hacking on the py3k branch of PyPy. I am happy to admit I was working primarily on changing print statements to print() functions and commas in exceptions to the as keyword.

Here are the steps to start hacking on PyPy. Notice that the hour-long translation step is not part of the procedure. PyPy has a solid test framework, and the PyPy crew are focused on a 100% test-driven-development paradigm.

  1. Clone pypy (this takes a while):
    hg clone https://bitbucket.org/pypy/pypy/
  2. Pick a branch to work on. There are about 80 branches. I don’t know what they all do. Popular ones during the sprints included py3k and numpy-ufuncs2
  3. Pick a feature to work on. For py3k support, the list of failing tests in the buildbot is a good place to start. Numpy programmers had a list of fuctions that needed implementing, but I can’t find the link. Ask on IRC, the PyPy crew are very helpful. The bug tracker contains many features and issues that need addressing
  4. Add /path/to/pypy/ to your path so you can run the pytest.py command
  5. cd into the directory indicated in the buildbot output and run pytest.py path/to/test.py -k testname
  6. The test will likely fail. Hack away and fix it.
  7. When the test passes, commit, push to a bitbucket repo, and issue a pull request.
  8. Repeat!
  9. There are quite a few cons to working on this project. If you run hg in the pypy/modules/ directory, it will try to pick standard library modules from pypy and choke horribly. The pypy developers don’t really believe in documenting their code. Being able to tell the difference between rpython and python (which have identical syntax) is important. In general, if a module starts with “interp_” it contains rpython, but if it starts with “app_” it contains python. The code does not appear to be well-documented.

    If you are hacking on Python 3 support, you need to bear in mind that the PyPy interpreter is written in Python 2. You are working on a Python 2 application that executes Python 3 bytecode!

    On the positive side, rPython and Python are much easier to read and write than C. The PyPy devs are brilliant, but not intimidating. They are so confident in their test suite that they are comfortable programming in a “cowboy coding” style, hacking randomly until all the tests pass. Any one layer in the toolchain is easy to understand and develop. The IRC channel is full of friendly, knowledgable, helpful people at any time of day.

    Overall, I am much less intimidated by this project than I was before I started the dev sprints. I still can’t answer the, “Is Python easily hackable?” question fully. It’s certainly easy to get started, but I don’t know how easy it is to become intimate with the project. Dave Beazely’s keynote made PyPy more approachable, and I approached it. Hopefully this article will encourage you to do the same.

One Comment

  1. fijal says:

    Hey.

    Maybe it’s worth pointing out that “running around and fixing tests” is only related to the py3k branch. Most other branches generally have either all tests passing and you write new ones before writing new code or have a few failing and you have to typically think first.

    Good post, thanks!

    Cheers,
    fijal