A (brief) introduction to Python-Core development | Completely Different

by jesse


This is a reprint of an article I wrote for Python Magazine as a Completely Different column that was published in the August 2008 issue.

In the early summer of this year I had the chance to really get started working on and with the core Python source. I had spent some time putting together a Python Enhancement Proposal (PEP), which was accepted; now I just needed to learn the code base and the practices, and buy a helmet. Shortly after getting the initial patch accepted, I broke the build and the tests, and caused the beta to slip. This article is an introduction to core development, in which we'll cover what you need to get started, and where I personally screwed up.

Introduction

Core Python development (or, "hacking on python-core," as it may be called) is, like all great open-source projects, a highly distributed, highly active, high-participation effort. There are developers all over the world filing bugs, submitting patches for code and documentation, and participating on the python-dev mailing list and IRC channel.

Like all other good open source communities, it's a meritocracy of the technical persuasion. A good idea is simply that: a good idea. If a good idea is the best of breed, it will be adopted or adapted to the language and project. If an idea or a patch is clear, concise, and solves a problem, there is generally no difficulty in getting traction or getting it into the core code base.

Let's start from the beginning

While Python is a meritocracy where any person can submit a patch, file a bug, or send emails to python-dev (sometimes, that last is more of a curse than a blessing), there is a particular group of people that has commit privileges. This group is responsible for judging all patches, proposed bugs and associated fixes, and ultimately committing the actual code to the tree.

Python's code, documentation, PEPs, and other artifacts are all hosted in a Subversion (svn) repository. While the core is in svn, you can also access it via other popular version control tools: there are Bazaar, Git, and Mercurial mirrors of the svn repository. All of the examples in this article revolve around Subversion, though, because the other trees are still experimental.

In order to view the repository, you need to check out a read-only version of the source tree. Write access is only available via svn+ssh authenticated access, but you can use HTTP for a read-only copy. So, to check it out:

mkdir -p python/trunk
svn co http://svn.python.org/projects/python/trunk python/trunk

This is your own, pristine copy: any edits you make in this tree will show up in an ''svn diff'' (which you'll use to make patches). Avoid editing files you don't need to, so you don't accidentally taint a diff or check-in.

The basic layout of the tree is unsurprisingly simple, so I'll only really cover the important files/directories:

''Doc/'' contains all of the documentation for the language, which will be discussed in more detail later. If you want to see the standard library documentation, look in Doc/library.

You will find the brain-melting grammar definition for the Python language in ''Grammar/''.

Header files for C code go in ''Include/''.

Libraries written in Python are in ''Lib/''. You'll note a distinct lack of C code in this directory. That's because C modules go in the ''Modules'' directory. Also found in ''Lib/'' is the ''test/'' directory, which we'll be focusing on later. If you want to see some pretty Python code, read the files in this directory. Except anything I've done.

C extensions, such as multiprocessing, ctypes, cStringIO, et cetera, can be found in ''Modules/''. Generally speaking, these are optimized modules for the standard library. Some of them are in subdirectories for cleanliness, but most of them are in the top-level Modules/ directory. Note that there is a style guide for C code in the standard library, outlined in PEP 7.

The ''Misc/'' directory contains things that don't belong elsewhere within the tree. This includes the NEWS file, build notes, configuration for valgrind (a code profiling/debugging utility), a cheat sheet (somewhat dated, but still useful), and some editor plugins. A really good file here is SpecialBuilds.txt, which goes over all the magic flags for Python builds you should know about.

Python objects are defined in ''Objects/''. It contains all C code, and is pretty well documented. If you suddenly get the urge to make a new type, start here.

Miscellaneous tools go in ''Tools/''. I haven't had to use much of anything down here except for the scripts in the ''scripts/'' subdirectory. The ''scripts/'' directory is just filled with cool things like untabify.py, crlf.py, and google.py.

There are two build files. The main build file, sort of, is ''setup.py''. I list it here because you need to look at this file to realize how things are built; the make steps we cover later are, for the most part, wrappers around this script. The "other" build file is ''Makefile.pre.in''. It works with ''setup.py'' to control the entire compilation process and has some nifty targets, like "make tags". Who knew the build process could spit out a tags file for ''vi''?

It is important that you pay attention to both ''setup.py'' and ''Makefile.pre.in''. When I forgot one line in the Makefile, my extension module seemed to work, but didn't really. I could "import multiprocessing" from within the svn tree using the local python interpreter. However, after running "make install" the extension module was not installed, so it did not work with the installed interpreter. I finally discovered this was due to a single missing entry in LIBSUBDIRS.

Whew. That's a lot of directories. I skipped over the Windows build stuff, and I am going to continue to do so, noting that I am not a Windows expert. I do know that if you are on Windows you will need to look in the ''PCBuild/'' directory for build information, Visual Studio projects, etc.

Building

Before we go any further, let's walk through the basic build process. Remember, I'm a Linux and OS X guy, so I will be walking you through the steps you would take on a Unix machine. Windows users will need to either use Visual Studio, or install Cygwin (a Unix tool chain for Windows). Installing the Cygwin tool chain means you should be able to compile just fine following these directions.

First off, the ./configure step. If you're familiar with autoconf, automake, and the like, you're more than familiar with this. For those who aren't: the configure and make steps are the common way of configuring, compiling, and installing a given application. See the link to Autoconf in the requirements section for more details.

There are some custom options for configure (of course), which you can see with ''./configure --help''. The main one you want to know about and use is ''--with-pydebug'', which enables a special debug build of Python. You are going to want the debug build if you start working heavily on the core of the interpreter. The ''--with-pydebug'' flag enables, in no particular order, LLTRACE, Py_REF_DEBUG, Py_TRACE_REFS, PYMALLOC_DEBUG, C code assertions, and all code inside ''#ifdef Py_DEBUG'' blocks. In other words, it turns on just about every debugging feature you could possibly need or want, short of something that fixes your code for you automatically.

For the exact details on all of the configure flags, including platform specific options, see Misc/SpecialBuilds.txt.
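You can also check from Python itself whether the interpreter you're running was built ''--with-pydebug'': ''sys.gettotalrefcount'' only exists in debug builds. A quick sketch (using the modern ''sysconfig'' module, which postdates the article):

```python
import sys
import sysconfig

def is_pydebug_build():
    # sys.gettotalrefcount is only compiled in when Python is configured
    # --with-pydebug (it comes from Py_REF_DEBUG); release builds lack it.
    return hasattr(sys, "gettotalrefcount")

print("pydebug build:", is_pydebug_build())
# The configure result is also recorded in the build-time configuration:
print("Py_DEBUG config var:", sysconfig.get_config_var("Py_DEBUG"))
```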

To start a build, just fire off a

$ ./configure --with-pydebug

in ''python/trunk''. Once this is done, unless you really want to twiddle the options, you shouldn't need to do this again for a while. Brett Cannon once told me, when talking about some development TextMate macros, "I left out configure stuff because that becomes rather personal".

Next up, execute ''make'' in the python/trunk directory. You'll see your normal make output, but there are a few caveats to keep in mind.

Here is some example output from the ./configure and make steps:

$ ./configure
checking for --with-universal-archs... 32-bit
checking MACHDEP... darwin
checking EXTRAPLATDIR... $(PLATMACDIRS)
...snip...
creating Modules/Setup
creating Modules/Setup.local
creating Makefile
woot:python-trunk jesse$ make
... gcc output snipped ...
Failed to find the necessary bits to build
these modules:
_bsddb             gdbm               linuxaudiodev
ossaudiodev        readline           spwd
sunaudiodev
To find the necessary bits, look in setup.py in
detect_modules() for the module's name.

running build_scripts
$

Pay attention to the build output. If you're working on a module with C extensions, or on the interpreter itself, whatever can go wrong here will go wrong. For example, while I was integrating the _multiprocessing library into ''Modules/'', the initial issues around simple compilation showed up here.

As you can see, there is an important report at the end of the make step (the log line looks like: "Failed to find the necessary bits to build these modules:"). The information given in that report is especially important if you need access to the skipped modules. For example, on OS X the ''readline'' module doesn't compile out of the box. You will need to resolve the dependencies listed in ''trunk/setup.py'' in order to get it up and running.

If you want to "quiet down" the make step, adding the "-s" flag will make it less verbose. Also, if you want to speed it up, consider using the "-j NUM" to increase the number of concurrent commands being performed.

Once the build completes successfully, you should have a working Python binary in your local directory. On OS X and Windows it's named ''python.exe'' and on Unixes it's named simply ''python''. If you wanted, you could fire this version up and poke around, but for development your next step should be to run the tests.
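When you do poke around, a quick sanity check confirms you're running the freshly built binary rather than your system Python:

```python
import sys

# Version string and path of the interpreter currently running; for a
# trunk build started as ./python, sys.executable points into your checkout.
print(sys.version)
print(sys.executable)
```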

Running Tests

Tests for Python's source tree are primarily executed with the ''Lib/test/regrtest.py'' utility (this may change in the future) and ''make test''. If you were to run ''make test'' in the ''trunk/'' directory right after building, you would run a subset of all of the tests located in ''Lib/test''. Certain tests, such as large-file tests and others that take a lot of time or resources, are excluded in favor of brevity.

For details on what a ''make test'' step does, open Makefile.pre.in and search for "# Test the interpreter" (it should be around line 660). You will find the definitions for what happens during the ''test*'' steps as well as the options that invoke ''regrtest.py''. You can change the test options via the ''TESTOPTS='' flag to ''make test''. For example, to run a single test:

$ make test TESTOPTS=test_multiprocessing

The real magic happens in regrtest.py, the Python regression test execution script. You need to run it for any change made to the code, period. A basic run is the same as the basic ''make test'' execution. This means that certain tests are excluded, but you can enable those tests (and a lot more) via additional arguments to regrtest.py. There is even an option to enable coverage analysis.

A basic invocation of regrtest.py looks like this:

$ ./python.exe Lib/test/regrtest.py
test_grammar
test_opcodes
test_dict
...snip...
test_zlib
327 tests OK.
32 tests skipped:
    test_al test_bsddb test_bsddb3 test_cd test_cl
    ...
    test_winsound test_zipfile64
Those skips are all expected on darwin.

Pretty painless, but if something goes wrong, there's not a lot of information to go on. A better way to run it is with the ''-w'' option, which will re-run any failed test with additional verbosity. To produce Listing 1, I added a line that causes one of the tests to crash.

Listing 1:

$ ./python.exe Lib/test/regrtest.py test_multiprocessing
test_multiprocessing
test test_multiprocessing crashed -- <type 'exceptions.NameError'>: name 'mportasdl' is not defined
1 test failed:
    test_multiprocessing
$ ./python.exe Lib/test/regrtest.py -w test_multiprocessing
test_multiprocessing
test test_multiprocessing crashed -- <type 'exceptions.NameError'>: name 'mportasdl' is not defined
1 test failed:
    test_multiprocessing
Re-running failed tests in verbose mode
Re-running test 'test_multiprocessing' in verbose mode
test test_multiprocessing crashed -- <type 'exceptions.NameError'>: name 'mportasdl' is not defined
Traceback (most recent call last):
  File "Lib/test/regrtest.py", line 549, in runtest_inner
    the_package = __import__(abstest, globals(), locals(), [])
  File "/Users/jesse/open_source/subversion/python-trunk/Lib/test/test_multiprocessing.py", line 6, in <module>
    mportasdl;fj
NameError: name 'mportasdl' is not defined
$ 

There's one more important flag to regrtest.py you need to know about, and that's ''-uall''. This option will run all of the tests, and obviously, when you're changing something really low level, you need to run these tests. They take a long time, so I recommend running them before going to bed.

Documentation

Yes, even documentation has bugs. All of Python's documentation resides in the ''Doc/'' directory, and it has its own build scripts and system, called Sphinx. The standard library documentation module overviews we all know and love are located in ''Doc/library/''. When you are making a change that will be public in nature (say, adding a method) you need to find and update the associated documentation.

Also, when adding new packages, modules or methods, you should really consider adding an example in the appropriate section of the module's .rst file (not the ''Doc/examples'' directory). It is common for new Python users to have difficulty finding clear examples on standard library module usage, so the more examples the merrier.

If you're stuck with the documentation, feel free to send an email to docs@python.org and ask for help. There are a lot of good people signed up for that list and they're willing to help you if you're stuck.

The documentation is all in reStructuredText (reST) format, and there is some Python-specific syntax that can be of use to you. See the "Documenting Python" page for more information. A nice nugget I found was breaking the bigger examples out of the main ''module.rst'' file (the documentation file for a given module) and including them separately with:

.. literalinclude:: ../includes/mp_webserver.py

This means you can drop the Python code into the ''Doc/includes'' directory and it will be pulled in when the documentation is built.
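For instance, a small, self-contained script dropped into ''Doc/includes'' (the file name and contents here are made up) might look like this, ready to be pulled in by ''literalinclude'':

```python
# mp_example.py -- a hypothetical Doc/includes example for multiprocessing
from multiprocessing import Process

def greet(name):
    print("hello,", name)

if __name__ == "__main__":
    # Run the function in a child process and wait for it to finish
    p = Process(target=greet, args=("world",))
    p.start()
    p.join()
```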

When you want to try building the docs, simply go into ''trunk/Doc'' and type ''make html'' to convert all of the documentation into the HTML files you know so well from the Python doc site. Don't worry about installing Sphinx in advance; the build rules do that for you. Once built, the HTML documents live in ''Doc/build/html''.

At the very least, whenever you make a change to core, you should update the ''Misc/NEWS'' file to add a brief description of your change, and also add your name to ''Misc/ACKS''.

Making a change

Let's assume for the moment you're about to provide a patch to fix a bug from the python bug tracker. Most fixes will require the following minimal changes:

  • Updated Python module
  • Updated documentation (At least an entry in the NEWS file)
  • Updated Tests (you will update the tests)

In a few cases you also will need to update the C code. After you've done the initial check out of the branch you'll be working on, and you've confirmed the build and tests pass on your machine, you should be set to make your changes locally, apply any patches you are testing, etc.

When you're updating or adding new tests you need to drop into the ''Lib/test'' directory and find the "best place" for the test. Typically, if you're making a bug fix, you're simply going to append the test onto the suite for the module. Larger scale changes, including creating new packages or modules, will need their own ''test_*.py'' file in ''Lib/test''.

It's important when you're adding tests that your tests are clear, well documented, and most of all smart. They will need to know when not to run (say, a network test should not run when no network is present) and they need to be reliable (i.e.: they should never just hang). The tests and code you submit will be viewed by many people, and compiled and tested on more platforms than most of us have ever used. The smarter you make the test, the better off everyone will be.

An important tool in the test developer's arsenal is the ''test_support'' library included in ''Lib/test/test_support.py''. In it you will find a variety of functions, exceptions, and tools to help you write core tests. Most of all, look at the other tests!
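As a rough sketch of the shape such a test takes (plain modern unittest here; a real stdlib test would also lean on the ''test_support'' helpers, and every name below is made up):

```python
import os
import unittest

class ExampleTest(unittest.TestCase):
    """Sketch of a well-behaved stdlib-style test (names are invented)."""

    def test_basic_behaviour(self):
        self.assertEqual("spam".upper(), "SPAM")

    def test_guarded_resource(self):
        # Know when not to run: skip cleanly rather than fail or hang
        # when the environment can't support the test.
        if "EXAMPLE_NET_RESOURCE" not in os.environ:  # hypothetical guard
            self.skipTest("resource not available")
        self.assertTrue(os.environ["EXAMPLE_NET_RESOURCE"])

def run_suite():
    # Drive the case the way a test runner would
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)

if __name__ == "__main__":
    run_suite()
```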

Once your changes work, you should run a ''make check'' to perform some housekeeping operations you want to do prior to generating the diff. These include fixing whitespace, checking the NEWS/ACKS file for updates, and reminding you to run the test suite! See ''Tools/scripts/patchcheck.py'' for everything ''make check'' does.

On Code Bombs

It's important to avoid making widespread changes in a vacuum. Large scale refactoring or changes to an API used by a lot of the standard library should be reviewed carefully and often. Typically, it's better to post an initial patch up on the bug tracker and then revise it as other people/contributors make comments than to drop a huge patch on everyone and say "it's done".

A recent python-dev post from Guido highlighted this issue, the take-away quote (from both his email, and the blog post he linked to) being: "The story's main moral: submit your code for review early and often; work in a branch if you need to, but don't hide your code from review in a local repository until it's 'perfect'." For more details, see the "Code Bombs" thread listed in Related Links above.

One of the tools at your disposal for publishing patches for review is Rietveld, the review application created by Guido van Rossum. Typically, if you have a small enough change, putting a patch in the bug tracker is sufficient.

How do you generate a patch, big or small? It's easy: cd into your ''trunk/'' directory and run ''svn diff >mychange.patch''. This will create a patch containing only your changes which can then be uploaded to the bug tracker, emailed to the community, etc.

Applying the patch is also easy. Just hop into the ''trunk/'' directory and run ''patch -p0 < mychange.patch''.

Conclusion

A good first step to contributing to core is to consult the bug tracker. There you can find everything from mind-melting interpreter issues to simple one-line fixes (famous last words). There's even a query to find "Easy" issues (see the sidebar on bugs.python.org).

One great thing about Python development is that anyone can propose an idea. Should it stand on its own merit, it will probably be accepted. So even if you don't find a bug in an area you're passionate about, why not find something you are interested in and make a Python Enhancement Proposal for the change? Publish it to python-dev and put together the patch for the code. You can do this for existing modules or even new ones.

Ultimately, Python is your language. Without the people constantly contributing to core in the form of bug fixes, documentation and new programming concepts, Python would simply die on the vine. The more help, the better the language becomes, and the wider the appeal and audience.

Related Links