Get with the program as contextmanager | Completely Different

by jesse


One of the cooler features that came with Python 2.5's release is the 'with' statement and the context manager protocol behind it. I could make the argument that these two things alone make the upgrade to Python 2.5 more than compelling for those of you trapped in the dark ages of 2.4 or worse: 2.3!

This is a reprint of an article I wrote for Python Magazine as a Completely Different column, published in the July 2008 issue. I have republished it in its original form, bugs and all.

Introduction

In Python 2.5, a with_statement hook was added to the ''__future__'' module, so the statement is enabled with ''from __future__ import with_statement''. It came out of PEP (Python Enhancement Proposal) 343, "The with statement". Like many PEPs, PEP 343 fused several good ideas into a rather elegant solution. See http://www.python.org/dev/peps/ for a complete listing of PEPs, including those referenced in this article.

Two of the influencing PEPs, 310 (Reliable Acquisition/Release Pairs) and 319 (Python Synchronize/Asynchronize Block), focused primarily on a simple way to acquire and then release a lock. PEP 310 proposed the ''with'' statement (i.e., ''with lock:''), and PEP 319 proposed ''synchronize'' and ''asynchronize'' keywords for marking code that accesses and modifies shared objects. Both essentially hide the common pattern of managing the lock directly:

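import threading

shared_data = []
lock = threading.Lock()              # initialize the lock once
# ... later, whenever the shared data has to change:
lock.acquire()
try:
    shared_data.append("new item")   # change the shared data
finally:
    lock.release()                   # always runs, even if the change raises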

While both PEPs 310 and 319 were (are) good ideas, there were additional influences from other PEPs as well. PEP 340, "Anonymous Block Statements", and PEP 346, "User Defined ('with') Statements", by Nick Coghlan were both important. In the end, what I think is an elegant and powerful middle ground was reached.

If you want a very detailed overview of all of the reasoning behind the introduction of the with statement, I recommend reading PEP 346 http://www.python.org/dev/peps/pep-0346/, where Nick Coghlan explains it in excellent detail with many examples.

Context Managers

The key thing to understand about ''with'' and all of the work in the PEP is that under the covers, when you write:

with EXPRESSION [as VARIABLE]:
    BLOCK OF CODE

Python first evaluates EXPRESSION to get a context manager object, then makes two calls on it. The first is to the object's ''__enter__()'' method, before the nested block runs. After the nested block completes, the object's ''__exit__()'' method is run. "as VARIABLE" is in brackets because it is optional: when present, the value returned by ''__enter__()'' (not the object itself) is bound to the name VARIABLE for use inside the BLOCK.
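To make the call order concrete, here is a rough sketch of what the interpreter does for ''with EXPRESSION as VARIABLE: BLOCK'', written as an ordinary function. ''run_with'', ''body'' and the ''Recorder'' class are hypothetical names for illustration only. It follows the shape of the translation in PEP 343 but is simplified: it skips the special handling of ''return'', ''break'' and ''continue'' inside the block.

```python
import sys


def run_with(manager, body):
    """Roughly emulate `with manager as var: body(var)` (simplified sketch)."""
    exit_method = type(manager).__exit__      # looked up before __enter__ runs
    var = type(manager).__enter__(manager)    # the value bound by "as VARIABLE"
    try:
        body(var)                             # the BLOCK
    except BaseException:
        # Hand the exception details to __exit__; a true return value
        # suppresses the exception, otherwise it is re-raised.
        if not exit_method(manager, *sys.exc_info()):
            raise
    else:
        # Normal completion: __exit__ receives three Nones.
        exit_method(manager, None, None, None)


class Recorder(object):
    """Hypothetical context manager that logs each protocol call."""

    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("exit:%s" % (exc_type.__name__ if exc_type else None))
        return False   # never suppress exceptions


rec = Recorder()
run_with(rec, lambda r: r.log.append("block"))
print(rec.log)   # ['enter', 'block', 'exit:None']
```

If ''body'' raised a ''ValueError'' instead, the log would end with ''exit:ValueError'' and the exception would still reach the caller, because ''Recorder.__exit__()'' returns ''False''.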

Take a look at Listing 1 for an example. To illustrate the methods and the call order, I've created a simple class, Foo, that defines the required protocol methods. At the bottom of the listing, an instance of Foo is used in a ''with Foo():'' statement, and the output is simply:

I
like
turtles

Listing 1:

from __future__ import with_statement

class Foo(object):
    def __init__(self):
        pass
    def __enter__(self):
        print "I"
    def __exit__(self, type, value, traceback):
        print "turtles"

with Foo():
    print "like"

As you can see, the ''__enter__()'' method is called on the object first, then control passes to the block and the ''print "like"'' statement is executed. Once the block is completed, the ''__exit__()'' method is called.

Per the PEP, the ''__enter__()'' method takes no arguments (other than self), but it can perform actions (in this case, print) or return data. Whatever it returns is what gets bound by "as VARIABLE"; if an object has nothing more useful to return, returning self is a common convention, although it is not required.

The ''__exit__()'' method has to accept three arguments: type, value, and traceback. These correspond to the three arguments of the ''raise'' statement (and to what ''sys.exc_info()'' returns). They are passed in so that ''__exit__()'' can tell whether the nested block raised an exception. If type is ''None'', the nested block executed successfully, without error, and all three arguments are ''None''. Otherwise, ''__exit__()'' can handle the exception condition and clean up the resource; if it returns a true value the exception is suppressed, and if not, the exception is re-raised once ''__exit__()'' returns.
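Here is a minimal sketch of an ''__exit__()'' that uses those arguments: it swallows one exception type by returning ''True'' and lets everything else propagate by returning ''False''. ''Ignoring'' is a hypothetical class written for this example, not part of the standard library.

```python
class Ignoring(object):
    """Hypothetical context manager that swallows one exception type."""

    def __init__(self, exc_class):
        self.exc_class = exc_class
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            return False                 # block finished normally
        if issubclass(exc_type, self.exc_class):
            self.caught = exc_value      # remember what we swallowed
            return True                  # true value: suppress the exception
        return False                     # anything else propagates


with Ignoring(ZeroDivisionError) as guard:
    1 / 0
print(type(guard.caught).__name__)   # ZeroDivisionError
```

Execution continues normally after the ''with'' statement; a ''KeyError'' raised inside the same block would still escape.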

You might ask what happens to the ''__exit__()'' call if an exception is raised while the code block is executing. Let's examine this by changing the bottom part of Listing 1 to:

with Foo():
    raise Exception

The output now looks like this:

I
turtles
Traceback (most recent call last):
  File "scratch.py", line 12, in <module>
    raise Exception
Exception

If the code block raises an exception, ''__exit__()'' is still called on the Foo() object, and the exception then continues to propagate. This makes it darn handy for, say, cleaning up locks, database handles, sockets, unruly children, etc. Earlier I mentioned that ''__enter__()'' could also return ''self'', which is then bound to the variable named in [as VARIABLE].

Listing 2 provides a class whose ''__enter__()'' method returns the object instance for use by the code block. In the example, that instance is bound to the name "baz". Take a look at the output:

setting count to 0
<__main__.Foo object at 0x73bb0>
count is now: 4

Listing 2:

from __future__ import with_statement

class Foo(object):
    def __init__(self):
        pass

    def __enter__(self):
        print "setting count to 0"
        self.count = 0
        return self

    def __exit__(self, type, value, traceback):
        print "count is now: %d" % self.count

    def incr(self):
        self.count += 1

with Foo() as baz:
    print baz
    for i in range(4):
        baz.incr()

As you can see, within the for-loop in the block we were able to alter the state of the object we're relying on. We can access all of its internals, change state, call methods, etc. Again, this is especially handy if you want to create something that acts as a handle.

Let's look at two snippets. First, the old way of declaring a lock and then later acquiring it to modify state:

from threading import RLock, Thread

lock = RLock()

class thread_object(Thread):
    def run(self):
        lock.acquire()
        try:
            print self.getName()
        finally:
            # release the lock whether or not the body raised
            lock.release()

Now, let's look at code refactored to use ''with'':

from threading import RLock, Thread

lock = RLock()

class thread_object(Thread):
    def run(self):
        with lock:   # acquire() on entry, release() on exit
            print self.getName()

This is possible because threading.RLock implements the new context manager protocol. Go ahead and take a peek at threading.py yourself, or look at the excerpt below:

class _RLock(_Verbose):
    __enter__ = acquire
    ...snip...
    def __exit__(self, t, v, tb):
        self.release()
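Because the lock's ''__exit__()'' always calls ''release()'', the lock is freed even when the block raises. Here is a small check using a plain ''threading.Lock'', whose ''locked()'' method reports whether it is currently held; ''update_shared'' is a hypothetical function invented for the example.

```python
import threading

lock = threading.Lock()
shared = []


def update_shared(value):
    """Hypothetical critical section that fails on negative input."""
    with lock:   # acquire() on entry; release() on exit, even if we raise
        if value < 0:
            raise ValueError("negative value: %r" % value)
        shared.append(value)


update_shared(1)
try:
    update_shared(-1)
except ValueError:
    pass
print(lock.locked())   # False: the failed call still released the lock
```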

The lock management classes are not the only ones to implement the protocol. File objects, along with classes in io.py, tempfile.py, and other modules, implement it too, which lets you do something like the following:

with open("hey", "r") as mfile:
    mfile.readlines()

The file is closed automatically on the way out of the block, even if the block raises. Magic! The simple way to think of these objects is as resource managers. But what if you wanted to ensure a given state was set for a particular code block? PEP 346 points out an excellent example of disabling signals during the BLOCK execution. Take a look at Listing 3, where I have implemented that very idea to catch SIGABRT signals and print a message instead of aborting.

When the script is run in one window, and in another we run "kill -6 <pid>" (where <pid> is the script's process ID), we see:

Tis but a scratch!
Tis but a scratch!
I got an abort, but I like it here.
Tis but a scratch!
Tis but a scratch!

Listing 3:

from __future__ import with_statement
from contextlib import contextmanager
import signal
import time

def handler(signum, frame):
    print "I got an abort, but I like it here."

@contextmanager
def no_sigabort():
    # Install our handler, remembering whatever handler was there before
    previous = signal.signal(signal.SIGABRT, handler)
    try:
        yield
    finally:
        # Restore the old handler even if the block raises
        signal.signal(signal.SIGABRT, previous)

with no_sigabort():
    # code executed without worrying about SIGABRT
    while True:
        print "Tis but a scratch!"
        time.sleep(1)

Instead of passing our handler function to ''signal.signal()'' in ''no_sigabort()'', we could also pass in ''signal.SIG_IGN'', which simply ignores the signal. You can easily capture all sorts of state and react to it. Another of the examples in PEP 346 is committing or rolling back database transactions:

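from contextlib import contextmanager

@contextmanager
def transaction(db):
    try:
        yield
    except:
        db.rollback()   # undo the partial work...
        raise           # ...and let the caller see the error
    else:
        db.commit()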

Using this style, your code becomes more succinct and clearer, and you drastically reduce the amount of boilerplate you have to add to your application.
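To try the transaction pattern without a database server, here is a sketch using the standard library's ''sqlite3'' module and an in-memory database. The ''accounts'' table and its contents are made up for illustration, and this version yields the connection so the block can use it. ''commit()'' runs only when the block finishes cleanly; otherwise ''rollback()'' runs and the exception is re-raised.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(db):
    try:
        yield db
    except:
        db.rollback()   # undo everything done inside the block
        raise           # let the caller see the original error
    else:
        db.commit()     # the block finished without an exception


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES ('alice', 100)")
db.commit()

# A hypothetical failing update: the debit is rolled back.
try:
    with transaction(db):
        db.execute("UPDATE accounts SET balance = balance - 50")
        raise RuntimeError("transfer aborted")
except RuntimeError:
    pass

balance = db.execute("SELECT balance FROM accounts").fetchone()[0]
print(balance)   # 100
```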

Contextlib

As part of Python 2.5, a new module, ''contextlib'', was introduced. This module is an excellent reference for how to use context managers (it's great example code!), and it provides some pretty cool tools. You've already seen me use contextlib.contextmanager in Listing 3 to avoid defining a class with ''__enter__()'' and ''__exit__()'' methods.

The contextlib.contextmanager decorator lets you create user-defined statements out of a simple generator function that yields once, at the point where the block should run. This means you could do:

from contextlib import contextmanager

@contextmanager
def test_setup():
    # start database...
    # inject fake data...
    yield  # the body of the with statement (the test) runs here
    # confirm result...
    # shut database down...

Which allows you to:

def mytest():
    with test_setup():
        pass  # ... do stuff ...

You can technically do anything else you want within that decorated function, and it can run for as long as it needs, provided that:

- It yields exactly once.
- It does not yield again after an exception is raised.
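Breaking the yield-once rule is caught at run time: when the block finishes, ''contextmanager'' advances the generator once more and expects it to stop, raising ''RuntimeError'' if it yields again instead. Wrapping the ''yield'' in ''try''/''finally'' is the usual way to guarantee teardown when the block raises. Both generator functions below are hypothetical examples.

```python
from contextlib import contextmanager

events = []


@contextmanager
def well_behaved():
    events.append("setup")
    try:
        yield                       # the with-block runs here, exactly once
    finally:
        events.append("teardown")   # runs even if the block raised


@contextmanager
def yields_twice():
    yield
    yield                           # breaks the rule


try:
    with well_behaved():
        raise KeyError("oops")
except KeyError:
    pass
print(events)    # ['setup', 'teardown']

message = None
try:
    with yields_twice():
        pass
except RuntimeError as err:
    message = str(err)
print(message)   # generator didn't stop
```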

The other nice thing is that you could change the test_setup example above to accept any number or type of arguments, so tests could pass identity and other information into the test_setup function.
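For instance, a decorated generator takes arguments like any other function. Here is a minimal sketch; ''temporary_value'' is a hypothetical helper, not from the article. It sets a key in a dictionary for the duration of a block and then restores the previous state, which is the same set-up/tear-down shape as ''test_setup'' with the details passed in by the caller.

```python
from contextlib import contextmanager

_MISSING = object()   # sentinel meaning "the key was not there before"


@contextmanager
def temporary_value(mapping, key, value):
    old = mapping.get(key, _MISSING)
    mapping[key] = value
    try:
        yield mapping
    finally:
        # Restore exactly what was there before, even if the block raised.
        if old is _MISSING:
            del mapping[key]
        else:
            mapping[key] = old


config = {"user": "jesse"}
with temporary_value(config, "user", "tester"):
    print(config["user"])    # tester
with temporary_value(config, "debug", True):
    print(config["debug"])   # True
print(config)                # {'user': 'jesse'}
```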

Now let's turn this up to 11. Up until now, I've shown you simple examples: basically, how to get/set some resource and then release it. But did you know you could nest them? With the contextlib.nested function, you can combine a series of context managers in one ''with'' statement and bind each one's ''__enter__()'' result to a different variable name.

Let's try a simple nested context for starters. In the first example in Listing 4, we want to copy the data from file1 to file2. It's easy to list the open file handles as arguments to ''nested()'', but what about mixing types? The second example in Listing 4 (lines 8-11) mixes file handles with a thread lock.

Listing 4:

#!/usr/bin/env python
from __future__ import with_statement
from contextlib import nested

with nested(open("file1", "r"), open("file2", "w")) as (a, b):
    b.write(a.read())

from threading import RLock
lock = RLock()
with nested(lock, open("file1", "r"), open("file2", "w")) as (a, b, c):
    c.write(b.read())

Yes, we have officially crossed into maybe-that's-too-much territory. But you can see that we can pass in any number of context managers, and all of them will be handled as needed. This is great if, as above, you need to acquire a lock and then perform an action that requires some cleanup.
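''nested(a, b, c)'' behaves much like writing the ''with'' statements inside one another: the managers are entered left to right and exited in reverse order, so the lock in Listing 4 is the first thing acquired and the last thing released. The sketch below spells that ordering out with explicitly nested ''with'' statements rather than ''nested()'' itself; ''tagged'' is a hypothetical logging helper.

```python
from contextlib import contextmanager

log = []


@contextmanager
def tagged(name):
    log.append("enter " + name)
    try:
        yield name
    finally:
        log.append("exit " + name)


# Same ordering as: with nested(tagged("lock"), tagged("src"), tagged("dst")) as (a, b, c):
with tagged("lock") as a:
    with tagged("src") as b:
        with tagged("dst") as c:
            log.append("copy %s -> %s" % (b, c))

for line in log:
    print(line)
```

One practical difference: ''nested()'' receives all of its arguments already evaluated, so if the second ''open()'' fails, the first file is never entered and is not closed by the ''with'' machinery. Writing the statements out nests the evaluation as well.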

Finally, we have contextlib.closing. This is, as the documentation states, "a context manager that closes thing upon completion of the block". Anything with a ''close()'' method is eligible. At last count on trunk, ''close()'' occurred at least 71 times in the Lib directory. You can use ''closing'' on the objects returned by urllib.urlopen(), on StringIO objects, and on gzip objects.

For example, from the standard library documentation:

from __future__ import with_statement
from contextlib import closing
import urllib

url = 'http://www.python.org'
with closing(urllib.urlopen(url)) as page:
    for line in page:
        print line
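''closing()'' is not limited to standard library objects; anything with a ''close()'' method works. The ''Connection'' class below is a hypothetical stand-in for a socket or handle, used only to show that ''close()'' runs when the block ends, including when it ends with an exception.

```python
from contextlib import closing


class Connection(object):
    """Hypothetical resource: has close(), but no __enter__/__exit__."""

    def __init__(self):
        self.closed = False

    def send(self, data):
        if self.closed:
            raise IOError("send on closed connection")
        return len(data)

    def close(self):
        self.closed = True


conn = Connection()
try:
    with closing(conn) as c:   # closing() hands back conn itself
        c.send("hello")
        raise ValueError("something went wrong mid-block")
except ValueError:
    pass
print(conn.closed)   # True
```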

All three of these tools make it easy to factor out code that we all end up repeating; that's the nature of boilerplate. As we all know, less boilerplate and less copy-and-pasted code make code easier to read and easier to manage.

Let's Go Off-Roading

As I was writing this, I was trying to think of something really interesting to do with an object defining ''__enter__()'' and ''__exit__()'' methods that wasn't just resource management. Then I realized that, since I'm doing a lot of parallel work right now, I could create a thread pool that accepts submitted jobs and whose ''__exit__()'' calls ''join()'' on the threads, and so on.

Fantastic idea! In Listing 5, I define a basic thread class that subclasses threading.Thread. Then, in Listing 6, I define a ThreadPool, which is the context manager I will use.

Listing 5:

from __future__ import with_statement
from threading import Thread
from Queue import Empty
from Listing6 import ThreadPool

class myThread(Thread):
    def __init__(self, myq):
        Thread.__init__(self)
        self.myq = myq
    def run(self):
        while True:
            try:
                job = self.myq.get()
                if job == 'STOP':
                    break
                print self.getName(), job
            except Empty:
                continue

with ThreadPool(10, myThread) as pool:
    for i in range(100):
        pool.put(i)

Listing 6:

from Queue import Queue

class ThreadPool(object):
    def __init__(self, workers, workerClass):
        self.myq = Queue()
        self.workers = workers
        self.workerClass = workerClass
        self.pool = []

    def __enter__(self):
        # On entering, start all the workers, who will block trying to
        # get work off the queue
        for i in range(self.workers):
            self.pool.append(self.workerClass(self.myq))
        for i in self.pool:
            i.start()
        return self.myq

    def __exit__(self, type, value, traceback):
        # Now, shut down the pool once all work is done
        for i in self.pool:
            self.myq.put('STOP')
        for i in self.pool:
            i.join()

Note that ThreadPool returns a value from ''__enter__()''. After it builds the worker pool, instead of returning ''self'' (which would be silly), it returns the queue built in the constructor. So when we use it in the ''with'' statement on line 20 of Listing 5, ''pool'' is bound to the queue we need.

Now, this is a nominal example. We're not returning any results or anything, we're just printing the numbers off of the queue as we get them. But it demonstrates the concept of creating an object that tracks some state, sets up a resource, and then ultimately manages that resource.

In Listing 6, I build the pool at ''__enter__()'' time rather than in the constructor. That leaves room for more customization between creating the object and entering the block, and if starting the workers hits an exception, it propagates immediately and the BLOCK we're running is never executed. In the ''__exit__()'' method, I put one STOP token per worker on the queue to tell the threads to exit their work loop, then join each thread.
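One consequence worth checking: if ''__enter__()'' itself raises, the ''with'' statement never reaches the block and ''__exit__()'' is not called either, so any workers already started inside ''__enter__()'' would be left running. A small sketch with a hypothetical ''FailingEnter'' class shows the behaviour:

```python
class FailingEnter(object):
    """Hypothetical manager whose setup fails part-way through."""

    def __init__(self):
        self.calls = []

    def __enter__(self):
        self.calls.append("enter")
        raise RuntimeError("could not start workers")

    def __exit__(self, exc_type, exc_value, traceback):
        self.calls.append("exit")   # never reached in this example


mgr = FailingEnter()
try:
    with mgr:
        mgr.calls.append("block")   # skipped: __enter__ raised first
except RuntimeError:
    pass
print(mgr.calls)   # ['enter']
```

If partial setup needs undoing, ''__enter__()'' has to clean up after itself (for example with its own ''try''/''except'') before re-raising.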

If you wanted, you could use this code inside of your own application (once you make it so it returns data to the caller) to spawn worker pools on-demand, do some processing, and then cleanly shut them down with a minimal amount of boilerplate involved.
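As a starting point for that variation, here is one possible sketch, not the article's code. Workers take ''(index, function, argument)'' jobs from one queue and push ''(index, result)'' pairs onto a second queue; ''__exit__()'' stops and joins the workers, then gathers the results in submission order. ''ResultPool'', ''submit()'' and ''results'' are hypothetical names, and a job that raises simply kills its worker and drops that result, which a real version would need to handle.

```python
import threading

try:
    import queue                 # Python 3
except ImportError:
    import Queue as queue        # Python 2

_STOP = object()   # sentinel telling a worker to leave its loop


class ResultPool(object):
    """Hypothetical thread pool whose jobs return values to the caller."""

    def __init__(self, workers):
        self.workers = workers
        self.jobs = queue.Queue()
        self.done = queue.Queue()
        self.threads = []
        self.results = None
        self._count = 0

    def _work(self):
        while True:
            job = self.jobs.get()
            if job is _STOP:
                break
            index, func, arg = job
            self.done.put((index, func(arg)))

    def submit(self, func, arg):
        self.jobs.put((self._count, func, arg))
        self._count += 1

    def __enter__(self):
        for _ in range(self.workers):
            t = threading.Thread(target=self._work)
            t.start()
            self.threads.append(t)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        for _ in self.threads:
            self.jobs.put(_STOP)      # one STOP per worker, queued after all jobs
        for t in self.threads:
            t.join()
        # All workers have exited, so every finished result is in self.done.
        pairs = [self.done.get() for _ in range(self.done.qsize())]
        self.results = [value for _, value in sorted(pairs)]
        return False                  # don't suppress exceptions from the block


with ResultPool(4) as pool:
    for n in range(10):
        pool.submit(lambda x: x * x, n)
print(pool.results)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```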

The nice thing about this is that all of the responsibility for management lives in the object that does the work itself. You no longer need to remember to shut down the worker pool, release the database connection, or close that socket.

Conclusion

I hope I've shown you a compelling new feature within Python that you might not have known about. Python is evolving rapidly. We don't just have things like context managers and Python 3000 to look forward to; we have a wealth of improvements going into core every single day.

I think people are going to really love context managers for their elegance once they become a mainstream part of the language in 2.6, where ''with'' no longer needs the ''__future__'' import. Centralizing the control and management of state, resources and similar things, while reducing the total lines of code you have to debug, manage and read, is a good thing.

Well, as long as the end result is still readable.

Related Links: