Get with the program as contextmanager | Completely Different

February 3rd, 2009 § 5 comments

One of the cooler fea­tures that came with Python 2.5’s release is the ‘with’ state­ment and the con­text man­ager pro­to­col behind it. I could make the argu­ment that these two things alone make the upgrade to Python 2.5 more than com­pelling for those of you trapped in the dark ages of 2.4 or worse: 2.3!

This is a reprint of an article I wrote for Python Magazine as a Completely Different column that was published in the July 2008 issue. I have republished this in its original form, bugs and all.

Intro­duc­tion

In Python 2.5, a with_statement feature was added to the __future__ module, so the statement has to be enabled with an import before it can be used. This was brought on by PEP (Python Enhancement Proposal) 343, “The with statement”. PEP 343, like many PEPs in Python, was a fusion of good ideas into a rather elegant solution. See http://www.python.org/dev/peps/ for a complete listing of PEPs, including those referenced in this article.

Two of the influencing PEPs, 310 (Reliable Acquisition/Release Pairs) and 319 (Python Synchronize/Asynchronize Block), were primarily focused on adding a simple way to acquire and then release a lock. PEP 310 proposed the with statement (i.e., with lock:), and PEP 319 proposed synchronize and asynchronize keywords that would let you define a function or method that uses them to access and modify shared objects, essentially hiding the common form of managing the lock directly:

initialize_lock()
...
acquire_lock()
try:
    change_shared_data()
finally:
    release_lock()

While both PEPs 310 and 319 were (are) good ideas, there were addi­tional influ­ences from other PEPs as well. PEP 340, “Anony­mous Block State­ments”, and PEP 346, “User Defined (‘with’) State­ments”, by Nick Cogh­lan were both impor­tant. In the end, what I think is an ele­gant and pow­er­ful mid­dle ground was reached.

If you want a very detailed overview of all of the rea­son­ing behind the intro­duc­tion of the with state­ment, I rec­om­mend read­ing PEP 346 http://www.python.org/dev/peps/pep-0346/, where Nick Cogh­lan explains it in excel­lent detail with many examples.

Con­text Managers

The key thing to under­stand about ”with” and all of the work in the PEP is that under the cov­ers, when you write:

with EXPRESSION [as VARIABLE]:
    BLOCK OF CODE

The with statement expands into two calls on the object that EXPRESSION evaluates to. The first call is to the __enter__() method on the object. After the nested block completes, the object’s __exit__() method is run. “as VARIABLE” is in brackets because it is optional: when present, the return value of __enter__() is bound to the name VARIABLE for use inside the BLOCK.
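To make the expansion concrete, here is a rough sketch of what happens for a with statement, loosely following the translation given in PEP 343. It is an approximation, not the interpreter's actual implementation. It is written in Python 3 syntax rather than the Python 2.5 syntax used in the listings, and the Tracker class and run_with_by_hand helper are made-up names for illustration only.

```python
import sys


class Tracker(object):
    """Hypothetical context manager that records the order of calls."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return "value from __enter__"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # do not suppress exceptions


def run_with_by_hand(manager, block):
    """Approximate `with manager as variable: block(variable)` by hand."""
    variable = manager.__enter__()       # this is what "as VARIABLE" binds
    try:
        block(variable)
    except BaseException:
        # On error, __exit__ sees the exception details and may suppress it.
        if not manager.__exit__(*sys.exc_info()):
            raise
    else:
        # On success, all three arguments are None.
        manager.__exit__(None, None, None)


tracker = Tracker()
seen = []
run_with_by_hand(tracker, seen.append)
```

After this runs, tracker.events holds the enter and exit calls in order, and seen holds the single value that __enter__() returned.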

Take a look at Listing 1 for an example. In order to illustrate the methods and call order, I’ve created a simple class, Foo, that defines the required protocol methods. At the bottom of the listing, an instance of Foo is used in the with Foo() statement, and the output is simply:


I
like
turtles

List­ing 1:

from __future__ import with_statement
 
class Foo(object):
    def __init__(self):
        pass
    def __enter__(self):
        print "I"
    def __exit__(self, type, value, traceback):
        print "turtles"
 
with Foo():
    print "like"

As you can see, the __enter__() method is called on the object, control is released, and the print "like" code block is executed. Once the block is completed, the __exit__() method is called.

Per the PEP, the ”__enter__()” method on the object accepts no argu­ments, but can per­form actions (in this case, print) or return data. If an object has no data to return it should return self, although that is not required.

The __exit__() method on the object has to accept three arguments: type, value, and traceback. These correspond to the arguments to the raise statement. They are passed in so that __exit__() can see any exception raised inside the nested block. For example, if type is None, the nested block executed successfully, without error. Otherwise, the __exit__() method can properly handle the exception condition and clean up the resource.
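As a small illustration of those three arguments, the sketch below inspects them inside __exit__(). It also relies on one detail from PEP 343 that is not spelled out above: if __exit__() returns a true value, the exception is suppressed, and otherwise it propagates. The Suppress class is a hypothetical name, and the code uses Python 3 syntax rather than the 2.5 syntax of the listings.

```python
class Suppress(object):
    """Hypothetical manager that swallows one exception type and records it."""

    def __init__(self, exc_class):
        self.exc_class = exc_class
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            return False              # block finished without an error
        if issubclass(exc_type, self.exc_class):
            self.caught = exc_value   # remember what happened
            return True               # true value: exception is suppressed
        return False                  # anything else propagates to the caller


with Suppress(KeyError) as guard:
    {}["missing"]                     # raises KeyError inside the block

with Suppress(KeyError) as clean:
    pass                              # no exception at all
```

Here guard.caught ends up holding the KeyError instance, while clean.caught stays None because __exit__() was called with three None arguments.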

For exam­ple, you might ask what hap­pens to the ”__exit__()” method exe­cu­tion if an excep­tion is raised when the code block is exe­cut­ing. Let’s exam­ine this fur­ther by chang­ing the bot­tom part of List­ing 1 to be:

with Foo():
    raise Exception

The out­put now looks like this:

I
turtles
Traceback (most recent call last):
  File "scratch.py", line 12, in <module>
    raise Exception
Exception

If the code block being executed raises an exception, __exit__() is still called on the Foo() object. This makes it darn handy for, say, cleaning up locks, database handles, sockets, unruly children, etc. Earlier I mentioned that objects that define the new protocol could also return self, which would then be bound to the variable named in [as VARIABLE].

List­ing 2 pro­vides a class with an ”__enter__()” method that returns the instance of the object for access by the code block. In the exam­ple, the instance of the object is asso­ci­ated with the vari­able name “baz”. Take a look at the output:


setting count to 0
<__main__.Foo object at 0x73bb0>
count is now: 4

List­ing 2:

from __future__ import with_statement
 
class Foo(object):
    def __init__(self):
        pass
 
    def __enter__(self):
        print "setting count to 0"
        self.count = 0
        return self
 
    def __exit__(self, type, value, traceback):
        print "count is now: %d" % self.count
 
    def incr(self):
        self.count += 1
 
with Foo() as baz:
    print baz
    for i in range(4):
        baz.incr()

As you can see, within the for-loop in the main block of code we were able to alter the state of the object we’re reliant on. We can access all of its internals, change state, call methods, etc. Again, this is especially handy if you want to create something that acts as some sort of handle.

Let’s look at two snip­pets, the old way of declar­ing a lock, then later acquir­ing it to mod­ify state:

from threading import RLock, Thread

lock = RLock()

class thread_object(Thread):
    def run(self):
        lock.acquire()
        try:
            print self.getName()
        except:
            raise Exception("Something is broken")
        finally:
            # Always release the lock, even if the print fails
            lock.release()

Now, let’s look at code refac­tored to use ”with”:

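from __future__ import with_statement
from threading import RLock, Thread

lock = RLock()

class thread_object(Thread):
    def run(self):
        # The lock is acquired here and released when the block exits
        with lock:
            print self.getName()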

This is possible because threading.RLock implements the new context manager protocol. Go ahead, take a peek at threading.py yourself, or look at the code below:

class _RLock(_Verbose):
    __enter__ = acquire
    ...snip...
    def __exit__(self, t, v, tb):
        self.release()
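To check that the release really happens on the way out, even when the block fails, here is a minimal sketch in Python 3 syntax. It uses a plain threading.Lock instead of an RLock only because Lock exposes a locked() method that makes the state easy to inspect. The update function and the data dictionary are hypothetical.

```python
import threading

lock = threading.Lock()


def update(shared, key, value):
    """Change shared data while holding the lock."""
    with lock:                        # lock.acquire() on the way in
        if key is None:
            raise ValueError("key must not be None")
        shared[key] = value
    # lock.release() has already run here, error or not


data = {}
update(data, "answer", 42)

try:
    update(data, None, 0)             # raises inside the with block
except ValueError:
    pass
```

After both calls, lock.locked() is False: the failed call released the lock just as the successful one did.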

The lock management classes are not the only ones to implement the protocol. The io.py, tempfile.py, and other modules all implement the protocol to allow you to do something like the following:

with open("hey", "r") as mfile:
    mfile.readlines()

This will automatically open the file on the way in and close it on the way out. Magic! Obviously, the simple way of thinking of these is as resource managers, but they are not limited to that. For example, what if you wanted to ensure a given state was set for a particular code block? PEP 346 points out an excellent example of disabling signals during the BLOCK execution. Take a look at Listing 3, where I have implemented that very code to simply catch and ignore SIGABRT signals.

When the script is run in one window, and in another we run “kill -6” against the script’s process ID, we see:

Tis but a scratch!
Tis but a scratch!
I got an abort, but I like it here.
Tis but a scratch!
Tis but a scratch!

List­ing 3:

from __future__ import with_statement
from contextlib import contextmanager
import signal

def handler(signum, frame):
    # Called when SIGABRT arrives; report it and carry on
    print "I got an abort, but I like it here."

@contextmanager
def no_sigabort():
    # Install our handler for the duration of the block
    signal.signal(signal.SIGABRT, handler)
    try:
        yield
    finally:
        # Restore the default action even if the block raises
        signal.signal(signal.SIGABRT, signal.SIG_DFL)

with no_sigabort():
    # code executed without worrying about signals
    while True:
        print "Tis but a scratch!"

Instead of passing the handler function to the first signal.signal() call in no_sigabort(), we could also pass in signal.SIG_IGN, which simply makes the signal ignored. You can easily catch all sorts of state and react to it. Another one of the examples in PEP 346 is committing or rolling back database transactions:

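from contextlib import contextmanager

@contextmanager
def transaction(db):
    try:
        yield
    except:
        # Undo the work, then let the caller see the original error
        db.rollback()
        raise
    else:
        db.commit()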

Using this style, your code becomes a lot more succinct and clear, and you drastically reduce the amount of boilerplate you have to add to your application.
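To see both paths of the transaction idea run, here is a self-contained sketch in Python 3 syntax. FakeDB is a hypothetical stand-in that only records which method was called, not a real database API. This version of transaction() is decorated with contextlib.contextmanager and re-raises after rolling back, so the caller still sees the original error.

```python
from contextlib import contextmanager


class FakeDB(object):
    """Stand-in for a real database connection (illustrative only)."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")


@contextmanager
def transaction(db):
    try:
        yield db
    except Exception:
        db.rollback()
        raise                 # re-raise so the caller still sees the error
    else:
        db.commit()


db = FakeDB()

with transaction(db):
    pass                      # work succeeded: commit

try:
    with transaction(db):
        raise RuntimeError("write failed")
except RuntimeError:
    pass                      # the rollback has already happened
```

The first block ends in a commit and the second in a rollback, so db.log records one of each, in that order.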

Con­textlib

As part of Python 2.5, a new module, contextlib, was introduced. This module is an excellent reference point for how to use context managers (it’s great example code!). It also provides some pretty cool tools. You’ve already seen me use contextlib.contextmanager to remove the need to define an object with __enter__() and __exit__() methods in Listing 3.

The contextlib.contextmanager dec­o­ra­tor allows you to cre­ate nice user state­ments out of a sim­ple func­tion that yields at one point in the mid­dle. This means you could do:

@contextmanager
def test_setup():
    # start database...
    # inject fake data...
    yield  # hand control to the test
    # confirm result...
    # shut database down...

Which allows you to:

def mytest():
    with test_setup():
        pass  # ... do stuff ...

You can technically do anything else you want within that decorated function, and it can take as long as you want, provided that:

- It yields exactly once.
- It does not yield again after an exception is raised.

The other nice thing is that you could change the test_setup exam­ple above to accept any num­ber or type of argu­ments, so tests could pass iden­tity and other infor­ma­tion into the test_setup function.
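Here is one way that could look, as a sketch in Python 3 syntax. The function is called setup_fake_db here (a hypothetical name), and the "database" is just a dictionary, so nothing below reflects a real test framework. The try/finally around the single yield is what keeps the teardown running when the test body raises.

```python
from contextlib import contextmanager


@contextmanager
def setup_fake_db(name, rows):
    """Hypothetical fixture: build a fake in-memory 'database' for one test."""
    database = {"owner": name, "rows": list(rows), "open": True}
    try:
        yield database            # the body of the with block runs here
    finally:
        # Runs even if the test raises, so the fixture is always torn down.
        database["open"] = False


def mytest():
    with setup_fake_db("mytest", [1, 2, 3]) as db:
        assert db["open"]
        total = sum(db["rows"])
    return total, db["open"]


result = mytest()
```

The value passed to yield is what "as db" receives, and by the time mytest() returns, the fixture has been marked closed again.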

Now let’s turn this up to 11. Up until now, I’ve shown you sim­ple exam­ples — basi­cally, how to get/set some resource and then release it. But did you know you could nest them? Via the contextlib.nested func­tion, you can define a series of nested con­textman­agers and then bind each one to a dif­fer­ent vari­able name.

Let’s try a sim­ple nested con­text out for starters. In the first exam­ple in List­ing 4, we want to move the data from file1 to file2. It’s easy to list the open file han­dles as argu­ments to ”nested()”, but what about mix­ing types? The sec­ond exam­ple in List­ing 4 (lines 8–11) mixes file han­dles with thread locks.

List­ing 4:

#!/usr/bin/env python
from __future__ import with_statement
from contextlib import nested
 
with nested(open("file1", "r"), open("file2", "w")) as (a, b):
    b.write(a.read())
 
from threading import RLock
lock = RLock()
with nested(lock, open("file1", "r"), open("file2", "w")) as (a, b, c):
    c.write(b.read())

Yes, we have officially crossed into “maybe that’s too much” territory. But you can see that we can pass in any number of context managers and all of them will be handled as needed. This is great if, like above, you need to acquire a lock and then perform an action which requires some cleanup.
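A note for readers on newer interpreters: contextlib.nested() was later deprecated and is not available in Python 3, where the with statement itself accepts several managers separated by commas. The sketch below shows the lock-plus-two-files example in that form. The temporary directory and file names are throwaway values chosen for this illustration.

```python
import os
import shutil
import tempfile
import threading

lock = threading.RLock()

# Throwaway directory and file names, used only for this illustration.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "file1")
dst_path = os.path.join(workdir, "file2")

with open(src_path, "w") as f:
    f.write("hello\n")

# Managers are entered left to right and exited in reverse order.
with lock, open(src_path, "r") as src, open(dst_path, "w") as dst:
    dst.write(src.read())

with open(dst_path, "r") as f:
    copied = f.read()

shutil.rmtree(workdir)            # clean up the scratch directory
```

Unlike the nested() call, each open() here only runs after the managers to its left have been entered, so a failure opening the second file still closes the first one and releases the lock.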

Finally, we have contextlib.closing. This is, as the documentation states, “a context manager that closes thing upon completion of the block”. Anything with a close() method is eligible to be used here. At last count on trunk, close() occurred at least 71 times in the Lib directory. You can use closing on URLs from urllib, StringIO objects, as well as gzip objects.

For exam­ple, from the stan­dard library documentation:

from __future__ import with_statement
from contextlib import closing
import urllib
 
url = 'http://www.python.org'
with closing(urllib.urlopen(url)) as page:
    for line in page:
        print line
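The same trick works for your own classes, not just library objects. The sketch below, in Python 3 syntax, wraps a hypothetical Connection class that has a close() method but defines neither __enter__() nor __exit__().

```python
from contextlib import closing


class Connection(object):
    """Hypothetical resource with a close() method but no __enter__/__exit__."""

    def __init__(self):
        self.is_open = True

    def send(self, message):
        if not self.is_open:
            raise ValueError("connection is closed")
        return len(message)

    def close(self):
        self.is_open = False


with closing(Connection()) as conn:
    sent = conn.send("ping")
# conn.close() was called when the block ended
```

closing() hands the wrapped object itself to the block and calls its close() method on the way out.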

All three of these make it easy to factor out code which we all end up repeating; that’s the nature of boilerplate. As we all know, less boilerplate and less copied-and-pasted code means code that is easier to read and easier to manage.

Let’s Go Off-Roading

As I was writ­ing this, I was try­ing to think of some­thing really inter­est­ing to do with an object defin­ing ”__enter__()” and ”__exit__()” meth­ods that wasn’t just resource man­age­ment. Then I real­ized, given I’m doing a lot of par­al­lel stuff right now, I could cre­ate a thread­pool that allowed jobs to be sub­mit­ted to it, and the ”__exit__()” would call ”join()” on the threads and so on.

Fan­tas­tic idea! Within List­ing 5, I have defined a basic thread object that sub­classes threading.Thread. Then, in List­ing 6 I define a Thread­Pool, which is the con­text man­ager I will use.

List­ing 5:

from __future__ import with_statement
from threading import Thread
from Queue import Empty
from Listing6 import ThreadPool
 
class myThread(Thread):
    def __init__(self, myq):
        Thread.__init__(self)
        self.myq = myq
    def run(self):
        while True:
            try:
                job = self.myq.get()
                if job == 'STOP':
                    break
                print self.getName(), job
            except Empty:
                continue
 
with ThreadPool(10, myThread) as pool:
    for i in range(100):
        pool.put(i)

List­ing 6:

from Queue import Queue
 
class ThreadPool(object):
    def __init__(self, workers, workerClass):
        self.myq = Queue()
        self.workers = workers
        self.workerClass = workerClass
        self.pool = []
 
    def __enter__(self):
        # On entering, start all the workers, who will block trying to
        # get work off the queue
        for i in range(self.workers):
            self.pool.append(self.workerClass(self.myq))
        for i in self.pool:
            i.start()
        return self.myq
 
    def __exit__(self, type, value, traceback):
        # Now, shut down the pool once all work is done
        for i in self.pool:
            self.myq.put('STOP')
        for i in self.pool:
            i.join()

Note that Thread­Pool returns a value from __enter__(). After it builds up the worker-pool, instead of return­ing ”self” (which would be silly), it actu­ally returns the queue built in the con­struc­tor. This makes it so that when we call it on line 20 in List­ing 5, we get the ref­er­ence to the queue we need.

Now, this is a nom­i­nal exam­ple. We’re not return­ing any results or any­thing, we’re just print­ing the num­bers off of the queue as we get them. But it demon­strates the con­cept of cre­at­ing an object that tracks some state, sets up a resource, and then ulti­mately man­ages that resource.

In List­ing 6, I made sure we built the pool at ”__enter__()” time rather than in the con­struc­tor because what hap­pens if we need to do more cus­tomiza­tion or hit an excep­tion? If we do hit an excep­tion, we will imme­di­ately jump out and the BLOCK we’re run­ning will not be exe­cuted. In the ”__exit__()” method, I insert STOP tokens to tell the threads to exit their work loop.

If you wanted, you could use this code inside of your own appli­ca­tion (once you make it so it returns data to the caller) to spawn worker pools on-demand, do some pro­cess­ing, and then cleanly shut them down with a min­i­mal amount of boil­er­plate involved.
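As a sketch of what "returning data to the caller" might look like, here is a variation on Listing 6 in Python 3 syntax (the queue module instead of Queue). The ResultPool name, the second results queue, the STOP sentinel object, and the collected() helper are all assumptions made for this illustration. It is one possible design, not the one from the listings.

```python
import queue
import threading

STOP = object()   # sentinel telling a worker to exit


class ResultPool(object):
    """Sketch of a worker pool that hands results back to the caller."""

    def __init__(self, workers, func):
        self.workers = workers
        self.func = func
        self.jobs = queue.Queue()
        self.results = queue.Queue()
        self.threads = []

    def _work(self):
        while True:
            job = self.jobs.get()
            if job is STOP:
                break
            self.results.put(self.func(job))

    def __enter__(self):
        # Build and start the workers on entry, as Listing 6 does.
        for _ in range(self.workers):
            thread = threading.Thread(target=self._work)
            thread.start()
            self.threads.append(thread)
        return self.jobs

    def __exit__(self, exc_type, exc_value, traceback):
        # One STOP per worker, queued behind any pending jobs.
        for _ in self.threads:
            self.jobs.put(STOP)
        for thread in self.threads:
            thread.join()

    def collected(self):
        """Drain the result queue; only call this after the with block."""
        items = []
        while not self.results.empty():
            items.append(self.results.get())
        return items


pool = ResultPool(4, lambda n: n * n)
with pool as jobs:
    for i in range(10):
        jobs.put(i)

squares = sorted(pool.collected())
```

Results arrive in whatever order the workers finish, which is why they are sorted before use. Because __exit__() joins every thread, all results are in the queue by the time the with block is over.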

The nice thing about this is that all of the respon­si­bil­ity for man­age­ment is done in the object that does all of the work itself. There is no more need­ing to remem­ber to shut down the worker pool, release the data­base con­nec­tion, or close that socket.

Con­clu­sion

I hope I’ve shown you a com­pelling new fea­ture within Python that you might not have known about. Python is evolv­ing rapidly every day. We don’t just have things like con­text man­agers and Python 3000 to look for­ward to. We have a wealth of improve­ments going into core every sin­gle day.

I think people are going to really love context managers for their elegance, once they become mainstream to the language (in 2.6). Centralizing the control and management of state, resources, and other such things while reducing the total lines of code you have to debug, manage, and read is a good thing.

Well, as long as the end result is still readable.


  • Ed Page

    The con­cept behind the design of contextlib.contextmanager changed the way I write Python. I don’t do this too much but I found the con­cept of split­ting a func­tion up with yields a great way to take a process that is fairly lin­ear but due to language/library design gets split up and becomes harder to read.

    I’ve cre­ated sub­modes for a com­mand inter­preter that are man­aged by a class that is cre­ated when dec­o­rat­ing a yield-ing func­tion to have setup and tear down code.

    I’ve created decorators to work with PyGTK’s idle handler for cooperative threading. So instead of having separate functions for each step of the idle processing or using a hand-made state machine just to be in the correct section of code, you just write your algorithm normally and insert yields when you want to hand control back to GTK.

    I’m work­ing on one (untested) that will let you do your PyGTK setup code in the UI thread, back­ground pro­cess­ing in a ran­dom thread, and then cleanup code in the UI thread.

    Oh, and another good exam­ple of the with state­ment beyond resource work is a python recipe for unit test­ing that I saw. Basi­cally you do “with expected(ExceptionType)” and if that excep­tion isn’t thrown, the test will fail.

  • Loris

    Very good article!
    Thanks a lot.

  • Yassen Damyanov

    A very help­ful arti­cle! Thanks a lot, Jessen.

  • bestchai

    “This makes it so that when we call it on line 20 in Listing 5, we get the reference to the queue we need.” should be line 18.

