<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jessenoller.com &#187; python magazine</title>
	<atom:link href="http://jessenoller.com/category/python-magazine/feed/" rel="self" type="application/rss+xml" />
	<link>http://jessenoller.com</link>
	<description>python, programming and other things</description>
	<lastBuildDate>Wed, 11 Jan 2012 19:01:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>YAML ain’t Markup Language &#124; Completely Different</title>
		<link>http://jessenoller.com/2009/04/13/yaml-aint-markup-language-completely-different/</link>
		<comments>http://jessenoller.com/2009/04/13/yaml-aint-markup-language-completely-different/#comments</comments>
		<pubDate>Mon, 13 Apr 2009 15:28:13 +0000</pubDate>
		<dc:creator>jesse</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[python magazine]]></category>

		<guid isPermaLink="false">http://jessenoller.com/?p=581</guid>
		<description><![CDATA[When someone says "pick a markup language," most people would immediately respond with "XML!", but there's an alternative out there. YAML is human-readable, easy to use, and overall quite fantastic.

This is a reprint of an article I wrote for Python Magazine as a Completely Different column that was published in the December 2008 issue. ...]]></description>
			<content:encoded><![CDATA[<blockquote><p>When someone says “pick a markup language,” most people would immediately respond with “XML!”, but there’s an alternative out there. YAML is human-readable, easy to use, and overall quite fantastic.</p></blockquote>
<p><em>This is a reprint of an article I wrote for <a href="http://www.pythonmagazine.com/" target="_blank">Python Magazine</a> as a Completely Different column that was published in the December 2008 issue. <b>I have republished this in its original form, bugs and all</b></em></p>
<p><span id="more-581"></span><br />
I <em>hate</em> markup languages. There, I said it. The first time I had the pleasure of “using” (being abused by) XML, I said to myself “there has got to be a better way of doing this.” Well, after years of sticking with plain text ini files and custom syntaxes based on <code>eval()</code>, I’ve come to not only use, but love, YAML.</p>
<p>YAML, or “YAML Ain’t Markup Language”, is “a human friendly data serialization standard for all programming languages.” It has the advantage of leaning towards dynamic languages a la Python, Ruby, etc.</p>
<p>It is important to note that friendliness and readability are core to the design of YAML. The number of format characters is very low and, like Python, YAML’s markup can use whitespace to indicate scoping of items. Tabs are not allowed for indentation, so there is no chance for confusion about indentation level. Additionally, the constructs within YAML such as mappings, sequences, and scalars all mesh nicely with existing Python data types like dictionaries, lists, strings, and integers. It’s also fully Unicode-enabled, which should make a lot of people who normally worry about UTF-8 happy.</p>
<p>What really attracted me to YAML are some of the key things that drew me to Python: cleanliness and approachability. Too often, I’ve had to deal with monstrous XML files for data passing or — worse yet — configuration and sometimes ini-style configuration files that simply don’t scale, or communicate enough information. So far, I’ve used YAML in about six different projects with great success and found that it scales quite well while staying human-readable.</p>
<h2>Syntax is Key</h2>
<p>YAML, on its face, is amazingly simple. Take the document below, for example, which we will run through the PyYAML <em>load</em> function (more on PyYAML in a moment):</p>
<pre>
 # YAML
name: Jesse
</pre>
<p>This YAML produces the following Python dictionary:</p>

<pre class="python">
&gt;&gt;&gt; import yaml
&gt;&gt;&gt; yaml.load("""
...  # YAML
... name: Jesse
... """)
{'name': 'Jesse'}
&gt;&gt;&gt;
</pre>

<p>This is a simple example. Line 1 of the YAML file, or document, is a simple comment. Note that there is a space character right before that # sign. The next line is a simple key value pair which, after being parsed, gets returned to us in a Python dictionary. Simple as pie!</p>
<p>A simple name-value pair is easy to do. Here is a document with some additional structures and details to try:</p>
<pre>
 # YAML
object:
    attributes:
        - attr1
        - attr2
        - attr3
    methods: [ getter, setter ]
</pre>
<p>Here, we have defined a top-level key named “object”. Its value is a block mapping with two keys, <code>attributes</code> and <code>methods</code>. The <code>attributes</code> key uses the more verbose YAML syntax for a list, in this case:</p>
<pre>
attributes:
    - attr1
    - attr2
    - attr3
</pre>
<p>In this case the YAML represents a key named <code>attributes</code>, while each item underneath it, prefaced with a <code>-</code>, becomes an item in the list stored as that key’s value. Here it is printed after a load:</p>

<pre class="python">
{'object': {'attributes': ['attr1', 'attr2', 'attr3'], ...
</pre>

<p>The <code>methods</code> key uses YAML shorthand to accomplish the same thing. In my experience, non-programmers tend to understand the first method, <code>-</code> prefacing, a bit more than the second method. Both parse to Python lists:</p>

<pre class="python">
{'object': {'attributes': ['attr1', 'attr2',
                           'attr3'],
            'methods': ['getter', 'setter']}}
</pre>

<p>I included both examples to illustrate a point. Most of YAML’s syntax has two ways of achieving the same intended goal. There is the verbose, multi-line method, and the more compact method. Both methods are human-readable, so choosing one is a matter of personal preference.</p>
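<p>To make the equivalence concrete, here is a minimal sketch, assuming Python 3 and a recent PyYAML, and using <code>yaml.safe_load()</code> in place of the plain <code>load()</code> call used elsewhere in this article. The two document strings are illustrative; they spell the same data in the verbose and the compact style.</p>

```python
import yaml

# Verbose, multi-line (block) style.
block_style = """
object:
    attributes:
        - attr1
        - attr2
    methods:
        - getter
        - setter
"""

# Compact (flow) style for the same data.
flow_style = "object: {attributes: [attr1, attr2], methods: [getter, setter]}"

block_data = yaml.safe_load(block_style)
flow_data = yaml.safe_load(flow_style)

# Both spellings describe the same nested dict/list structure.
print(block_data == flow_data)
print(block_data["object"]["methods"])
```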
<p>As you can see, the most basic syntax is as follows:</p>
<p><b>dicts/hashes</b>: key, value separated by a colon and space, e.g. <code>key: value</code>; additionally, you can use <code>{key: value}</code></p>
<p><b>lists</b>: dash followed by a space then the item, e.g. <code>- item</code>; additionally, you can use <code>[item, item, item]</code></p>
<p>Strings do not require quotation. You can preserve line breaks with the <code>|</code> character; for example:</p>
<pre>
 # YAML
sonnet: |
    I wish I could
    write a poem
    but I can't
</pre>
<p>This would parse to:</p>

<pre class="python">
{'sonnet': "I wish I could\nwrite a poem\nbut I can't\n"}
</pre>

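<p>In the basic use case of <code>|</code>, the leading indentation and any extra trailing blank lines are trimmed, while a single final line break is kept, as the output above shows. See the “Scalar indicators” section of the compact cheat sheet for modifiers to the <code>|</code> character.</p>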
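<p>The modifiers are easier to see side by side. The following is a small sketch, assuming Python 3 and a recent PyYAML; the key names <code>clip</code>, <code>strip</code> and <code>folded</code> are made up for illustration. It compares plain <code>|</code>, the <code>|-</code> variant that drops the final line break, and <code>&gt;</code>, which folds line breaks into spaces.</p>

```python
import yaml

doc = """
clip: |
    line one
    line two

strip: |-
    line one
    line two

folded: >
    line one
    line two
"""

data = yaml.safe_load(doc)

# repr() makes the line breaks visible.
for key in ("clip", "strip", "folded"):
    print(key, repr(data[key]))
```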
<p>Core to YAML is the concept of <b>documents</b>. A document is not just a separate file in this case. Instead, think of a document as just a chunk of YAML. You can have multiple documents in a single stream of YAML, if each one is separated by <code>---</code>, like:</p>
<pre>
 # YAML
---
document: this is doc 1
---
document: this is doc 2
...
</pre>
<p>A line of three dots (<code>...</code>) explicitly ends a document. The nice thing about documents is that you can treat them as different entities: say, “people” and “cars” kept in the same file. You can also use them for a bunch of entities that look alike, e.g.:</p>
<pre>
name: SomeObject
attributes:
    - attr1
    - attr2
    - attr3
methods: [ getter, setter ]
---
name: MyPrettyObject
attributes:
    - attr1
    - attr2
    - attr3
methods: [ getter, setter ]
</pre>
<p>which parses to:</p>

<pre class="python">
{'attributes': ['attr1', 'attr2', 'attr3'],
 'methods': ['getter', 'setter'],
 'name': 'SomeObject'}
{'attributes': ['attr1', 'attr2', 'attr3'],
 'methods': ['getter', 'setter'],
 'name': 'MyPrettyObject'}
</pre>

<p>YAML also supports variables, or <b>repeated nodes</b>, which at first didn’t click for me. The simplest explanation is that you define something as a variable by preceding its value with <code>&amp;NAME</code>, and you can refer to it later with <code>*NAME</code>, e.g.:</p>
<pre>
 # YAML
some_thing: &#038;NAME foobar
other_thing: *NAME
</pre>
<p>Parses to:</p>

<pre class="python">
{'other_thing': 'foobar', 'some_thing': 'foobar'}
</pre>
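<p>One consequence worth knowing: when the repeated node is a collection, PyYAML hands back the same Python object for every reference rather than a copy. A short sketch, assuming Python 3 and a recent PyYAML; the key names and the <code>DEFAULTS</code> label are invented for the example.</p>

```python
import yaml

doc = """
defaults: &DEFAULTS [getter, setter]
first: *DEFAULTS
second: *DEFAULTS
"""

data = yaml.safe_load(doc)

# Every reference resolves to the one list that was built for the node.
shared = data["first"] is data["defaults"]
print(shared)

# So a change made through one key is visible through the others.
data["first"].append("deleter")
print(data["second"])
```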

<p>As you can see, the syntax is pretty simple. It’s easy to represent information in a way that is clear, concise and, well… fun. What’s really cool is the fact it meshes so well with Python!</p>
<p>Note that fans of JSON (JavaScript Object Notation) will quickly realize that the concise version of the syntax (e.g. using <code>[value, value]</code>) looks a lot like JSON. In fact, for the most part, JSON is a subset of YAML syntax. With a little bit of additional pre-processing you should be able to pass your JSON off as YAML and vice versa.</p>
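<p>For simple data, no pre-processing is needed at all. The sketch below assumes Python 3 and a recent PyYAML, and the sample record is made up. It serializes a structure with the standard <code>json</code> module and reads the result back with <code>yaml.safe_load()</code>. It is not a claim that every JSON document survives the trip; some edge cases may parse differently.</p>

```python
import json
import yaml

# Hypothetical sample record, limited to strings, ints, booleans, null,
# lists and dicts.
record = {
    "name": "Jesse",
    "languages": ["python", "yaml"],
    "employed": True,
    "manager": None,
    "limbs": {"arms": 2, "legs": 2},
}

as_json = json.dumps(record)

# The JSON text is handed to the YAML parser unchanged.
loaded = yaml.safe_load(as_json)
print(loaded == record)
```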
<h2>And with that, PyYAML</h2>
<p>After reading the basics of the syntax, you’re jazzed to get started with YAML, right? Well, getting started with YAML is only a single <code>easy_install</code> away. The <b>PyYAML</b> module is pretty much the de facto parser and emitter for YAML. The core of the module is written in pure Python, but, as of version 3.04, it also supports binding to the high-speed LibYAML implementation written in C.</p>
<p>PyYAML is blindingly simple to use for most cases. To generate all of the output I’ve used in the article so far, all I used was:</p>

<pre class="python">
import yaml
import pprint
for project in yaml.load_all(open('test.yaml')):
    pprint.pprint(project)
</pre>

<p>The <code>load_all()</code> function goes back to the “multiple documents within a stream” concept. In the case above I am assuming that there may be more than one document, so I am using <code>yaml.load_all()</code> rather than <code>load()</code>, then iterating over the results. <code>yaml.load_all()</code> returns a generator yielding each document in the stream. The <code>yaml.load()</code> function accepts a string (Unicode or otherwise), or an open file object.</p>
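<p>Here is the same idea as a self-contained sketch that reads from a string instead of <code>test.yaml</code>. It assumes Python 3 and a recent PyYAML and uses the <code>safe_</code> variants of the loaders, which current PyYAML releases recommend for plain data. It also shows what happens when a multi-document stream is handed to the single-document loader.</p>

```python
import yaml

stream = """
---
document: this is doc 1
---
document: this is doc 2
"""

# safe_load_all() is lazy: nothing is parsed until a document is requested.
documents = yaml.safe_load_all(stream)
first = next(documents)
rest = list(documents)
print(first)
print(rest)

# The single-document loader refuses a stream holding two documents.
try:
    yaml.safe_load(stream)
    single_load_failed = False
except yaml.YAMLError:
    single_load_failed = True
print(single_load_failed)
```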
<p>For many cases, you’ll be loading a single document. You might use it for configuration loading:</p>

<pre class="python">
configuration = yaml.load(open('test.yaml').read())
</pre>
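<p>A slightly more defensive version of that one-liner might look like the sketch below, which assumes Python 3 and a recent PyYAML. The settings (<code>server</code>, <code>host</code>, <code>port</code>, <code>debug</code>) are hypothetical, and the file is written to a temporary directory only so that the example runs on its own. The <code>with</code> block closes the file handle once parsing is done.</p>

```python
import os
import tempfile

import yaml

# Hypothetical configuration, written out so the example is self-contained.
config_text = "server:\n    host: localhost\n    port: 8080\ndebug: true\n"

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "test.yaml")
    with open(path, "w") as handle:
        handle.write(config_text)

    # safe_load() accepts the open file object directly.
    with open(path) as handle:
        configuration = yaml.safe_load(handle)

print(configuration["server"]["port"])
print(configuration["debug"])
```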

<p>Of course, one of the other aspects to PyYAML is dumping Python data structures to a YAML file. Take, for example, Listing 1:</p>

<pre class="python">
#!/usr/bin/python

import yaml

mydata = {'person' : 'jesse',
          'hobby' : 'python',
          'employed' : True,
          'limbs': {'arms' : 2, 'legs' : 2},
          'family' : ['wife', 'toddler']}

print yaml.dump(mydata)
</pre>

<p>In this case, I am constructing a dictionary containing all of the data I want to include in the YAML file. Then I simply call <code>yaml.dump()</code>, and the output of Listing 1 is well-formed YAML:</p>

<pre>
$ python Listing1.py
employed: true
family: [wife, toddler]
hobby: python
limbs: {arms: 2, legs: 2}
person: jesse
</pre>

<p>Additionally, PyYAML includes <code>yaml.dump_all()</code>. It accepts a list of objects to serialize and writes to the target stream. Let’s make Listing 1 handle a series of objects:</p>

<pre class="python">
mydata = [mydata for i in range(2)]
print yaml.dump_all(mydata, explicit_start=True)
</pre>

<p>And our output is fairly obvious:</p>
<pre>
---
employed: true
family: [wife, toddler]
hobby: python
limbs: {arms: 2, legs: 2}
person: jesse
---
employed: true
family: [wife, toddler]
hobby: python
limbs: {arms: 2, legs: 2}
person: jesse
</pre>
<p>By default, you don’t need to pass additional arguments to <code>yaml.dump()</code> or <code>yaml.dump_all()</code>, as you can see above. In the <code>dump_all()</code> example, I added the <code>explicit_start</code> argument. The dump functions support this flag, along with some others that you should know about, to control formatting.</p>
<p>The <code>explicit_start</code> argument adds the <code>---</code> string prior to the data structure being dumped. This allows you to dump multiple objects/documents to the same stream, say, an open file handle, without worrying about the document separators yourself.</p>
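<p>As a rough illustration of dumping to a stream, the sketch below (Python 3 and a recent PyYAML assumed, with two made-up records) writes both documents into an in-memory buffer standing in for an open file handle, then reads them back to check that nothing was lost.</p>

```python
import io

import yaml

# Hypothetical records; any list of plain Python structures would do.
people = [
    {"person": "jesse", "hobby": "python"},
    {"person": "sam", "hobby": "yaml"},
]

# StringIO stands in for an open file handle.
buffer = io.StringIO()
yaml.safe_dump_all(people, buffer, explicit_start=True)
text = buffer.getvalue()
print(text)

# Reading the stream back yields the original records, one per document.
round_trip = list(yaml.safe_load_all(text))
print(round_trip == people)
```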
<p>Adding the <code>default_flow_style</code> argument changes the output from the default compact style of output, to the more verbose, “humane” output:</p>

<pre class="python">
print yaml.dump(mydata, default_flow_style=False)
</pre>

<p>And the output:</p>

<pre>
employed: true
family:
- wife
- toddler
hobby: python
limbs:
  arms: 2
  legs: 2
person: jesse
</pre>

<p>You can also control indentation, line width, and so on, or switch to canonical mode, which explicitly states the type of every value within the YAML:</p>

<pre class="python">
print yaml.dump(mydata, canonical=True)
</pre>

<p>And the matching output:</p>
<pre>
!!map {
  ? !!str "employed"
  : !!bool "true",
  ? !!str "family"
  : !!seq [
    !!str "wife",
    !!str "toddler",
  ],
  ? !!str "hobby"
  : !!str "python",
  ? !!str "limbs"
  : !!map {
    ? !!str "arms"
    : !!int "2",
    ? !!str "legs"
    : !!int "2",
  },
  ? !!str "person"
  : !!str "jesse",
}
</pre>
<p>Yes, I just jumped the tracks on that last one. YAML and PyYAML both support explicit type declaration within the YAML documents. This is obviously handy for inter-language data exchange, but, as you can see in the output, is not so good on the side of readability if you’re a non-programmer. On the other hand, it allows for a nice segue!</p>
<h2>Turning the Awesome Up</h2>
<p>We have covered the basics of YAML and, by extension, PyYAML, but PyYAML offers some additional niceties for Python users. Obviously, these advanced features start to edge out approachability, but they are actually really useful.</p>
<p>In the last example of the last section, we turned on the <code>canonical</code> flag to the <code>dump</code> function, which caused it to spit out explicitly typed YAML. Each type was in the format of <code>!!&lt;type&gt;</code>. These are standard YAML tags, and they’re fully covered in the spec.</p>
<p>Internally, PyYAML converts these tags to the expected Python types. <code>!!null</code> is <code>None</code>, <code>!!timestamp</code> is <code>datetime.datetime</code>, <code>!!seq</code> is <code>list</code>, and so on. You don’t need to explicitly put these in your YAML documents. In most cases the types are inferred from the document, but being able to explicitly define them is handy.</p>
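<p>A quick way to see both the inference and the explicit tags at work is the sketch below, which assumes Python 3 and a recent PyYAML. The key names are invented for the example. Note that a date without a time part comes back as <code>datetime.date</code>.</p>

```python
import datetime

import yaml

doc = """
nothing: ~
released: 2008-12-01
updated: 2008-12-01 15:28:13
count: 3
as_text: !!str 3
as_float: !!float 3
items: !!seq [a, b]
"""

data = yaml.safe_load(doc)

# Print each value next to the Python type PyYAML chose for it.
for key, value in data.items():
    print(key, type(value).__name__, repr(value))
```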
<p>PyYAML can take the <code>!!</code> syntax a bit further though, and adds a series of Python-specific tags which are exceedingly useful. Each one of the Python-specific tags is prefaced with <code>!!python/&lt;tag&gt;</code>. PyYAML defines explicit Python types such as <code>float</code>, <code>complex</code>, <code>list</code>, <code>tuple</code> and <code>dict</code>. In my opinion, the <code>tuple</code> and integer ones are more useful, simply because dicts and lists can be derived from the YAML file itself.</p>
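<p>The tuple tag makes a good small test case. The sketch below assumes Python 3 and a recent PyYAML, and the <code>point</code> key is made up. It loads a tagged tuple with the full <code>yaml.Loader</code>, shows that the safe loader rejects the Python-specific tag, and compares what <code>dump()</code> and <code>safe_dump()</code> write for the same value.</p>

```python
import yaml

doc = "point: !!python/tuple [3, 4]"

# The full Loader understands the Python-specific tags.
data = yaml.load(doc, Loader=yaml.Loader)
print(data)

# The safe loader only knows the standard YAML tags.
try:
    yaml.safe_load(doc)
    safe_accepts = True
except yaml.YAMLError:
    safe_accepts = False
print(safe_accepts)

# dump() keeps the tuple type by writing the tag; safe_dump() writes a list.
dumped = yaml.dump({"point": (3, 4)})
safe_dumped = yaml.safe_dump({"point": (3, 4)})
print(dumped)
print(safe_dumped)
```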
<p>However, PyYAML also offers “non-type” <code>!!python</code> extensions. These are referred to as “Complex Python Tags” and they allow you to add things to your YAML document such as Python modules, packages, class instances, and the output of a method call with a passed-in variable.</p>
<p>Say we wanted to have a YAML file which defined some number of variables, but then passed one or more of them to a given module’s method. I wanted something to list the contents of my home directory on parsing:</p>
<pre>
 # YAML
directory: &#038;DIRECTORY /Users/jesse
contents: !!python/object/apply:os.listdir [*DIRECTORY]
</pre>
<p>And the abbreviated output:</p>

<pre class="python">
{'contents': ['.bash_history',
              '.bash_profile',
              'todo.txt'],
 'directory': '/Users/jesse'}
</pre>
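<p>A document like this runs code while it is being parsed, so it should only ever come from a source you trust. As a hedged illustration, assuming Python 3 and a recent PyYAML, the sketch below feeds the same document to <code>yaml.safe_load()</code>, which only accepts the standard tags and therefore refuses it without ever calling <code>os.listdir</code>.</p>

```python
import yaml

doc = """
directory: &DIRECTORY /Users/jesse
contents: !!python/object/apply:os.listdir [*DIRECTORY]
"""

# safe_load() never calls os.listdir; it stops at the unknown tag.
try:
    yaml.safe_load(doc)
    refused = False
    reason = ""
except yaml.YAMLError as error:
    refused = True
    reason = str(error)

print(refused)
print(reason)
```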

<p>Virtually any function can be called this way. You can also pass in keyword arguments and other data as required. Calling a function is the easy case, though. Here’s an example YAML file which uses the PyYAML <code>new:module.class</code> tag to create a <code>Queue.Queue</code> at load time with a defined max size:</p>
<pre>
qsize: &#038;SIZE 10
queue: !!python/object/new:Queue.Queue {maxsize: *SIZE}
</pre>
<p>Which, of course, passes you back the correct class instance:</p>

<pre class="python">
{'qsize': 10, 'queue': &lt;Queue.Queue instance at 0x292fa8&gt;}
</pre>

<p>In theory, and in my rather abusive practice, this would allow you to define a very rich configuration which constructed all of the relevant objects at parse-time to significantly alter the behavior of the application (or in my case, test) to which the YAML file was passed. One catch when you are using the <code>!!python/object/*</code> tags is that the objects you are creating must be pickle-compatible.</p>
<p>For example, if you tried this:</p>
<pre>
 # YAML
threadpool:
 - !!python/object/new:threading.Thread
  target: myapp.myfunction
</pre>
<p>It would fail with an assertion error:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p581code17'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p58117"><td class="line_numbers"><pre>1
</pre></td><td class="code" id="p581code17"><pre class="python" style="font-family:monospace;"><span style="color: #008000;">AssertionError</span>: Thread.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> was <span style="color: #ff7700;font-weight:bold;">not</span> called</pre></td></tr></table></div>

<p>PyYAML is not calling <code>__init__()</code> when creating the object. Both <code>yaml.load()</code> and <code>yaml.dump()</code> are designed to work exactly like <code>pickle.load()</code> and <code>pickle.dump()</code>, so the objects you construct must implement the pickle protocol.</p>
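<p>To see why skipping <code>__init__()</code> matters, here is a minimal sketch of the pickle-style rebuild using only the standard library. The <code>Worker</code> class is a made-up stand-in, not something from PyYAML; the point is only that the object is allocated and has its state restored without its initializer ever running, which is exactly what trips up <code>threading.Thread</code>.</p>

```python
class Worker(object):
    """Toy class that counts how often __init__ runs."""
    init_calls = 0

    def __init__(self, name):
        Worker.init_calls += 1
        self.name = name


original = Worker("build-queue")      # __init__ runs once, here

# Pickle protocol 2 describes an object as a callable, its arguments,
# and a state dictionary to restore afterwards.
rebuild, args, state = original.__reduce_ex__(2)[:3]

clone = rebuild(*args)                # effectively Worker.__new__(Worker)
clone.__dict__.update(state)          # state restored, __init__ skipped

print(clone.name)
print(Worker.init_calls)
```

<p>The clone carries the same <code>name</code>, yet the call counter still reads one. A class that depends on work done inside <code>__init__()</code> will be left half-built by this kind of reconstruction.</p>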
<h2>Conclusion</h2>
<p>YAML and, by extension, PyYAML, are incredibly useful if you want something easy on the eyes, easy to understand, and easy to use in a markup language. It’s straightforward to customize, it’s cross-language, and it’s fundamentally simple. YAML is popping up in all sorts of places, such as the configuration settings for Google’s App Engine, and in Django, where it is used as a serialization format and to load data fixtures.</p>
<p>Obviously some of the advanced features of PyYAML are Python-specific, but the fundamentals make it an easy win for cross-language communication. Sure, XML does this, too, and there’s support in every known language for XML parsing (including the stuff toddlers speak), but how readable is XML, seriously?</p>
<p>I do hope more and more people adopt this user-friendly format. It’s simply great as a configuration language, and if you need to expose anything to humans and later serialize and deserialize it, just say “no” to XML.</p>
<p>The revolution will be readable.</p>
<p>Requirements:</p>
<ul>
<li>Python 2.5 or higher
<li>PyYAML — <a href="http://pypi.python.org/pypi/PyYAML/" target="_blank">http://pypi.python.org/pypi/PyYAML/</a>
</ul>
<p>Related Links</p>
<ul>
<li>YAML — <a href="http://www.yaml.org" target="_blank">http://www.yaml.org</a>
<li>Compact YAML Cheat-Sheet — <a href="http://yaml.org/refcard.html" target="_blank">http://yaml.org/refcard.html</a>
<li>PyYAML Wiki — <a href="http://pyyaml.org/wiki/PyYAML" target="_blank">http://pyyaml.org/wiki/PyYAML</a>
<li>PyYAML Documentation — <a href="http://pyyaml.org/wiki/PyYAMLDocumentation" target="_blank">http://pyyaml.org/wiki/PyYAMLDocumentation</a>
<li>Pickle Protocol — <a href="http://www.python.org/dev/peps/pep-0307/" target="_blank">http://www.python.org/dev/peps/pep-0307/</a>
</ul>
<p class="wp-flattr-button"></p>]]></content:encoded>
			<wfw:commentRss>http://jessenoller.com/2009/04/13/yaml-aint-markup-language-completely-different/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>SSH Programming with Paramiko &#124; Completely Different</title>
		<link>http://jessenoller.com/2009/02/05/ssh-programming-with-paramiko-completely-different/</link>
		<comments>http://jessenoller.com/2009/02/05/ssh-programming-with-paramiko-completely-different/#comments</comments>
		<pubDate>Thu, 05 Feb 2009 15:18:36 +0000</pubDate>
		<dc:creator>jesse</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[python magazine]]></category>

		<guid isPermaLink="false">http://jessenoller.com/?p=465</guid>
		<description><![CDATA[OpenSSH is the ubiquitous method of remote access for secure remote-machine login and file transfers. Many people -- systems administrators, test automation engineers, web developers and others have to use and interact with it daily. Scripting SSH access and file transfers with Python can be frustrating -- but the Paramiko module solves that in ...]]></description>
			<content:encoded><![CDATA[<blockquote><p>OpenSSH is the ubiquitous method of remote access for secure remote-machine login and file transfers. Many people (systems administrators, test automation engineers, web developers and others) have to use and interact with it daily. Scripting SSH access and file transfers with Python can be frustrating, but the Paramiko module solves that in a powerful way.</p></blockquote>
<p><em>This is a reprint of an article I wrote for <a href="http://www.pythonmagazine.com/" target="_blank">Python Magazine</a> as a Completely Different column that was published in the October 2008 issue. <b>I have republished this in its original form, bugs and all</b></em></p>
<p><span id="more-465"></span></p>
<p>SSH is everywhere.  OS X, Linux, Solaris, and even Windows offer OpenSSH servers for remote access and file transfers. It long ago displaced other methods of remote access like telnet and rlogin. While those other systems may still exist, their widespread usage has faded with the rapid adoption of the OpenSSH suite of tools.</p>
<p>OpenSSH itself is actually a suite of tools based on the ssh2 protocol. The suite provides secure remote login tools (ssh), secure file transfer (scp and sftp), and key management tools.</p>
<p>On most operating systems the client-side tools (ssh, scp, sftp) are already installed for users to leverage. Users can also easily install and configure the server-side utilities on systems they want to remotely access.</p>
<p>Many, many people use OpenSSH daily, and many of them spend a lot of time trying to script its usage. Most of these tools and scripts try to wrap the command line executables (ssh, scp, etc) directly. They use things like Pexpect to provide passwords, and try to rationalize and parse the output of the binaries directly.</p>
<p>Having spent a lot of time scripting around the binaries and trying to manage timeouts, standard out/in/error pipes, authentication, arguments and options all through <code>subprocess</code>, <code>popen2</code>, etc., I’m here to tell you wrapping command line binaries is prone to error, difficult to test, and painful to maintain.</p>
<p>When you’re in the business of parsing output from command line utilities, watching for exit codes and juggling timeouts, you’re not on a good path. That’s where something like Paramiko comes in.</p>
<p>I discovered Paramiko some time ago. It builds on PyCrypto to provide a Python interface to the SSH2 protocol. The module provides all of the facilities you could ask for, including ssh-key authentication, ssh shell access, and sftp.</p>
<p>Since discovering Paramiko, my entire paradigm and usage of SSH has changed.  Instead of the frustrating experience of shelling-out and hacking around the various kinks with that, I can programmatically access all of the protocols and tools I need in a clean, Pythonic way.</p>
<h3>About Paramiko</h3>
<p>Paramiko is a pure-Python module and can be installed with easy_install like any other Python module. However, PyCrypto is written largely in C, so you may need a compiler to install both, depending on your platform.</p>
<p>Paramiko itself has extensive API documentation and an active mailing list. As an added bonus, there’s a Java port of it as well (don’t get me started on controlling SSH within Java) if you need something to achieve the same thing in Java.</p>
<p>Paramiko also offers an implementation of the SSH and SFTP server protocols. It really is feature-rich and complete. I’ve used it in heavily threaded applications as well as in day-to-day maintenance scripts. There’s even an installation and deployment system, named Fabric, that further builds on Paramiko to provide application deployment utilities via SSH. </p>
<h3>Getting started</h3>
<p>The primary class of the Paramiko API is <code>paramiko.SSHClient</code>. It provides the basic interface you are going to want to use to instantiate server connections and file transfers.</p>
<p>Here’s a simple example:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p465code18'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46518"><td class="line_numbers"><pre>1
2
3
4
</pre></td><td class="code" id="p465code18"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> paramiko
ssh = paramiko.<span style="color: black;">SSHClient</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
ssh.<span style="color: black;">connect</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'127.0.0.1'</span>, username=<span style="color: #483d8b;">'jesse'</span>, 
    password=<span style="color: #483d8b;">'lol'</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>This creates a new SSHClient object, and then calls <code>connect()</code> to connect us to the local SSH server. It can’t get much easier than that!</p>
<h3>Host Keys</h3>
<p>One of the complicating aspects of SSH authentication is <em>host keys</em>. Whenever you make an ssh connection to a remote machine, that host’s key is stored automatically in a file in your home directory called <code>.ssh/known_hosts</code>. If you’ve ever connected to a new host via SSH and seen a message like this:</p>
<pre>
The authenticity of host 'localhost (::1)' can't be
established.
RSA key fingerprint is
22:fb:16:3c:24:7f:60:99:4f:f4:57:d6:d1:09:9e:28.
Are you sure you want to continue connecting
(yes/no)?
</pre>
<p>and typed “yes”, you’ve added an entry to the <code>known_hosts</code> file. These keys are important because accepting them implies a level of trust of the host. If the key ever changes or is compromised in some way, your client will warn you and refuse to connect.</p>
<p>Paramiko enforces this same rule. You must accept and authorize the use and storage of these keys on a per-host basis. Luckily, rather than having to be prompted for each one, or manage each one individually, you can set a magic policy.</p>
<p>The default behavior with an SSHClient object is to refuse to connect to a host that does not have a key stored in your local <code>known_hosts</code> file (<code>paramiko.RejectPolicy</code>). This can become annoying when working in a lab environment where machines come and go and have the operating system reinstalled constantly.</p>
<p>Setting the host key policy takes one method call on the ssh client object, <code>set_missing_host_key_policy()</code>, which sets the way you want to manage inbound host keys. If you’re lazy like me, you pass in <code>paramiko.AutoAddPolicy()</code>, which will auto-accept unknown keys.</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p465code19'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46519"><td class="line_numbers"><pre>1
2
3
4
5
6
</pre></td><td class="code" id="p465code19"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> paramiko
ssh = paramiko.<span style="color: black;">SSHClient</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
ssh.<span style="color: black;">set_missing_host_key_policy</span><span style="color: black;">&#40;</span>
    paramiko.<span style="color: black;">AutoAddPolicy</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
ssh.<span style="color: black;">connect</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'127.0.0.1'</span>, username=<span style="color: #483d8b;">'jesse'</span>, 
    password=<span style="color: #483d8b;">'lol'</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Of course, don’t do this if you’re working with machines you don’t know or trust!  Tools built on Paramiko should make this overly liberal policy a configuration option.</p>
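<p>One way to make the policy a configuration option is sketched below. The setting names (<code>reject</code>, <code>warn</code>, <code>auto-add</code>) and both helper functions are my own invention for illustration; only the three policy classes and the <code>SSHClient</code> methods come from Paramiko. The import sits inside <code>make_client()</code> so the name mapping can be exercised on a machine without Paramiko installed.</p>

```python
# Hypothetical setting names; the class names on the right are Paramiko's.
POLICY_CLASS_NAMES = {
    "reject": "RejectPolicy",      # Paramiko's default: refuse unknown hosts
    "warn": "WarningPolicy",       # log a warning, then connect anyway
    "auto-add": "AutoAddPolicy",   # trust and remember unknown hosts
}


def policy_class_name(setting):
    """Map a configuration value to the name of a Paramiko policy class."""
    try:
        return POLICY_CLASS_NAMES[setting.strip().lower()]
    except KeyError:
        raise ValueError("unknown host key policy: %r" % (setting,))


def make_client(setting="reject"):
    """Build an SSHClient that honours the configured host key policy."""
    import paramiko  # imported lazily; the mapping above works without it

    client = paramiko.SSHClient()
    client.load_system_host_keys()   # read the user's known_hosts file
    policy = getattr(paramiko, policy_class_name(setting))
    client.set_missing_host_key_policy(policy())
    return client
```

<p>A lab script might call <code>make_client("auto-add")</code>, while anything pointed at production machines would stick with the strict default.</p>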
<h3>Running Simple Commands</h3>
<p>So, now that we’re connected, we should try running a command and getting some output.</p>
<p>SSH uses the same type of input, output, and error handles you should be familiar with from other Unix-like applications. Errors are sent to standard error, output goes to standard out, and if you want to send data back to the application, you write it to standard in.</p>
<p>So, the response data from client commands are going to come back in a tuple — (stdin, stdout, stderr) — which are file-like objects you can read from (or write to, in the case of stdin). For example:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p465code20'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46520"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
</pre></td><td class="code" id="p465code20"><pre class="python" style="font-family:monospace;">...
<span style="color: #66cc66;">&gt;&gt;&gt;</span> ssh.<span style="color: black;">connect</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'127.0.0.1'</span>, username=<span style="color: #483d8b;">'jesse'</span>, 
...    <span style="color: black;">password</span>=<span style="color: #483d8b;">'lol'</span><span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> stdin, stdout, stderr = \
...    <span style="color: black;">ssh</span>.<span style="color: black;">exec_command</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;uptime&quot;</span><span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #008000;">type</span><span style="color: black;">&#40;</span>stdin<span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&lt;</span>class <span style="color: #483d8b;">'paramiko.ChannelFile'</span><span style="color: #66cc66;">&gt;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> stdout.<span style="color: black;">readlines</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #483d8b;">'13:35  up 11 days,  3:13, 4 users, load averages: 0.14 0.18 0.16<span style="color: #000099; font-weight: bold;">\n</span>'</span><span style="color: black;">&#93;</span></pre></td></tr></table></div>

<p>Under the covers, Paramiko has opened a new <code>paramiko.Channel</code> object which represents the secure tunnel to the remote host. The Channel object acts like a normal Python socket object. When we call <code>exec_command()</code>, the Channel to the host is opened, and we are handed back <code>paramiko.ChannelFile</code> “file-like” objects which represent the data sent to and from the remote host.</p>
<p>One of the documented nits with the ChannelFile objects Paramiko passes back to you is that you need to constantly <code>read()</code> off of the stderr and stdout handles given back to you. If the remote host sends back enough data to fill the buffer, the host will hang waiting for your program to read more. A way around this is to call either <code>readlines()</code>, as we did above, or <code>read()</code>. If you need to internally buffer the data, you can also iterate over the object with <code>readline()</code>.</p>
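<p>The <code>readline()</code> approach can be wrapped in a small generator. This is a sketch rather than anything Paramiko ships: <code>iter_lines()</code> is a hypothetical helper, and an in-memory <code>io.StringIO</code> stands in for the stdout handle so the snippet runs without a server. It assumes nothing about the handle beyond a working <code>readline()</code>.</p>

```python
import io


def iter_lines(stream):
    """Yield decoded lines from a file-like object, one readline() at a time."""
    while True:
        line = stream.readline()
        if not line:                      # empty read: nothing more to come
            break
        if isinstance(line, bytes):       # tolerate byte streams as well
            line = line.decode("utf-8", "replace")
        yield line.rstrip("\r\n")


# Stand-in for the stdout handle returned by exec_command().
fake_stdout = io.StringIO(u"first line\nsecond line\n")
for line in iter_lines(fake_stdout):
    print(line)
```

<p>With a live connection you would pass the real <code>stdout</code> object instead of the stand-in and process each line as it arrives.</p>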
<p>This is the simplest form of connecting and running a command to get the output back. For many sysadmin tasks, this will be invaluable as you need to parse the output of a returned command to find exactly what you need.  With Python’s rich string manipulation, this is an easy task. Let’s run something with a lot of output, that also requires a password:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p465code21'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46521"><td class="line_numbers"><pre>1
2
3
4
</pre></td><td class="code" id="p465code21"><pre class="python" style="font-family:monospace;">ssh.<span style="color: black;">connect</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'127.0.0.1'</span>, username=<span style="color: #483d8b;">'jesse'</span>, 
   password=<span style="color: #483d8b;">'lol'</span><span style="color: black;">&#41;</span>
stdin, stdout, stderr = ssh.<span style="color: black;">exec_command</span><span style="color: black;">&#40;</span>
   <span style="color: #483d8b;">&quot;sudo dmesg&quot;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Uh oh. I just called the sudo command. It is going to require me to provide a password interactively with the remote host. No worries:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p465code22'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46522"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
</pre></td><td class="code" id="p465code22"><pre class="python" style="font-family:monospace;">ssh.<span style="color: black;">connect</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'127.0.0.1'</span>, username=<span style="color: #483d8b;">'jesse'</span>, 
    password=<span style="color: #483d8b;">'lol'</span><span style="color: black;">&#41;</span>
stdin, stdout, stderr = ssh.<span style="color: black;">exec_command</span><span style="color: black;">&#40;</span>
    <span style="color: #483d8b;">&quot;sudo dmesg&quot;</span><span style="color: black;">&#41;</span>
stdin.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'lol<span style="color: #000099; font-weight: bold;">\n</span>'</span><span style="color: black;">&#41;</span>
stdin.<span style="color: black;">flush</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
data = stdout.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">splitlines</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> line <span style="color: #ff7700;font-weight:bold;">in</span> data:
    <span style="color: #ff7700;font-weight:bold;">if</span> line.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">':'</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> == <span style="color: #483d8b;">'AirPort'</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> line</pre></td></tr></table></div>

<p>There! I logged in remotely and found all messages for my AirPort card. The key thing to note here is that I wrote my password to the stdin “file” so that sudo allowed me in.</p>
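<p>The same steps can be folded into a reusable function. The sketch below makes two assumptions of its own: it adds sudo’s <code>-S</code> flag so the password is read from standard input, and <code>-p ''</code> to silence the prompt. <code>run_with_sudo()</code> and <code>FakeClient</code> are hypothetical names; the fake exists only so the example runs without a real host.</p>

```python
import io


def run_with_sudo(client, command, password):
    """Run command under sudo, feeding the password through stdin."""
    # -S tells sudo to read the password from stdin; -p '' hides the prompt.
    stdin, stdout, stderr = client.exec_command("sudo -S -p '' " + command)
    stdin.write(password + "\n")
    stdin.flush()
    output = stdout.read()
    if isinstance(output, bytes):
        output = output.decode("utf-8", "replace")
    return output.splitlines()


class FakeClient(object):
    """Stand-in for paramiko.SSHClient so the sketch runs without a server."""

    def __init__(self, canned_output):
        self.canned_output = canned_output
        self.stdin = io.StringIO()
        self.last_command = None

    def exec_command(self, command):
        self.last_command = command
        return self.stdin, io.BytesIO(self.canned_output), io.BytesIO(b"")


client = FakeClient(b"AirPort: Link Up\nUSB: device attached\n")
lines = run_with_sudo(client, "dmesg", "lol")
airport = [line for line in lines if line.split(":")[0] == "AirPort"]
print(airport)
```

<p>Against a real machine you would hand in a connected <code>SSHClient</code>, and ideally fetch the password from somewhere safer than the script itself.</p>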
<p>If you’re wondering, yes, this provides an easy base to create your own interactive shell. You might want to do something like this to make a little custom admin shell using the Python cmd module to administer machines inside of your lab.</p>
<p>Using Paramiko, this is easy. In Listing 1, I outline a basic way to approach this — we wrap the Paramiko manipulation up in the RunCommand methods, allowing the user to add as many hosts as they want, call connect and then run a command.</p>
<p>Listing 1:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p465code23'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46523"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
</pre></td><td class="code" id="p465code23"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/python</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> paramiko
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">cmd</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> RunCommand<span style="color: black;">&#40;</span><span style="color: #dc143c;">cmd</span>.<span style="color: black;">Cmd</span><span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot; Simple shell to run a command on the host &quot;&quot;&quot;</span>
&nbsp;
    prompt = <span style="color: #483d8b;">'ssh &gt; '</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #dc143c;">cmd</span>.<span style="color: black;">Cmd</span>.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">hosts</span> = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">connections</span> = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> do_add_host<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, args<span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">&quot;&quot;&quot;add_host &lt;host,user,password&gt;
        Add the host to the host list&quot;&quot;&quot;</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> args:
            <span style="color: #008000;">self</span>.<span style="color: black;">hosts</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>args.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">','</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">else</span>:
            <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;usage: add_host &lt;hostip,user,password&gt;&quot;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> do_connect<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, args<span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">&quot;&quot;&quot;Connect to all hosts in the hosts list&quot;&quot;&quot;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> host <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">hosts</span>:
            client = paramiko.<span style="color: black;">SSHClient</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            client.<span style="color: black;">set_missing_host_key_policy</span><span style="color: black;">&#40;</span>
                paramiko.<span style="color: black;">AutoAddPolicy</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
            client.<span style="color: black;">connect</span><span style="color: black;">&#40;</span>host<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, 
                username=host<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>, 
                password=host<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
            <span style="color: #008000;">self</span>.<span style="color: black;">connections</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>client<span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> do_run<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, command<span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">&quot;&quot;&quot;run &lt;command&gt;
        Execute this command on all hosts in the list&quot;&quot;&quot;</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> command:
            <span style="color: #ff7700;font-weight:bold;">for</span> host, conn <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">zip</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">hosts</span>, <span style="color: #008000;">self</span>.<span style="color: black;">connections</span><span style="color: black;">&#41;</span>:
                stdin, stdout, stderr = conn.<span style="color: black;">exec_command</span><span style="color: black;">&#40;</span>command<span style="color: black;">&#41;</span>
                stdin.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">for</span> line <span style="color: #ff7700;font-weight:bold;">in</span> stdout.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">splitlines</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
                    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'host: %s: %s'</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>host<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, line<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">else</span>:
            <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;usage: run &lt;command&gt;&quot;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> do_close<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, args<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">for</span> conn <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">connections</span>:
            conn.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
    RunCommand<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">cmdloop</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Example output:</p>
<pre>
ssh > add_host 127.0.0.1,jesse,lol
ssh > connect
ssh > run uptime
host: 127.0.0.1: 14:49  up 11 days,  4:27, 8 users,
load averages: 0.36 0.25 0.19
ssh > close
</pre>
<p>This is just designed to be a proof-of-concept of a pseudo-interactive shell. There are a few improvements you could make should you use it:</p>
<ul>
<li>Better printing for multi-line stdout output</li>
<li>Handle standard error</li>
<li>Add in a quit method</li>
<li>Thread the command execution/data returned</li>
</ul>
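<p>The quit method is the quickest of these to add. Here is a minimal sketch on a separate, hypothetical class (<code>QuittableShell</code>) so it can be tried on its own; the same three methods could be dropped into RunCommand. It relies on the documented behaviour of the <code>cmd</code> module: a true return value from a <code>do_*</code> method ends <code>cmdloop()</code>.</p>

```python
import cmd


class QuittableShell(cmd.Cmd):
    """Hypothetical shell showing quit handling for a cmd-based tool."""

    prompt = 'ssh > '

    def do_quit(self, args):
        """quit
        Leave the shell"""
        return True        # a true return value stops cmdloop()

    do_EOF = do_quit       # Ctrl-D behaves like quit

    def emptyline(self):
        """Ignore empty input instead of repeating the last command."""
        pass


if __name__ == '__main__':
    QuittableShell().cmdloop()
```

<p>Overriding <code>emptyline()</code> is optional, but in a tool that runs commands on many hosts, accidentally repeating the last command by pressing Enter is rarely what you want.</p>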
<p>Like all shells, the sky is the limit when it comes to data visualization. Tools like pssh, OSH, Fabric, etc., all manage the return data differently, and they all have different ways of aggregating the output from different hosts.</p>
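<p>As one possible take on threading the execution and collecting both output streams, here is a sketch built on <code>concurrent.futures</code>, which assumes Python 3. The helper names are hypothetical, and the connections only need an <code>exec_command()</code> method like the one on <code>SSHClient</code>. It reads stdout completely before stderr, which is fine for short output but could stall on a command that writes a great deal to stderr.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def _decode(data):
    """Return text whether the handle produced bytes or str."""
    return data.decode("utf-8", "replace") if isinstance(data, bytes) else data


def run_one(conn, command):
    """Run command on one connection; return (stdout_lines, stderr_lines)."""
    stdin, stdout, stderr = conn.exec_command(command)
    stdin.close()
    out = _decode(stdout.read()).splitlines()
    err = _decode(stderr.read()).splitlines()
    return out, err


def run_everywhere(connections, command, max_workers=8):
    """Run command on every host in parallel.

    connections maps a host name to a connected client object.
    Returns a dict of host -> (stdout_lines, stderr_lines).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = dict(
            (host, pool.submit(run_one, conn, command))
            for host, conn in connections.items()
        )
        return dict((host, f.result()) for host, f in futures.items())
```

<p>Keeping the results in a dictionary keyed by host leaves the presentation question open: you can print per host, merge identical output, or only show hosts that wrote to standard error.</p>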
<h3>File put and get</h3>
<p>File manipulation within Paramiko is handled via the SFTP implementation, and, like the ssh client command execution, it’s easy as pie.</p>
<p>We start by instantiating a new paramiko.SSHClient just as before:</p>

<pre>
import paramiko

ssh = paramiko.SSHClient()
# Automatically accept host keys that are not already known.
ssh.set_missing_host_key_policy(
    paramiko.AutoAddPolicy())
ssh.connect('127.0.0.1', username='jesse',
    password='lol')
</pre>

<p>This time, we make a call into ”open_sftp()” after we connect to the host.  ”open_sftp()” returns a ”paramiko.SFTPClient” object that supports all of the normal sftp operations (stat, put, get, etc.). In this example, we perform a “get” operation to download the file ”remotefile.py” from the remote system and write it to the local file ”localfile.py”.</p>
<p><code><br />
ftp = ssh.open_sftp()<br />
ftp.get('remotefile.py', 'localfile.py')<br />
ftp.close()<br />
</code></p>
<p>Writing a file to the remote host (a “put” operation) works the exact same way.  We just call ”put()” instead of ”get()” and transpose the local and remote arguments:</p>

<pre>
ftp = ssh.open_sftp()
# put() takes the local path first, then the remote path.
ftp.put('localfile.py', 'remotefile.py')
ftp.close()
</pre>

<p>The nice thing about the sftp client implementation that Paramiko provides is that it supports things like stat, chmod, chown, etc. Obviously these might act differently depending on the remote server because some servers do not implement all of the protocol, but even so they’re incredibly useful.</p>
<p>You could easily write functions like ”glob.glob()” to traverse a remote directory tree looking for a particular filename pattern.  You could also search based on permissions, size, etc.</p>
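<p>To make the remote glob idea concrete, here is a minimal sketch rather than a definitive implementation. The name remote_glob is hypothetical, and the code assumes an object that behaves like paramiko.SFTPClient: its listdir_attr() call must return entries carrying filename and st_mode attributes. It walks a remote tree and yields every path whose file name matches a shell-style pattern:</p>

```python
import fnmatch
import posixpath
import stat


def remote_glob(sftp, top, pattern):
    """Yield remote paths under `top` whose file name matches `pattern`.

    `sftp` is assumed to behave like paramiko.SFTPClient; remote_glob
    itself is a hypothetical helper, not part of Paramiko.
    """
    for entry in sftp.listdir_attr(top):
        path = posixpath.join(top, entry.filename)
        # Some servers omit the mode; treat a missing mode as "not a directory".
        mode = entry.st_mode or 0
        if stat.S_ISDIR(mode):
            # Recurse into subdirectories.
            for found in remote_glob(sftp, path, pattern):
                yield found
        elif fnmatch.fnmatchcase(entry.filename, pattern):
            yield path
```

<p>With a live connection, a call such as remote_glob(ssh.open_sftp(), '/home/jesse', '*.py') would list the matching files (the directory here is only an example). The same entries also carry st_size and the permission bits in st_mode, so filtering on size or permissions is a small change to the final test.</p>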
<p>One thing to note, however, and this bit me a few times: sftp as a protocol is slightly more restrictive than something like normal secure copy (scp). SCP allows you to use Unix wild cards in the file name when grabbing a file from the remote machine. SFTP, on the other hand, expects the full explicit path to the file you want to download. An example of this is:</p>

<pre>
ftp.get('*.py', '.')
</pre>

<p>In most cases, this would mean “download all files ending in .py to the local directory on my machine.” SFTP is unhappy with this formulation, though (see Listing 2). I learned this the hard way, after I spent several hours pulling apart the sftp client implementation out of frustration.</p>
<p>Listing 2:</p>

<pre>
&gt;&gt;&gt; ftp.get("./*.py", '.')
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
  File "/Library/Python/2.5/site-packages/paramiko/sftp_client.py",
    line 567, in get
    fr = self.file(remotepath, 'rb')
  File "/Library/Python/2.5/site-packages/paramiko/sftp_client.py",
    line 238, in open
    t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
  File "/Library/Python/2.5/site-packages/paramiko/sftp_client.py",
    line 589, in _request
    return self._read_response(num)
  File "/Library/Python/2.5/site-packages/paramiko/sftp_client.py",
    line 636, in _read_response
    self._convert_status(msg)
  File "/Library/Python/2.5/site-packages/paramiko/sftp_client.py",
    line 662, in _convert_status
    raise IOError(errno.ENOENT, text)
IOError: [Errno 2] No such file
</pre>
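<p>One way around the wildcard limitation is to expand the pattern on the client side: list the remote directory, filter the names locally, and issue one explicit get per file. The sketch below is an assumption-laden illustration, not part of Paramiko: get_matching is a hypothetical helper, it expects an object offering listdir() and get() like paramiko.SFTPClient, and it assumes every matching name is a regular file.</p>

```python
import fnmatch
import os
import posixpath


def get_matching(sftp, remote_dir, pattern, local_dir):
    """Download every file in remote_dir whose name matches pattern.

    The wildcard is expanded locally because SFTP will not do it for us.
    Returns the list of local paths that were written.
    """
    fetched = []
    for name in sorted(sftp.listdir(remote_dir)):
        if not fnmatch.fnmatchcase(name, pattern):
            continue
        local_path = os.path.join(local_dir, name)
        # One full, explicit remote path per get() call.
        sftp.get(posixpath.join(remote_dir, name), local_path)
        fetched.append(local_path)
    return fetched
```

<p>Called as get_matching(ftp, '.', '*.py', '.'), this would do what the failing one-liner in Listing 2 was meant to do.</p>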

<h3>In Closing</h3>
<p>I hope I’ve shown you enough to really dig into Paramiko.  It’s one of the gems from the Python community that helps me on a daily basis. I can do remote administration programmatically, write test plugins that perform remote operations easily, and a lot more, all without needing to install extra daemons on the remote machines.</p>
<p>SSH is everywhere, and sooner or later you’re going to need to write a program that interacts with it. Why not save yourself the trouble now and give Paramiko a look?</p>
<h3>Related Links</h3>
<ul>
<li>OpenSSH — <a href="http://www.openssh.com/" target="_blank">http://www.openssh.com/</a></li>
<li>Pexpect — <a href="http://www.noah.org/wiki/Pexpect" target="_blank">http://www.noah.org/wiki/Pexpect</a></li>
<li>Fabric — <a href="http://www.nongnu.org/fab/" target="_blank">http://www.nongnu.org/fab/</a></li>
<li>Paramiko Docs — <a href="http://www.lag.net/paramiko/docs/" target="_blank">http://www.lag.net/paramiko/docs/</a></li>
<li>Paramiko Mailing List — <a href="http://www.lag.net/mailman/listinfo/paramiko" target="_blank">http://www.lag.net/mailman/listinfo/paramiko</a></li>
<li>OSH — <a href="http://geophile.com/osh/" target="_blank">http://geophile.com/osh/</a></li>
<li>PSSH — <a href="http://www.theether.org/pssh/" target="_blank">http://www.theether.org/pssh/</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://jessenoller.com/2009/02/05/ssh-programming-with-paramiko-completely-different/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>A (brief) introduction to Python-Core development &#124; Completely Different</title>
		<link>http://jessenoller.com/2009/02/04/a-brief-introduction-to-python-core-development-completely-different/</link>
		<comments>http://jessenoller.com/2009/02/04/a-brief-introduction-to-python-core-development-completely-different/#comments</comments>
		<pubDate>Wed, 04 Feb 2009 14:40:41 +0000</pubDate>
		<dc:creator>jesse</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[python magazine]]></category>

		<guid isPermaLink="false">http://jessenoller.com/?p=463</guid>
		<description><![CDATA[This is a reprint of an article I wrote for Python Magazine as a Completely Different column that was published in the August 2008 issue. 


In the early summer of this year I had the chance to really get started working on/with the core Python source. I had spent some time putting together a ...]]></description>
			<content:encoded><![CDATA[<p><em>This is a reprint of an article I wrote for <a href="http://www.pythonmagazine.com/" target="_blank">Python Magazine</a> as a Completely Different column that was published in the August 2008 issue. </em></p>
<blockquote><p>
In the early summer of this year I had the chance to really get started working on/with the core Python source. I had spent some time putting together a Python Enhancement Proposal (PEP) which was accepted. Now, I just needed to learn the code base and practices, and buy a helmet. Shortly after getting the initial patch accepted, I ended up breaking the build and the tests, and causing the beta to slip. This article is an introduction to core development, in which we’ll cover what you need to get started, and where I personally screwed up.</p></blockquote>
<p><span id="more-463"></span></p>
<h3>Introduction</h3>
<p>Core Python development (or, “hacking on python-core” as it may be called) is, like all great open-source projects, a highly distributed, highly active, and high-participation project. There are developers all over the world filing bugs, submitting patches for code and documentation, as well as participating on the python-dev mailing list and IRC channel.</p>
<p>Like all other good open source communities, it’s a meritocracy of the technical persuasion. A good idea is simply that: a good idea.  If a good idea is the best of breed, it will be adopted or adapted to the language and project. If an idea or a patch is clear, concise, and solves a problem, there is generally no difficulty in getting traction or getting a patch put into the core code base.</p>
<h3>Let’s start from the beginning</h3>
<p>While Python is a meritocracy where any person can submit a patch, file a bug, or send emails to python-dev (sometimes, that last is more of a curse than a blessing), there is a particular group of people that has commit privileges. This group is responsible for judging all patches, proposed bugs and associated fixes, and ultimately committing the actual code to the tree.</p>
<p>Python’s code, documentation, PEPs, and other artifacts are all hosted within a Subversion (svn) repository. While the core is in svn, you can also access it via other popular version control tools.  There are Bazaar, Git, and Mercurial mirrors of the svn repository.  All of the examples in this article will revolve around Subversion, though, because the other trees are still experimental.</p>
<p>In order to view the repository, you need to check out a read-only version of the source tree. Write access is only available via svn+ssh authenticated access, but you can use HTTP for a read-only copy.  So, to check it out:</p>
<pre>
mkdir -p python/trunk
svn co http://svn.python.org/projects/python/trunk python/trunk
</pre>
<p>This is your own, pristine copy: any edits you make in this tree will come up on a ”svn diff” (which you’ll use to make patches).  Avoid editing files you don’t need to so you don’t accidentally taint a diff or checkin.</p>
<p>The basic layout of the tree is unsurprisingly simple, so I’ll only really cover the important files/directories:</p>
<p>”Doc/” contains all of the documentation for the language, which will be discussed in more detail later. If you want to see the standard library documentation, look in Doc/library.</p>
<p>You will find the brain-melting grammar definition for the Python language in ”Grammar/”.</p>
<p>Header files for C code go in ”Include/”.</p>
<p>Libraries written in Python are in ”Lib/”.  You’ll note a distinct lack of C code in this directory.  That’s because C modules go in the ”Modules” directory. Also found in ”Lib/” is the ”test/” directory, which we’ll be focusing on later. If you want to see some pretty Python code, read the files in this directory. Except anything I’ve done.</p>
<p>C extensions, such as multiprocessing, ctypes, cStringIO, et cetera can be found in ”Modules/”.  Generally speaking, these are optimized modules for the standard library. Some of them are in subdirectories for cleanliness, but most of them are in the top level Modules/ directory. Note that there is a style guide for C code for the standard library, outlined in PEP 7.</p>
<p>The ”Misc/” directory contains things that don’t belong elsewhere within the tree. This includes the NEWS file, build notes, configuration for valgrind (a code profiling/debugging utility), a cheat sheet (somewhat dated, but still useful), and some editor plugins. A really good file here is  SpecialBuilds.txt, which goes over all the magic flags for Python builds you should know about.</p>
<p>Python objects are defined in ”Objects/”. It contains all C code, and is pretty well documented. If you suddenly get the urge to make a new type, start here.</p>
<p>Miscellaneous tools go in ”Tools/”. I haven’t had to use much of anything down here except for the scripts in the ”scripts/” subdirectory. The ”scripts/” directory is just filled with cool things like untabify.py, crlf.py, and google.py.</p>
<p>There are two build files.  The main build file, sort of, is ”setup.py”. I list it here because you <em>need</em> to look at this file to realize how things are built. The make steps we cover later are wrappers around this script for the most part. The “other” build file is ”Makefile.pre.in”. It works with ”setup.py” to control the entire compilation process and has some nifty targets, like “make tags”.  Who knew the build process could spit out a tags file for ”vi”?</p>
<p>It is important that you pay attention to both ”setup.py”  and ”Makefile.pre.in”.  When I forgot <em>one line</em> in the Makefile, my extension module seemed to work, but didn’t really.  I could “import multiprocessing” from within the svn tree using the local python interpreter.  However, after running “make install” the extension module was not installed, so it did not work with the installed interpreter. I finally discovered this was due to a single missing entry in LIBSUBDIRS.</p>
<p>Whew. That’s a lot of directories. I skipped over the Windows build stuff, and I am going to continue to do so, noting that I am not a Windows expert. I do know that if you are on Windows you will need to look in the ”PCbuild/” directory for build information, Visual Studio projects, etc.</p>
<h3>Building</h3>
<p>Before we go any further, let’s walk through the basic build process. Remember, I’m a Linux and OS X guy, so I will be walking you through the steps you would take on a Unix machine.  Windows users will need to either use Visual Studio, or install Cygwin (a Unix tool chain for Windows). Installing the Cygwin tool chain means you should be able to compile just fine following these directions.</p>
<p>First off, the ./configure step. If you’re familiar with autoconf, automake, and the like, you’re more than familiar with this. For those that aren’t, the configure, make, etc. steps are common to configuring and compiling/installing a given application. See the link to Autoconf in the requirements section for more details. There are some custom options for configure (of course), which you can see with ”./configure --help”.  The main one you want to know about and use is ”--with-pydebug”, which enables a special debug build of Python. You are going to want to have the debug build if you start heavily working on the core of the interpreter. The ”--with-pydebug” flag enables, in no particular order, LLTRACE, Py_REF_DEBUG, Py_TRACE_REFS, PYMALLOC_DEBUG, C code assertions, and all code that has ”#ifdef Py_DEBUG” blocks.  In other words, it turns on just about every debugging feature you could possibly need or want, short of something that fixes your code for you automatically.</p>
<p>For the exact details on all of the configure flags, including platform specific options, see Misc/SpecialBuilds.txt.</p>
<p>To start a build, just fire off a</p>
<pre>
$ ./configure --with-pydebug
</pre>
<p>in ”python/trunk”. Once this is done, unless you really want to twiddle the options, you shouldn’t need to do this again for a while. Brett Cannon once told me, when talking about some development TextMate macros, “I left out configure stuff because that becomes rather personal”.</p>
<p>Next up, execute ”make” in the python/trunk directory. You’ll see your normal make output, but there are a few caveats to keep in mind.</p>
<p>Here is some example output from the ./configure and make steps:</p>
<pre>
$ ./configure
checking for --with-universal-archs... 32-bit
checking MACHDEP... darwin
checking EXTRAPLATDIR... $(PLATMACDIRS)
...snip...
creating Modules/Setup
creating Modules/Setup.local
creating Makefile
woot:python-trunk jesse$ make
... gcc output snipped ...
Failed to find the necessary bits to build
these modules:
_bsddb             gdbm               linuxaudiodev
ossaudiodev        readline           spwd
sunaudiodev
To find the necessary bits, look in setup.py in
detect_modules() for the module's name.

running build_scripts
$
</pre>
<p>Pay attention to the build output. If you’re working on a module with C extensions or the interpreter itself, what can go wrong here will go wrong. For example, while working on integrating the _multiprocessing library into ”Modules/”, the initial issues around simple compilation were exposed here.</p>
<p>As you can see, there is an important report at the end of the make step (the log line looks like: “Failed to find the necessary bits to build these modules:”).  The information given in that report is especially important if you need access to the skipped modules. For example, on OS X the ”readline” module doesn’t compile out of the box.  You will need to resolve the dependencies listed in ”trunk/setup.py” in order to get it up and running.</p>
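<p>If the build report has scrolled away, you can ask the freshly built interpreter which optional modules it ended up with. This is only a sketch: missing_modules is a hypothetical helper, the module names are copied from the sample report above, and it uses importlib, which is newer than the interpreter this column was written against.</p>

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Names taken from the sample build report; adjust for your platform.
    optional = ["_bsddb", "gdbm", "readline", "spwd", "ossaudiodev"]
    for name in missing_modules(optional):
        print("not available: %s" % name)
```

<p>Run it with the interpreter you just built rather than the system Python, otherwise you are checking the wrong installation.</p>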
<p>If you want to “quiet down” the make step, adding the “-s” flag will make it less verbose. Also, if you want to speed it up, consider using the “-j NUM” flag to increase the number of concurrent commands being performed.</p>
<p>Once the build completes successfully, you should have a working Python binary in your local directory. On OS X and Windows it’s named ”python.exe” and on Unixes it’s named simply ”python”. If you wanted, you could fire this version up and poke around, but for development your next step should be to run the tests.</p>
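<p>A quick thing to poke at is whether the binary really is a debug build. As a small sketch (the helper name is_debug_build is made up for this example), you can rely on sys.gettotalrefcount(), which only exists when reference-count debugging was compiled in, as it is with the pydebug configure option:</p>

```python
import sys


def is_debug_build():
    """Return True when the running interpreter is a pydebug build.

    sys.gettotalrefcount() is only present when Py_REF_DEBUG is enabled.
    """
    return hasattr(sys, "gettotalrefcount")


if __name__ == "__main__":
    print("executable:  %s" % sys.executable)
    print("version:     %s" % sys.version.split()[0])
    print("debug build: %s" % is_debug_build())
```

<p>Printing sys.executable alongside it is a cheap guard against accidentally running the system interpreter instead of the one in your checkout.</p>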
<h3>Running Tests</h3>
<p>The tests in Python’s source tree are primarily executed with the ”Lib/test/regrtest.py” utility (this may change in the future) and ”make test”. If you were to run ”make test” in the ”trunk/” directory right after building, you would run a subset of all of the tests located in ”Lib/test”. Certain tests, such as large file tests and others that take a lot of time or resources, are excluded for the sake of brevity.</p>
<p>For details on what a ”make test” step does, open Makefile.pre.in and search for “# Test the interpreter” (it should be around line 660). You will find the definitions for what happens during the ”test*” steps as well as the options that invoke ”regrtest.py”. You can change the test options via the ”TESTOPTS=” flag to ”make test”.  For example, to run a single test:</p>
<pre>
$ make test TESTOPTS=test_multiprocessing
</pre>
<p>The real magic happens in regrtest.py (the Python regression test execution script).  You need to run this for <em>any</em> change made to the code, period. A basic run is the same as the basic ”make test” execution. This means that certain tests are excluded, but you can enable those tests (and a lot more) via additional arguments to regrtest.py. There is even an option to enable coverage analysis.</p>
<p>A basic invocation of regrtest.py looks like this:</p>
<pre>
$ ./python.exe Lib/test/regrtest.py
test_grammar
test_opcodes
test_dict
...snip...
test_zlib
327 tests OK.
32 tests skipped:
    test_al test_bsddb test_bsddb3 test_cd test_cl
    ...
    test_winsound test_zipfile64
Those skips are all expected on darwin.
</pre>
<p>Pretty painless, but if something goes wrong, there’s not a lot of information to go on. A better way to run it is with the ”-w” option, which will re-run any failed test with additional verbosity. For example, I added a line that would cause one of the tests to crash in Listing 1.</p>
<p>Listing 1:</p>
<pre>
$ ./python.exe Lib/test/regrtest.py test_multiprocessing
test_multiprocessing
test test_multiprocessing crashed -- &lt;type 'exceptions.NameError'&gt;: name 'mportasdl' is not defined
1 test failed:
    test_multiprocessing
$ ./python.exe Lib/test/regrtest.py -w test_multiprocessing
test_multiprocessing
test test_multiprocessing crashed -- &lt;type 'exceptions.NameError'&gt;: name 'mportasdl' is not defined
1 test failed:
    test_multiprocessing
Re-running failed tests in verbose mode
Re-running test 'test_multiprocessing' in verbose mode
test test_multiprocessing crashed -- &lt;type 'exceptions.NameError'&gt;: name 'mportasdl' is not defined
Traceback (most recent call last):
  File "Lib/test/regrtest.py", line 549, in runtest_inner
    the_package = __import__(abstest, globals(), locals(), [])
  File "/Users/jesse/open_source/subversion/python-trunk/Lib/test/test_multiprocessing.py", line 6, in &lt;module&gt;
    mportasdl;fj
NameError: name 'mportasdl' is not defined
$
</pre>
<p>There’s one more important flag to regrtest.py you need to know about, and that’s ”-uall”. This option will run all of the tests, and obviously, when you’re changing something really low level, you <em>need</em> to run these tests. They take a long time, so I recommend running them before going to bed.</p>
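<p>If you find yourself retyping these invocations, a tiny wrapper can assemble them for you. The sketch below is purely illustrative: regrtest_command and run_regrtest are hypothetical helpers, and the only flags they know about are the two covered in this section.</p>

```python
import subprocess


def regrtest_command(python="./python.exe", tests=(), rerun_verbose=True,
                     all_resources=False):
    """Build the argument list for a Lib/test/regrtest.py run."""
    cmd = [python, "Lib/test/regrtest.py"]
    if rerun_verbose:
        cmd.append("-w")      # re-run failed tests in verbose mode
    if all_resources:
        cmd.append("-uall")   # include the slow, resource-hungry tests
    cmd.extend(tests)
    return cmd


def run_regrtest(**options):
    """Run regrtest from the top of the checkout and return its exit status."""
    return subprocess.call(regrtest_command(**options))
```

<p>For example, run_regrtest(tests=["test_multiprocessing"]) would repeat the second run in Listing 1, and run_regrtest(all_resources=True) is the one to start before going to bed. On Linux you would pass python="./python".</p>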
<h3>Documentation</h3>
<p>Yes, even documentation has bugs. All of Python’s documentation resides in the ”Doc/” directory, and it has its own build scripts and system, called Sphinx. The standard library documentation module overviews we all know and love are located in ”Doc/library/”. When you are making a change that will be public in nature (say, adding a method) you need to find and update the associated documentation.</p>
<p>Also, when adding new packages, modules or methods, you should really consider adding an example in the appropriate section of the module’s .rst file (not the ”Doc/examples” directory). It is common for new Python users to have difficulty finding clear examples on standard library module usage, so the more examples the merrier.</p>
<p>If you’re stuck with the documentation, feel free to send an email to docs@python.org and ask for help.  There are a lot of good people signed up for that list and they’re willing to help you if you’re stuck.</p>
<p>The documentation is all in reST (reStructuredText) format and there is some Python-specific syntax that can be of use to you. See the “Documenting Python” page for more information.  A nice nugget I found was breaking the bigger examples out of the main ”module.rst” file (<em>the</em> documentation file for a given module, in reStructuredText format), and including them separately with:</p>
<pre>
.. literalinclude:: ../includes/mp_webserver.py
</pre>
<p>This means you can drop the python code into the ”Doc/includes” directory and it will be popped in place when the documentation is built. </p>
<p>When you want to try building the docs, simply go into ”trunk/Doc” and type ”make html” to convert all of the documentation into the HTML files you know so well from the Python doc site. Don’t worry about installing Sphinx in advance; the build rules do that for you. Once built, the HTML documents live in ”Doc/build/html”.</p>
<p>At the very least, whenever you make a change to core, you should update the ”Misc/NEWS” file to add a brief description of your change, and also add your name to ”Misc/ACKS”.</p>
<h3>Making a change</h3>
<p>Let’s assume for the moment you’re about to provide a patch to fix a bug from the python bug tracker. Most fixes will require the following minimal changes:</p>
<ul>
<li>Updated Python module
<li>Updated documentation (At least an entry in the NEWS file)
<li>Updated Tests (you will update the tests)
</ul>
<p>In a few cases you also will need to update the C code. After you’ve done the initial check out of the branch you’ll be working on, and you’ve confirmed the build and tests pass on your machine, you should be set to make your changes locally, apply any patches you are testing, etc.</p>
<p>When you’re updating or adding new tests you need to drop into the ”Lib/test” directory and find the “best place” for the test. Typically, if you’re making a bug fix, you’re simply going to append the test onto the suite for the module. Larger scale changes, including creating new packages or modules, will need their own ”test_*.py” file in ”Lib/test”.</p>
<p>It’s important when you’re adding tests that your tests are clear, well documented, and most of all <em>smart</em>. They will need to know when not to run (say, a network test should not run when no network is present) and they need to be reliable (i.e., they should never just hang). The tests and code you submit will be viewed by many people, and compiled and tested on more platforms than most of us have ever used. The smarter you make the test, the better off everyone will be.</p>
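<p>As an illustration of a test that knows when not to run, here is a minimal sketch using nothing but the standard unittest and socket modules. It is not how the core test suite does it: can_connect and NetworkDependentTest are hypothetical names, the probe address is only an example, and skipTest() requires a newer unittest than the one that shipped when this column was written.</p>

```python
import socket
import unittest


def can_connect(address, timeout=3.0):
    """Return True if a TCP connection to address succeeds within timeout."""
    try:
        sock = socket.create_connection(address, timeout)
    except OSError:
        # Covers refused connections, DNS failures and timeouts on Python 3.
        return False
    sock.close()
    return True


class NetworkDependentTest(unittest.TestCase):
    # Example address only; a real test would name the service it needs.
    address = ("www.python.org", 80)

    def setUp(self):
        # Skip, rather than fail or hang, when the network is unreachable.
        if not can_connect(self.address):
            self.skipTest("network not available")

    def test_connects(self):
        # The explicit timeout keeps a stalled connection from hanging the run.
        sock = socket.create_connection(self.address, 3.0)
        self.addCleanup(sock.close)
        self.assertTrue(sock.getpeername())
```

<p>The two habits worth copying are the probe in setUp, which turns a missing resource into a skip instead of a failure, and the timeout on every blocking call.</p>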
<p>An important tool in the test developer’s arsenal is the ”test_support” library included in ”Lib/test/test_support.py”. In it you will find a variety of functions, exceptions, and tools to help you to write core tests. Most of all, <em>look at the other tests!</em></p>
<p>Once your changes work, you should run a ”make check” to perform some housekeeping operations you want to do prior to generating the diff.  These include fixing whitespace, checking the NEWS/ACKS file for updates, and reminding you to <em>run the test suite</em>! See ”Tools/scripts/patchcheck.py” for everything ”make check” does.</p>
<h3>On Code Bombs</h3>
<p>It’s important to avoid making widespread changes in a vacuum. Large scale refactoring or changes to an API used by a lot of the standard library should be reviewed carefully and often. Typically, it’s better to post an initial patch up on the bug tracker and then revise it as other people/contributors make comments than to drop a huge patch on everyone and say “it’s done”.</p>
<p>A recent python-dev post from Guido highlighted this issue, the take-away quote (from both his email, and the blog post he linked to) being: “The story’s main moral: submit your code for review early and often; work in a branch if you need to, but don’t hide your code from review in a local repository until it’s ‘perfect’.”  For more details, see the “Code Bombs” thread listed in Related Links below.</p>
<p>One of the tools at your disposal for publishing patches for review is Rietveld, the review application created by Guido van Rossum. Typically, if you have a small enough change, putting a patch in the bug tracker is sufficient.</p>
<p>How do you generate a patch, big or small? It’s easy: cd into your ”trunk/” directory and run ”svn diff &gt;mychange.patch”. This will create a patch containing only your changes which can then be uploaded to the bug tracker, emailed to the community, etc.</p>
<p>Applying the patch is also easy.  Just hop into the ”trunk/” directory and run ”patch -p0 &lt;mychange.patch”.</p>
<h3>Conclusion</h3>
<p>A good first step to contributing to core is to consult the bug tracker. There you can find everything from mind-melting interpreter issues to simple one-line fixes (famous last words). There’s even a query to find “Easy” issues (see the sidebar on bugs.python.org).</p>
<p>One great thing about Python development is that anyone can propose an idea.   Should it stand on its own merit, it will probably be accepted. So even if you don’t find a bug in an area you’re passionate about, why not find something you <em>are</em> interested in and make a Python Enhancement Proposal for the change?  Publish it to python-dev and put together the patch for the code.  You can do this for existing modules or even new ones.</p>
<p>Ultimately, Python is <em>your</em> language. Without the people constantly contributing to core in the form of bug fixes, documentation and new programming concepts, Python would simply die on the vine. The more help, the better the language becomes, and the wider the appeal and audience.</p>
<h3>Related Links</h3>
<ul>
<li>Python Developer’s Guide — <a href="http://www.python.org/dev/" target="_blank">http://www.python.org/dev/</a>
<li>FAQ for Developers — <a href="http://www.python.org/dev/faq/" target="_blank">http://www.python.org/dev/faq/</a>
<li>Python Subversion Repository — <a href="http://svn.python.org/" target="_blank">http://svn.python.org/</a>
<li>PEP Index — <a href="http://www.python.org/dev/peps/" target="_blank">http://www.python.org/dev/peps/</a>
<li>Documenting Python — <a href="http://docs.python.org/dev/documenting/" target="_blank">http://docs.python.org/dev/documenting/</a>
<li>Python-Dev List — <a href="http://mail.python.org/mailman/listinfo/python-dev" target="_blank">http://mail.python.org/mailman/listinfo/python-dev</a>
<li>Sphinx — <a href="http://sphinx.pocoo.org/" target="_blank">http://sphinx.pocoo.org/</a>
<li>How Python is Developed — <a href="http://www.python.org/dev/intro/" target="_blank">http://www.python.org/dev/intro/</a>
<li>Rietveld Codereview Application — <a href="http://codereview.appspot.com/" target="_blank">http://codereview.appspot.com/</a>
<li>Python SVN settings — <a href="http://www.python.org/dev/faq/#id24" target="_blank">http://www.python.org/dev/faq/#id24</a>
<li>Python Bug Tracker — <a href="http://bugs.python.org" target="_blank">http://bugs.python.org</a>
<li>“Code Bombs” — <a href="http://mail.python.org/pipermail/python-dev/2008-June/080318.html" target="_blank">http://mail.python.org/pipermail/python-dev/2008-June/080318.html</a>
<li>Brett Cannon’s introduction to core development from PyCon 2008 — <a href="http://www.cs.ubc.ca/%7Edrifty/pycon/sprint_tutorial.pdf" target="_blank">http://www.cs.ubc.ca/%7Edrifty/pycon/sprint_tutorial.pdf</a>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://jessenoller.com/2009/02/04/a-brief-introduction-to-python-core-development-completely-different/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Get with the program as contextmanager &#124; Completely Different</title>
		<link>http://jessenoller.com/2009/02/03/get-with-the-program-as-contextmanager-completely-different/</link>
		<comments>http://jessenoller.com/2009/02/03/get-with-the-program-as-contextmanager-completely-different/#comments</comments>
		<pubDate>Tue, 03 Feb 2009 14:30:13 +0000</pubDate>
		<dc:creator>jesse</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[python magazine]]></category>

		<guid isPermaLink="false">http://jessenoller.com/?p=461</guid>
		<description><![CDATA[
One of the cooler features that came with Python 2.5's release is the 'with' statement and the context manager protocol behind it. I could make the argument that these two things alone make the upgrade to Python 2.5 more than compelling for those of you trapped in the dark ages of 2.4 or worse: ...]]></description>
			<content:encoded><![CDATA[<blockquote><p>
One of the cooler features that came with Python 2.5’s release is the ‘with’ statement and the context manager protocol behind it. I could make the argument that these two things alone make the upgrade to Python 2.5 more than compelling for those of you trapped in the dark ages of 2.4 or worse: 2.3!
</p></blockquote>
<p><em>This is a reprint of an article I wrote for <a href="http://www.pythonmagazine.com/" target="_blank">Python Magazine</a> as a Completely Different column that was published in the July 2008 issue. <b>I have republished this in its original form, bugs and all</b></em></p>
<p><span id="more-461"></span></p>
<h3>Introduction</h3>
<p>In Python 2.5, a with_statement hook was added to the ”__future__” module.  This was brought on by PEP (Python Enhancement Proposal) 343, “The with statement”. PEP 343, like many PEPs in Python, was a fusion of good ideas into a rather elegant solution.  See <a href="http://www.python.org/dev/peps/" target="_blank">http://www.python.org/dev/peps/</a> for a complete listing of PEPs, including those referenced in this article.</p>
<p>Two of the influencing PEPs, 310 (Reliable Acquisition/Release Pairs) and 319 (Python Synchronize/Asynchronize Block), were primarily focused on adding a simple way of acquiring and then releasing a lock.  PEP 310 proposed the ”with” statement (i.e., ”with lock:”) and PEP 319 proposed ”synchronize” and ”asynchronize” keywords that would allow you to define a function or method that would use the proposed keywords to access and modify shared objects, essentially hiding the common form of managing the lock directly:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code28'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46128"><td class="line_numbers"><pre>1
2
3
4
5
6
7
</pre></td><td class="code" id="p461code28"><pre class="python" style="font-family:monospace;">initialize_lock<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
...
<span style="color: black;">acquire_lock</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">try</span>:
    change_shared_data<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">finally</span>:
    release_lock<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>While both PEPs 310 and 319 were (are) good ideas, there were additional influences from other PEPs as well.  PEP 340, “Anonymous Block Statements”, and PEP 346, “User Defined (‘with’) Statements”, by Nick Coghlan were both important. In the end, what I think is an elegant and powerful middle ground was reached.</p>
<p>If you want a very detailed overview of all of the reasoning behind the introduction of the with statement, I recommend reading PEP 346 <a href="http://www.python.org/dev/peps/pep-0346/" target="_blank">http://www.python.org/dev/peps/pep-0346/</a>, where Nick Coghlan explains it in excellent detail with many examples.</p>
<h3>Context Managers</h3>
<p>The key thing to understand about ”with” and all of the work in the PEP is that under the covers, when you write:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code29'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46129"><td class="line_numbers"><pre>1
2
</pre></td><td class="code" id="p461code29"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">with</span> EXPRESSION <span style="color: black;">&#91;</span><span style="color: #ff7700;font-weight:bold;">as</span> VARIABLE<span style="color: black;">&#93;</span>:
    BLOCK OF CODE</pre></td></tr></table></div>

<p>The EXPRESSION is evaluated to produce a context manager object, and the statement is expanded into two calls on that object. The first call is to the object’s ”__enter__()” method.  After the nested block completes, the object’s ”__exit__()” method is run. “as VARIABLE” is in brackets because it is optional: when it is present, the return value of ”__enter__()” is bound to the name VARIABLE for use inside the BLOCK.</p>
<p>Take a look at Listing 1 for an example. In order to illustrate the methods and call order, I’ve created a simple class, Foo, that defines the required protocol methods. At the bottom of the listing, an instance of Foo is used in the ”with Foo()” call, and the output is simply:</p>
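<p>The expansion can also be written out by hand. What follows is a simplified sketch of what the interpreter does, not the exact translation given in PEP 343. The names Tracer and run_with are hypothetical and exist only for illustration:</p>

```python
import sys


class Tracer(object):
    """Hypothetical context manager that records the order of the calls."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return "value from __enter__"

    def __exit__(self, exc_type, exc_value, tb):
        self.events.append("exit")
        return False  # a false value lets any exception propagate


def run_with(manager, block):
    """Approximately what `with manager as variable: block(variable)` does."""
    variable = manager.__enter__()
    try:
        block(variable)
    except BaseException:
        # The block failed: hand the exception details to __exit__().
        if not manager.__exit__(*sys.exc_info()):
            raise
    else:
        # The block succeeded: all three arguments are None.
        manager.__exit__(None, None, None)


tracer = Tracer()
received = []
run_with(tracer, received.append)
```

<p>After this runs, tracer.events holds "enter" followed by "exit", and received holds the value returned by __enter__(), which is what “as VARIABLE” would have been bound to.</p>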
<p><code><br />
I<br />
like<br />
turtles<br />
</code></p>
<p>Listing 1:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code30'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46130"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="code" id="p461code30"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">__future__</span> <span style="color: #ff7700;font-weight:bold;">import</span> with_statement
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Foo<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">pass</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> __enter__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;I&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> __exit__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">type</span>, value, <span style="color: #dc143c;">traceback</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;turtles&quot;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">with</span> Foo<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;like&quot;</span></pre></td></tr></table></div>

<p>As you can see, the ”__enter__()” method is called on the object, control passes to the nested block, and the code that prints “like” is executed. Once the block is completed, the ”__exit__()” method is called.</p>
<p>Per the PEP, the ”__enter__()” method on the object accepts no arguments, but can perform actions (in this case, print) or return data. If an object has no data to return it should return self, although that is not required.</p>
<p>The ”__exit__()” method on the object has to accept three arguments: type, value, and traceback. These correspond to the arguments to the ”raise” statement. They are passed in so that ”__exit__()” can see any exception raised inside the nested block. If type is ”None”, the nested block executed successfully, without error. Otherwise the ”__exit__()” method can properly handle the exception condition and clean up the resource.</p>
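<p>A small sketch of the three arguments in action. It also shows one detail not covered in the text: if __exit__() returns a true value, the exception is swallowed instead of propagating. The Recorder class is hypothetical, made up for this illustration:</p>

```python
class Recorder(object):
    """Hypothetical context manager that remembers what __exit__() was given."""

    def __init__(self, suppress=False):
        self.suppress = suppress
        self.seen_type = "not called yet"

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.seen_type = exc_type
        # A true return value tells Python to swallow the exception.
        return self.suppress


with Recorder() as clean:
    pass  # no error, so __exit__() receives (None, None, None)

with Recorder(suppress=True) as failed:
    raise ValueError("boom")  # __exit__() receives ValueError and suppresses it
```

<p>Here clean.seen_type ends up as None, while failed.seen_type is the ValueError class and the script carries on past the second block.</p>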
<p>For example, you might ask what happens to the ”__exit__()” method execution if an exception is raised when the code block is executing. Let’s examine this further by changing the bottom part of Listing 1 to be:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code31'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46131"><td class="line_numbers"><pre>1
2
</pre></td><td class="code" id="p461code31"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">with</span> Foo<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">Exception</span></pre></td></tr></table></div>

<p>The output now looks like this:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code32'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46132"><td class="line_numbers"><pre>1
2
3
4
5
6
</pre></td><td class="code" id="p461code32"><pre class="python" style="font-family:monospace;">I
turtles
Traceback <span style="color: black;">&#40;</span>most recent call last<span style="color: black;">&#41;</span>:
  File <span style="color: #483d8b;">&quot;scratch.py&quot;</span>, line <span style="color: #ff4500;">12</span>, <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #66cc66;">&lt;</span>module<span style="color: #66cc66;">&gt;</span>
    <span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">Exception</span>
<span style="color: #008000;">Exception</span></pre></td></tr></table></div>

<p>If the code block being executed raises an exception, ”__exit__()” is still called on the Foo() object. This makes it darn handy for, say, cleaning up locks, database handles, sockets, unruly children, etc. Earlier I mentioned that objects that define the new protocol could also return ”self”, which would then be bound to the variable defined in the [as VARIABLE].</p>
<p>Listing 2 provides a class with an ”__enter__()” method that returns the instance of the object for access by the code block. In the example, the instance of the object is associated with the variable name “baz”. Take a look at the output:</p>
<p><code><br />
setting count to 0<br />
&lt;__main__.Foo object at 0x73bb0&gt;<br />
count is now: 4<br />
</code></p>
<p>Listing 2:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code33'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46133"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
</pre></td><td class="code" id="p461code33"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">__future__</span> <span style="color: #ff7700;font-weight:bold;">import</span> with_statement
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Foo<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">pass</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> __enter__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;setting count to 0&quot;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">count</span> = <span style="color: #ff4500;">0</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> __exit__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">type</span>, value, <span style="color: #dc143c;">traceback</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;count is now: %d&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: #008000;">self</span>.<span style="color: black;">count</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> incr<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">count</span> += <span style="color: #ff4500;">1</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">with</span> Foo<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">as</span> baz:
    <span style="color: #ff7700;font-weight:bold;">print</span> baz
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">4</span><span style="color: black;">&#41;</span>:
        baz.<span style="color: black;">incr</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>As you can see, within the for-loop in the main block of code we were able to alter the state of the object we’re reliant on. We can access all of its internals, change state, call methods, etc. Again, this is especially handy if you want to create something that acts as some sort of handle.</p>
<p>Let’s look at two snippets. First, the old way of declaring a lock, then later acquiring it to modify state:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code34'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46134"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="code" id="p461code34"><pre class="python" style="font-family:monospace;">lock = RLock<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> thread_object<span style="color: black;">&#40;</span>Thread<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> run<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        lock.<span style="color: black;">acquire</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">try</span>:
            <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #008000;">self</span>.<span style="color: black;">getName</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">except</span>:
            <span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">Exception</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;Something is broken&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">finally</span>:
            lock.<span style="color: black;">release</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Now, let’s look at code refactored to use ”with”:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code35'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46135"><td class="line_numbers"><pre>1
2
3
4
5
6
</pre></td><td class="code" id="p461code35"><pre class="python" style="font-family:monospace;">lock = RLock<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> thread_object<span style="color: black;">&#40;</span>Thread<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> run<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">with</span> lock:
            <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #008000;">self</span>.<span style="color: black;">getName</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>This is possible because threading.RLock implements the new context manager protocol. Go ahead, take a peek at threading.py yourself, or look at the code below:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code36'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46136"><td class="line_numbers"><pre>1
2
3
4
5
</pre></td><td class="code" id="p461code36"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> _RLock<span style="color: black;">&#40;</span>_Verbose<span style="color: black;">&#41;</span>:
    __enter__ = acquire
    ...<span style="color: black;">snip</span>...
    <span style="color: #ff7700;font-weight:bold;">def</span> __exit__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, t, v, tb<span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">release</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>
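<p>To see the acquire and release pairing at work, here is a minimal sketch. It uses threading.Lock rather than RLock, purely because Lock offers a locked() method that makes the released state easy to check. The work function and the counter dictionary are made up for the example:</p>

```python
import threading

lock = threading.Lock()
counter = {"value": 0}


def work(times):
    for _ in range(times):
        with lock:  # acquired here, released when the block ends
            counter["value"] += 1


threads = [threading.Thread(target=work, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The lock is also released when the block raises.
try:
    with lock:
        raise RuntimeError("failure while holding the lock")
except RuntimeError:
    pass

still_held = lock.locked()
```

<p>Every increment happens while the lock is held, so the four threads add up to 4000, and still_held is False because __exit__() released the lock even though the last block raised.</p>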

<p>The lock management classes are not the only ones to implement the protocol.  The io.py, tempfile.py, and other modules all implement the protocol to allow you to do something like the following:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code37'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46137"><td class="line_numbers"><pre>1
2
</pre></td><td class="code" id="p461code37"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">with</span> <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;hey&quot;</span>, <span style="color: #483d8b;">&quot;r&quot;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">as</span> mfile:
    mfile.<span style="color: black;">readlines</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>This will automatically open the file on the way in and close it on the way out. Magic! The simplest way of thinking of context managers is as resource managers, but they are not limited to that.  For example, what if you wanted to ensure a given state was set for a particular code block? PEP 346 points out an excellent example of disabling signals during the BLOCK execution. Take a look at Listing 3, where I have implemented that very idea to simply catch and ignore SIGABRT signals.</p>
<p>When the script is run in one window, and in another we start running “kill -6 &lt;process&gt;”, we see:</p>
<pre>
Tis but a scratch!
Tis but a scratch!
I got an abort, but I like it here.
Tis but a scratch!
Tis but a scratch!
</pre>
<p>Listing 3:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code38'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46138"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</pre></td><td class="code" id="p461code38"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">__future__</span> <span style="color: #ff7700;font-weight:bold;">import</span> with_statement
<span style="color: #ff7700;font-weight:bold;">from</span> contextlib <span style="color: #ff7700;font-weight:bold;">import</span> contextmanager
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">signal</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> handler<span style="color: black;">&#40;</span>signum, frame<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;I got an abort, but I like it here.&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">pass</span>
&nbsp;
@contextmanager
<span style="color: #ff7700;font-weight:bold;">def</span> no_sigabort<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #dc143c;">signal</span>.<span style="color: #dc143c;">signal</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">signal</span>.<span style="color: black;">SIGABRT</span>, handler<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">yield</span>
    <span style="color: #dc143c;">signal</span>.<span style="color: #dc143c;">signal</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">signal</span>.<span style="color: black;">SIGABRT</span>, <span style="color: #dc143c;">signal</span>.<span style="color: black;">SIG_DFL</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">with</span> no_sigabort<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #808080; font-style: italic;"># code executed without worrying about signals</span>
    <span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #008000;">True</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;Tis but a scratch!&quot;</span></pre></td></tr></table></div>

<p>Instead of passing in the handler function on line 11 of Listing 3, we could also pass in signal.SIG_IGN, which simply causes the signal to be ignored. You can easily catch all sorts of state and react to it. Another one of the examples in PEP 346 is committing or rolling back database transactions:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code39'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46139"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="code" id="p461code39"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> contextlib <span style="color: #ff7700;font-weight:bold;">import</span> contextmanager
&nbsp;
@contextmanager
<span style="color: #ff7700;font-weight:bold;">def</span> transaction<span style="color: black;">&#40;</span>db<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">try</span>:
        <span style="color: #ff7700;font-weight:bold;">yield</span>
    <span style="color: #ff7700;font-weight:bold;">except</span>:
        db.<span style="color: black;">rollback</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #808080; font-style: italic;"># re-raise so the caller still sees the error</span>
        <span style="color: #ff7700;font-weight:bold;">raise</span>
    <span style="color: #ff7700;font-weight:bold;">else</span>:
        db.<span style="color: black;">commit</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Using this style, your code becomes a lot more succinct and clear, and you drastically reduce the amount of boilerplate you have to add to your application.</p>
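<p>Here is a minimal sketch of the transaction pattern in use, with the contextmanager decorator applied and the exception re-raised after the rollback so the caller still sees it. FakeDB is a hypothetical stand-in for a real connection object. It only records which method was called:</p>

```python
from contextlib import contextmanager


class FakeDB(object):
    """Hypothetical stand-in for a database connection."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")


@contextmanager
def transaction(db):
    try:
        yield
    except Exception:
        db.rollback()
        raise  # let the caller see the original error
    else:
        db.commit()


good = FakeDB()
with transaction(good):
    pass  # work that succeeds

bad = FakeDB()
try:
    with transaction(bad):
        raise ValueError("constraint violated")
except ValueError:
    pass
```

<p>The first block commits and the second rolls back, with no try/except/else clutter at either call site.</p>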
<h3>Contextlib</h3>
<p>As part of Python 2.5, a new module, ”contextlib”, was introduced.  This module is an excellent reference point for how to use context managers (it’s great example code!).  It also provides some pretty cool tools. You’ve already seen me use contextlib.contextmanager in Listing 3 to remove the need to define an object with ”__enter__()” and ”__exit__()” methods.</p>
<p>The contextlib.contextmanager decorator allows you to create context managers out of a simple generator function that yields exactly once, at one point in the middle. This means you could do:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code40'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46140"><td class="line_numbers"><pre>1
2
3
4
5
6
7
</pre></td><td class="code" id="p461code40"><pre class="python" style="font-family:monospace;">@contextmanager
<span style="color: #ff7700;font-weight:bold;">def</span> test_setup<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    start database...
    <span style="color: black;">inject</span> fake data...
    <span style="color: #ff7700;font-weight:bold;">yield</span> <span style="color: black;">&#40;</span>to <span style="color: #dc143c;">test</span><span style="color: black;">&#41;</span>
    confirm result...
    <span style="color: black;">shut</span> database down...</pre></td></tr></table></div>

<p>Which allows you to:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code41'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46141"><td class="line_numbers"><pre>1
2
3
</pre></td><td class="code" id="p461code41"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> mytest<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">with</span> test_setup<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
        ... <span style="color: black;">do</span> stuff ...</pre></td></tr></table></div>

<p>You can technically do anything else you want within that decorated function, and it can run for as long as you like, provided that:</p>
<p>- It yields once.<br />
- It does not yield again after an exception is raised.</p>
<p>The other nice thing is that you could change the test_setup example above to accept any number or type of arguments, so tests could pass identity and other information into the test_setup function.</p>
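<p>As a rough sketch of that idea, here is a decorated function that takes arguments and hands a value back to the block through the yield. The name db_setup, the dictionary standing in for a database, and the events list are all made up for illustration:</p>

```python
from contextlib import contextmanager

events = []


@contextmanager
def db_setup(user, rows=3):
    # Set-up: build a throwaway "database" (a plain dict, for illustration).
    database = {"owner": user, "rows": list(range(rows))}
    events.append("setup for " + user)
    try:
        yield database  # bound by "as" in the with statement
    finally:
        # Tear-down runs even if the test body raises.
        database.clear()
        events.append("teardown for " + user)


def mytest():
    with db_setup("alice", rows=2) as db:
        return len(db["rows"])


row_count = mytest()
```

<p>Whatever follows the yield keyword is what “as db” receives, and the try/finally keeps the function within the two rules above: it yields once, and it never yields again after an exception.</p>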
<p>Now let’s turn this up to 11. Up until now, I’ve shown you simple examples — basically, how to get/set some resource and then release it. But did you know you could nest them? Via the contextlib.nested function, you can define a series of nested contextmanagers and then bind each one to a different variable name.</p>
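<p>A note for readers on newer interpreters: contextlib.nested was deprecated in Python 2.7 and is not available in Python 3, where the with statement itself accepts several context managers separated by commas. The sketch below uses that newer syntax, not nested(), to show the order in which nested managers are entered and exited. The tagged function is hypothetical:</p>

```python
from contextlib import contextmanager

order = []


@contextmanager
def tagged(name):
    order.append("enter " + name)
    try:
        yield name
    finally:
        order.append("exit " + name)


# Equivalent in spirit to: with nested(tagged("a"), tagged("b")) as (a, b):
with tagged("a") as a, tagged("b") as b:
    order.append("body " + a + b)
```

<p>The managers are entered left to right and exited in reverse, just as if the second with statement had been written inside the first.</p>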
<p>Let’s try a simple nested context out for starters. In the first example in Listing 4, we want to move the data from file1 to file2.  It’s easy to list the open file handles as arguments to ”nested()”, but what about mixing types?  The second example in Listing 4 (lines 8–11) mixes file handles with thread locks.</p>
<p>Listing 4:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code42'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46142"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="code" id="p461code42"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python</span>
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">__future__</span> <span style="color: #ff7700;font-weight:bold;">import</span> with_statement
<span style="color: #ff7700;font-weight:bold;">from</span> contextlib <span style="color: #ff7700;font-weight:bold;">import</span> nested
&nbsp;
<span style="color: #ff7700;font-weight:bold;">with</span> nested<span style="color: black;">&#40;</span><span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;file1&quot;</span>, <span style="color: #483d8b;">&quot;r&quot;</span><span style="color: black;">&#41;</span>, <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;file2&quot;</span>, <span style="color: #483d8b;">&quot;w&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">as</span> <span style="color: black;">&#40;</span>a, b<span style="color: black;">&#41;</span>:
    b.<span style="color: black;">write</span><span style="color: black;">&#40;</span>a.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">threading</span> <span style="color: #ff7700;font-weight:bold;">import</span> RLock
lock = RLock<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">with</span> nested<span style="color: black;">&#40;</span>lock, <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;file1&quot;</span>, <span style="color: #483d8b;">&quot;r&quot;</span><span style="color: black;">&#41;</span>, <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;file2&quot;</span>, <span style="color: #483d8b;">&quot;w&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">as</span> <span style="color: black;">&#40;</span>a, b, c<span style="color: black;">&#41;</span>:
    c.<span style="color: black;">write</span><span style="color: black;">&#40;</span>b.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Yes, we have officially crossed into <em>maybe that’s too much</em> territory. But, you can see we can pass in any number of contextmanagers and all of them will be handled as needed. This is great if, like above, you need to acquire a lock and then perform an action which requires some cleanup.</p>
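<p>On Python 3, contextlib.nested is no longer available. The with statement accepts several managers directly, and contextlib.ExitStack (Python 3.3 and later) covers the case where the number of managers is only known at run time. The sketch below assumes Python 3 and uses throwaway files in a temporary directory so that it can run anywhere; the file names and contents are placeholders.</p>

```python
import os
import tempfile
from contextlib import ExitStack
from threading import RLock

# Placeholder input so the example is self-contained.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "file1")
dst = os.path.join(workdir, "file2")
with open(src, "w") as f:
    f.write("some data\n")

lock = RLock()

# Several managers in one with statement, entered left to right
# and exited in reverse order.
with lock, open(src, "r") as a, open(dst, "w") as b:
    b.write(a.read())

# ExitStack handles a set of managers that is built up at run time.
with ExitStack() as stack:
    stack.enter_context(lock)
    handles = [stack.enter_context(open(p, "r")) for p in (src, dst)]
    contents = [h.read() for h in handles]

print(contents)
```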
<p>Finally, we have contextlib.closing. This is, as the documentation states, “a context manager that closes <em>thing</em> upon completion of the block”. Anything with a ”close()” method is eligible to be used here. At last count on trunk, ”close()” occurred at least 71 times in the Lib directory. You can use ”closing” on URLs opened with urllib, StringIO objects, and gzip objects.</p>
<p>For example, from the standard library documentation:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code43'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46143"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
</pre></td><td class="code" id="p461code43"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">__future__</span> <span style="color: #ff7700;font-weight:bold;">import</span> with_statement
<span style="color: #ff7700;font-weight:bold;">from</span> contextlib <span style="color: #ff7700;font-weight:bold;">import</span> closing
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">urllib</span>
&nbsp;
url = <span style="color: #483d8b;">'http://www.python.org'</span>
<span style="color: #ff7700;font-weight:bold;">with</span> closing<span style="color: black;">&#40;</span><span style="color: #dc143c;">urllib</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">as</span> page:
    <span style="color: #ff7700;font-weight:bold;">for</span> line <span style="color: #ff7700;font-weight:bold;">in</span> page:
        <span style="color: #ff7700;font-weight:bold;">print</span> line</pre></td></tr></table></div>
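<p>The same trick works for your own classes. A minimal sketch, assuming a hypothetical Connection class that has a close() method but no __enter__() or __exit__() of its own:</p>

```python
from contextlib import closing


class Connection(object):
    """Hypothetical resource: it can be closed, but is not a context manager."""

    def __init__(self):
        self.open = True

    def send(self, msg):
        if not self.open:
            raise ValueError("connection is closed")
        return "sent: %s" % msg

    def close(self):
        self.open = False


# closing() supplies the missing __enter__/__exit__ and calls close() on exit.
with closing(Connection()) as conn:
    reply = conn.send("hello")

print(reply, conn.open)
```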

<p>All three of these make it easy to factor out the code we all end up repeating; that’s the nature of boilerplate. As we all know, less boilerplate and less copy-and-pasted code means code that is easier to read and easier to manage.</p>
<h3>Let’s Go Off-Roading</h3>
<p>As I was writing this, I was trying to think of something really interesting to do with an object defining ”__enter__()” and ”__exit__()” methods that wasn’t just resource management. Then I realized, given I’m doing a lot of parallel stuff right now, I could create a threadpool that allowed jobs to be submitted to it, and the ”__exit__()” would call ”join()” on the threads and so on.</p>
<p>Fantastic idea! Within Listing 5, I have defined a basic thread object that subclasses threading.Thread. Then, in Listing 6 I define a ThreadPool, which is the context manager I will use.</p>
<p>Listing 5:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code44'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46144"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
</pre></td><td class="code" id="p461code44"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">__future__</span> <span style="color: #ff7700;font-weight:bold;">import</span> with_statement
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">threading</span> <span style="color: #ff7700;font-weight:bold;">import</span> Thread
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">Queue</span> <span style="color: #ff7700;font-weight:bold;">import</span> Empty
<span style="color: #ff7700;font-weight:bold;">from</span> Listing6 <span style="color: #ff7700;font-weight:bold;">import</span> ThreadPool
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> myThread<span style="color: black;">&#40;</span>Thread<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, myq<span style="color: black;">&#41;</span>:
        Thread.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">myq</span> = myq
    <span style="color: #ff7700;font-weight:bold;">def</span> run<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">while</span> <span style="color: #008000;">True</span>:
            <span style="color: #ff7700;font-weight:bold;">try</span>:
                job = <span style="color: #008000;">self</span>.<span style="color: black;">myq</span>.<span style="color: black;">get</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">if</span> job == <span style="color: #483d8b;">'STOP'</span>:
                    <span style="color: #ff7700;font-weight:bold;">break</span>
                <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #008000;">self</span>.<span style="color: black;">getName</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>, job
            <span style="color: #ff7700;font-weight:bold;">except</span> Empty:
                <span style="color: #ff7700;font-weight:bold;">continue</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">with</span> ThreadPool<span style="color: black;">&#40;</span><span style="color: #ff4500;">10</span>, myThread<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">as</span> pool:
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">100</span><span style="color: black;">&#41;</span>:
        pool.<span style="color: black;">put</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Listing 6:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p461code45'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p46145"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
</pre></td><td class="code" id="p461code45"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">Queue</span> <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">Queue</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> ThreadPool<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, workers, workerClass<span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">myq</span> = <span style="color: #dc143c;">Queue</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">workers</span> = workers
        <span style="color: #008000;">self</span>.<span style="color: black;">workerClass</span> = workerClass
        <span style="color: #008000;">self</span>.<span style="color: black;">pool</span> = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> __enter__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #808080; font-style: italic;"># On entering, start all the workers, who will block trying to</span>
        <span style="color: #808080; font-style: italic;"># get work off the queue</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">workers</span><span style="color: black;">&#41;</span>:
            <span style="color: #008000;">self</span>.<span style="color: black;">pool</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">workerClass</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">myq</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">pool</span>:
            i.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">myq</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> __exit__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">type</span>, value, <span style="color: #dc143c;">traceback</span><span style="color: black;">&#41;</span>:
        <span style="color: #808080; font-style: italic;"># Now, shut down the pool once all work is done</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">pool</span>:
            <span style="color: #008000;">self</span>.<span style="color: black;">myq</span>.<span style="color: black;">put</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'STOP'</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">pool</span>:
            i.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Note that ThreadPool returns a value from __enter__(). After it builds up the worker-pool, instead of returning ”self” (which would be silly), it actually returns the queue built in the constructor. This makes it so that when we call it on line 20 in Listing 5, we get the reference to the queue we need.</p>
<p>Now, this is a nominal example. We’re not returning any results or anything, we’re just printing the numbers off of the queue as we get them.  But it demonstrates the concept of creating an object that tracks some state, sets up a resource, and then ultimately manages that resource.</p>
<p>In Listing 6, I made sure we built the pool at ”__enter__()” time rather than in the constructor, which leaves room for more customization between construction and use. It also matters for error handling: if we hit an exception while entering, we immediately jump out and the BLOCK we’re running is never executed.  In the ”__exit__()” method, I insert STOP tokens to tell the threads to exit their work loop.</p>
<p>If you wanted, you could use this code inside of your own application (once you make it so it returns data to the caller) to spawn worker pools on-demand, do some processing, and then cleanly shut them down with a minimal amount of boilerplate involved.</p>
<p>The nice thing about this is that all of the responsibility for management lives in the object that does the work.  You no longer need to remember to shut down the worker pool, release the database connection, or close that socket.</p>
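<p>Here is one way the pool could hand data back to the caller. Treat it as a sketch rather than a finished design: the ResultPool class, its put() and collected() methods, and the STOP sentinel object are names invented for this example, and it is written for Python 3 (where the module is called queue). Unlike Listing 6, __enter__() returns the pool itself so that the results are still reachable after the block ends.</p>

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to leave its loop


class ResultPool(object):
    def __init__(self, workers, func):
        self.workers = workers
        self.func = func
        self.jobs = queue.Queue()
        self.results = queue.Queue()
        self.pool = []

    def _work(self):
        while True:
            job = self.jobs.get()
            if job is STOP:
                break
            self.results.put((job, self.func(job)))

    def __enter__(self):
        # Start the workers; they block until jobs arrive.
        for _ in range(self.workers):
            t = threading.Thread(target=self._work)
            t.start()
            self.pool.append(t)
        return self

    def put(self, job):
        self.jobs.put(job)

    def __exit__(self, exc_type, exc_value, tb):
        # One STOP per worker, queued behind any remaining jobs.
        for _ in self.pool:
            self.jobs.put(STOP)
        for t in self.pool:
            t.join()
        return False  # do not swallow exceptions from the block

    def collected(self):
        # Safe to drain without blocking once the workers have been joined.
        out = []
        while not self.results.empty():
            out.append(self.results.get())
        return sorted(out)


with ResultPool(4, lambda n: n * n) as pool:
    for i in range(10):
        pool.put(i)

squares = pool.collected()
print(squares)
```

<p>Because the STOP tokens are queued after the jobs, every job is picked up before any worker exits, and the results are complete by the time the with block returns.</p>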
<h3>Conclusion</h3>
<p>I hope I’ve shown you a compelling new feature within Python that you might not have known about.  Python is evolving rapidly every day. We don’t just have things like context managers and Python 3000 to look forward to. We have a wealth of improvements going into core every single day.</p>
<p>I think people are going to really love context managers for their elegance once they become a mainstream part of the language (in 2.6). Centralizing the control and management of state, resources, and the like while reducing the total lines of code you have to debug, manage, and read is a good thing.</p>
<p>Well, as long as the end result is still readable.</p>
<p>Related Links:</p>
<ul>
<li>PEP 0343 — <a href="http://www.python.org/dev/peps/pep-0343/" target="_blank">http://www.python.org/dev/peps/pep-0343/</a>
<li>PEP 0346 — <a href="http://www.python.org/dev/peps/pep-0346/" target="_blank">http://www.python.org/dev/peps/pep-0346/</a>
<li>PEP 0319 — <a href="http://www.python.org/dev/peps/pep-0319/" target="_blank">http://www.python.org/dev/peps/pep-0319/</a>
<li>PEP 0310 — <a href="http://www.python.org/dev/peps/pep-0310/" target="_blank">http://www.python.org/dev/peps/pep-0310/</a>
<li>PEP 0340 — <a href="http://www.python.org/dev/peps/pep-0340/" target="_blank">http://www.python.org/dev/peps/pep-0340/</a>
<li>Python Wiki discussion — <a href="http://wiki.python.org/moin/WithStatement" target="_blank">http://wiki.python.org/moin/WithStatement</a>
<li>Understanding Python’s “with” statement, by Fredrik Lundh — <a href="http://effbot.org/zone/python-with-statement.htm" target="_blank">http://effbot.org/zone/python-with-statement.htm</a>
<li>Python Module of the Week: contextlib, by Doug Hellmann — <a href="http://blog.doughellmann.com/2008/05/pymotw-contextlib.html" target="_blank">http://blog.doughellmann.com/2008/05/pymotw-contextlib.html</a>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://jessenoller.com/2009/02/03/get-with-the-program-as-contextmanager-completely-different/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>An Interview With Adam Olsen, Author of Safe Threading &#124; Completely Different</title>
		<link>http://jessenoller.com/2009/02/02/an-interview-with-adam-olsen-author-of-safe-threading-completely-different/</link>
		<comments>http://jessenoller.com/2009/02/02/an-interview-with-adam-olsen-author-of-safe-threading-completely-different/#comments</comments>
		<pubDate>Mon, 02 Feb 2009 18:30:26 +0000</pubDate>
		<dc:creator>jesse</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[python magazine]]></category>

		<guid isPermaLink="false">http://jessenoller.com/?p=451</guid>
		<description><![CDATA[This is a reprint of an article I wrote for Python Magazine as a Completely Different column that was published in the June 2008 issue. 
A world without a Global Interpreter Lock (GIL) - the very thought of it makes some people very, very happy. At PyCon 2007 Guido openly stated that he would ...]]></description>
			<content:encoded><![CDATA[<p><em>This is a reprint of an article I wrote for <a href="http://www.pythonmagazine.com/" target="_blank">Python Magazine</a> as a Completely Different column that was published in the June 2008 issue. </em></p>
<blockquote><p>A world without a Global Interpreter Lock (GIL) — the very thought of it makes some people very, very happy. At PyCon 2007 Guido openly stated that he would not be against a GIL-less implementation of Python, provided someone coughed up the patch itself. Right now, that someone is Adam Olsen — an amateur programmer who has been working on a patch to the CPython interpreter since July of 2007.</p></blockquote>
<p>It’s PyCon. I’m supposed to be listening to a talk, but I’ve fallen down the rabbit hole of a future without a global interpreter lock. I’m locked in on getting a patched version of the interpreter up and running on Mac OS X, and the patch author, Adam Olsen, is coaching me through changes to some of the deepest internals of Python itself.<br />
<span id="more-451"></span><br />
For about a year, Adam has been working on the “safe threading” project for Python 3000. In this project, he has attempted to address many of the common issues facing programmers of highly threaded and highly concurrent applications. These problems include deadlocks, isolation of shared objects (to prevent corruption/locking issues) and finally, as a side-effect of making threading safer, the removal of the Global Interpreter Lock.</p>
<p>Adam would be the first to point out that adding ”--without-gil” to the Makefile for the C version of the interpreter was actually a side-effect of the bulk of his work.  I would say that his 938-kilobyte diff against the CPython code base, which produces an interpreter with a safe, clear, and concise threading model for local concurrency, is a bit more than a side effect.</p>
<p>It is clear that he lives for a concurrent and threaded world, and Adam has filled in a lot of gaps in my knowledge about concurrency in our past conversations.  I’ve been lucky enough to interview him about the safe threading project and his outlook on all of this as well.</p>
<p><strong>First off, what’s your background?</strong></p>
<p>I’m an amateur programmer, self taught.  I’ve had a long interest in object models and concurrency, such as how widgets in GTK interact.</p>
<p>I’ve explored twisted a bit, as well as Python’s existing threading. Additionally, I’ve experimented a great deal with different ways to utilize generators or threads, actors, futures, cooperative versus preemptive scheduling, and so on.</p>
<p><strong>Can you explain the basic premise behind the Safe Threading part of the project?</strong></p>
<p>Make the common uses easy.  Don’t necessarily make it impossible to get wrong (everything is a tradeoff!), but give the programmer a fighting chance.</p>
<p><strong>How about the “Free Threading” part (--without-gil)?</strong></p>
<p>Everybody seems to know you don’t need locking if you’re not modifying an object, but Python demands a traceback, not a segfault if the programmer gets it wrong.  Monitors and shareability provide a framework that satisfies both.</p>
<p>In essence, removing the GIL was a bonus to avoiding unintended conflicts between threads.</p>
<p><strong>Many people dismiss the “threaded programming” paradigm as impossible to get right.  Even Brian Goetz has stated that it is extremely hard to “get right”.  If this is the case, why try to “fix” threading in Python?</strong></p>
<p>What most people see as a problem with threads, I see as a problem with the memory model.  They let threads modify the same object simultaneously, resulting in arbitrary, ill-defined results.  The complexity here explodes, quickly turning the programmer’s brain into mush.</p>
<p>The solution is to isolate most objects.  Keep all these mutable objects back in the sequential, single-threaded world.  Multiple processes let you do this.  Actors do it too.  So do monitors.</p>
<p><strong>editor</strong>: <em>Actors can be thought of as Objects (for Object Oriented programmers) except that in the Actor model, all Actors execute simultaneously and can create more Actors, maintain their own state, and communicate via asynchronous message passing. A Monitor can be thought of as an object that encapsulates another object intended for use by multiple threads.  The Monitor controls all locking semantics for the encapsulated objects.</em></p>
<p>So from that view, processes, actors, and monitors are all equivalent. The only reason I use monitors and build them on OS threads is that it fits better with the existing Python language and is much more efficient for the way Python uses them.  I could take a page from Erlang and call them “processes”, but I think in the long run that would be more confusing, not less.</p>
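<p><strong>editor</strong>: <em>For readers without the patch, the monitor idea can be roughed out in stock Python. This is only a sketch under stated assumptions: the guarded decorator and the CounterMonitor class are invented names, it uses an ordinary threading.RLock, and it has none of the shareability checks that Adam’s Monitor enforces.</em></p>

```python
import functools
import threading


def guarded(method):
    """Run the method while holding the instance's private lock."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


class CounterMonitor(object):
    """All access to the wrapped state goes through guarded methods."""

    def __init__(self):
        self._lock = threading.RLock()
        self._count = 0

    @guarded
    def tick(self):
        self._count += 1

    @guarded
    def value(self):
        return self._count


def work(counter):
    for _ in range(1000):
        counter.tick()


counter = CounterMonitor()
threads = [threading.Thread(target=work, args=(counter,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = counter.value()
print(total)
```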
<p><strong>In looking through the diff for safe thread, you’ve had to touch a lot.  Everything from object allocation to the entire way threads are managed.  What was the hairiest series of changes you’ve had to make?</strong></p>
<p>Hard to say — I’ve been at this nearly a year with still lots to do. I could mention when I found that atomic refcounting didn’t scale, it spelled doom for the removal of GIL until I came up with a viable asynchronous scheme.  Straightening out object allocation was also pretty nasty, as stock CPython uses a twisted maze of macros and wrapper functions for that.</p>
<p>The worst was probably deciding how to handle class and module dictionaries.  What you’ve got here are mutable objects, inherently used simultaneously by multiple threads, no clear cutoff point after which they’re no longer modified, and a massive amount of implicit accesses ingrained as a fundamental part of the language.  I really wanted to impose some order on this, add some clear boundaries to when they’re modified versus accessed, and make them live up to the “explicit is better than implicit” ideal.</p>
<p>I couldn’t do it though.  Implicit access is too ingrained into the language.  Eventually I conceded defeat, then embraced it, codifying dict’s existing API as shareddict’s actual API.  In doing this I also switched to a relatively simple read/write lock as shareddict’s protection (relative to what came before!).  In the end, the only restriction was that the contents of shareddict must themselves be shareable.</p>
<p><strong>Isn’t a monitor equivalent to adding an @synchronized or a with:lock statement around your code? Is using a monitor for the mutable objects that much faster than lock.acquire and lock.release?</strong></p>
<p><strong>editor</strong>: See Listing1.py for a monitor example. You will need to be running Python 3000 with Adam’s patch for the code to work.</p>
<p>Listing 1:</p>

<div class="wp_codebox_msgheader"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p451code46'); return false;">View Code</a> PYTHON</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p45146"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
</pre></td><td class="code" id="p451code46"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;"># See the requirements: You must apply Adam's patch to Python 3000</span>
<span style="color: #808080; font-style: italic;"># for this.</span>
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">__future__</span> <span style="color: #ff7700;font-weight:bold;">import</span> shared_module
<span style="color: #ff7700;font-weight:bold;">from</span> threadtools <span style="color: #ff7700;font-weight:bold;">import</span> Monitor, monitormethod, branch
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Counter<span style="color: black;">&#40;</span>Monitor<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;A simple counter, shared between threads&quot;&quot;&quot;</span>
    __shared__ = <span style="color: #008000;">True</span>  <span style="color: #808080; font-style: italic;"># More shared_module boilerplate</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">count</span> = <span style="color: #ff4500;">0</span>
&nbsp;
    @monitormethod
    <span style="color: #ff7700;font-weight:bold;">def</span> tick<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">count</span> += <span style="color: #ff4500;">1</span>
&nbsp;
    @monitormethod
    <span style="color: #ff7700;font-weight:bold;">def</span> value<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">count</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> work<span style="color: black;">&#40;</span>c<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">20</span><span style="color: black;">&#41;</span>:
        c.<span style="color: black;">tick</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> main<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    c = Counter<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">with</span> branch<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">as</span> children:
        <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">10</span><span style="color: black;">&#41;</span>:
            children.<span style="color: black;">add</span><span style="color: black;">&#40;</span>work, c<span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;Number of ticks:&quot;</span>, c.<span style="color: black;">value</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Superficially, yes, but it has deeper semantics as well.  The biggest is that it imposes a shareability requirement to all objects passed in or out, and there’s no way to bypass it.  This basically forces you to be explicit about how you expect threads to modify mutable objects.</p>
<p>It also lets me use a somewhat saner recovery from deadlocks than would be possible using with:lock.  Not that much saner though, and there are certain circumstances in which there is no ideal way to recover.</p>
<p>Performance wise, lock.acquire/lock.release are irrelevant.  The real competition is with adding a lock to every object, such as a list. What seems like a simple ”if x: x.append(42)” actually requires 2 acquire/release pairs — something like ”x.extend(y)” would require a pair for every item in y.  This could easily add up to thousands of lock operations where a monitor lets you get away with just one.</p>
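<p><strong>editor</strong>: <em>The arithmetic is easy to see in a toy model. The sketch below is plain Python 3, not the patched interpreter; CountingLock and LockedList are hypothetical classes that only count acquisitions, and where the lock operations land in a real per-object-locking interpreter would differ.</em></p>

```python
import threading


class CountingLock(object):
    """Lock wrapper that counts how many times it is acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self._lock.release()
        return False


class LockedList(object):
    """Hypothetical list that takes its own lock on every operation."""

    def __init__(self):
        self.lock = CountingLock()
        self.items = []

    def __len__(self):
        with self.lock:
            return len(self.items)

    def append(self, item):
        with self.lock:
            self.items.append(item)

    def extend(self, other):
        for item in other:
            self.append(item)  # one acquire/release pair per item


# Per-object locking: every operation pays for its own pair.
per_object = LockedList()
per_object.extend(range(1000))   # 1000 pairs
if per_object:                   # truth test goes through __len__: 1 pair
    per_object.append(42)        # 1 more pair
fine_grained = per_object.lock.acquisitions

# Monitor-style: one pair around the whole sequence of operations.
coarse = CountingLock()
plain = []
with coarse:
    plain.extend(range(1000))
    if plain:
        plain.append(42)
coarse_grained = coarse.acquisitions

print(fine_grained, coarse_grained)
```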
<p><strong>How did you handle Garbage Collection?</strong></p>
<p>Painfully.  I originally attempted to use simple atomic integer operations for the refcounting, but I found they didn’t work.  Well, they were <em>correct</em>, but they didn’t give me the benefit of removing the GIL.  Multiple CPUs/cores would fight over the cache line containing the refcount, slowing everything to a crawl.</p>
<p>I solved that by adding a second mode for refcounting.  An object starts out like normal, but once a second thread accesses the refcount it switches to an asynchronous mode.  In this mode each thread buffers up its own refcount changes, writing out all the changes to a single refcount at once.  Even better, if the net change is 0 it can avoid writing anything at all!</p>
<p>The catch is, you can no longer delete the object when the refcount hits 0, as another thread might have outstanding changes.  Instead, I modified the tracing GC to keep a list of <em>all</em> objects and had it occasionally flush the buffers and check for objects with a refcount of 0.</p>
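<p>The buffering idea can be simulated in a few lines of Python 3. This is a toy model under stated assumptions, not safethread's C implementation: <b>SharedObject</b>, <b>RefcountBuffer</b>, and the <b>shared_writes</b> counter are hypothetical names, and the deferred deletion handled by the tracing GC is left out entirely.</p>

```python
import threading
from collections import defaultdict


class SharedObject:
    """Stand-in for an object whose refcount lives in shared memory."""

    def __init__(self):
        self.refcount = 1
        self.shared_writes = 0  # how often the shared count was really written


class RefcountBuffer:
    """Per-thread buffer of pending refcount changes (hypothetical design)."""

    def __init__(self, flush_lock):
        self._pending = defaultdict(int)
        self._flush_lock = flush_lock

    def incref(self, obj):
        self._pending[obj] += 1  # touches only thread-local state

    def decref(self, obj):
        self._pending[obj] -= 1

    def flush(self):
        with self._flush_lock:
            for obj, delta in self._pending.items():
                if delta != 0:  # a net change of 0 writes nothing
                    obj.refcount += delta
                    obj.shared_writes += 1
        self._pending.clear()


def worker(obj, flush_lock, keep_reference):
    buffer = RefcountBuffer(flush_lock)
    for _ in range(1000):
        buffer.incref(obj)
        buffer.decref(obj)
    if keep_reference:
        buffer.incref(obj)
    buffer.flush()


obj = SharedObject()
flush_lock = threading.Lock()
threads = [
    threading.Thread(target=worker, args=(obj, flush_lock, keep))
    for keep in (True, False, True, False)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("refcount:", obj.refcount)
print("shared writes:", obj.shared_writes)
```

<p>Four threads perform 8002 refcount operations between them, yet the shared count is written only twice: the two threads whose changes cancel out never touch it.</p>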
<p><strong>Your Branching-as-children method (see Listing 2) of spawning, implemented in your patch, deviates from the current threading module approach.  Why not just overlay your work on the existing API?</strong></p>
<p>Listing 2:</p>

<div class="wp_codebox"><table><tr id="p45147"><td class="code" id="p451code47"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">with</span> branch<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">as</span> children:
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">10</span><span style="color: black;">&#41;</span>:
        children.<span style="color: black;">add</span><span style="color: black;">&#40;</span>work, arg1, arg2<span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Branch basically wraps up best practices into a single construct.  It propagates exceptions, handles cancellation, lets you pass out return values, and ensures you don’t accidentally leave a child thread running after you’ve returned.</p>
<p>You can still leave threads running after a function returns, you just need to use a branch that’s higher up in the call stack.  Later on, I might add a built-in one just above the main module just for this purpose.</p>
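<p>For readers without the patch, the shape of the construct can be approximated with the standard <b>threading</b> module. The sketch below is Python 3 and makes loud assumptions: the <b>Branch</b> class and its <b>results()</b> method are invented for illustration, cancellation is not modelled, and nothing here is taken from safethread's source.</p>

```python
import threading


class Branch:
    """Rough stand-in for safethread's branch(); not the real implementation."""

    def __init__(self):
        self._threads = []
        self._results = []
        self._errors = []
        self._lock = threading.Lock()

    def __enter__(self):
        return self

    def add(self, func, *args):
        def runner():
            try:
                value = func(*args)
                with self._lock:
                    self._results.append(value)
            except Exception as exc:
                with self._lock:
                    self._errors.append(exc)

        thread = threading.Thread(target=runner)
        self._threads.append(thread)
        thread.start()

    def __exit__(self, exc_type, exc, tb):
        for thread in self._threads:  # never leave a child running
            thread.join()
        if exc_type is None and self._errors:
            raise self._errors[0]  # propagate a child's exception
        return False

    def results(self):
        return list(self._results)


def work(x, y):
    return x * y


with Branch() as children:
    for i in range(10):
        children.add(work, i, 2)

print(sorted(children.results()))
```

<p>Leaving the ”with” block joins every child, so by the time the ”print” runs no thread started inside the block is still alive.</p>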
<p><strong>What about stopping/pausing child threads?</strong></p>
<p>Pausing isn’t possible, but cancellation serves the purpose of stopping. Essentially it sets a flag on that thread to tell it to stop, as well as making sure participating I/O functions will check that flag and end themselves promptly.</p>
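<p>The same flag-based approach can be sketched with the standard library in Python 3. Here a <b>threading.Event</b> plays the role of the cancellation flag; this is an analogy for how cooperative cancellation works, not safethread's cancellation API, and the <b>worker</b> function is hypothetical.</p>

```python
import threading


def worker(cancelled, started, log):
    started.set()
    steps = 0
    # Event.wait doubles as an interruptible sleep: it returns True
    # as soon as the flag is set, so the loop ends promptly.
    while not cancelled.wait(timeout=0.01):
        steps += 1  # pretend to do a unit of work
    log.append("stopped")


cancelled = threading.Event()
started = threading.Event()
log = []

thread = threading.Thread(target=worker, args=(cancelled, started, log))
thread.start()
started.wait()   # make sure the worker is running
cancelled.set()  # ask it to stop
thread.join(timeout=5)

print(log, thread.is_alive())
```

<p>The thread is never killed from outside; it notices the flag at its next check and winds itself down.</p>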
<p><strong>How did you handle thread-safe imports?</strong></p>
<p>Most of this isn’t implemented yet, but the basic idea is that each module will be either shareable or unshareable.  Unshareable modules work normally if imported from the main thread, but if another thread tries to import one they won’t get past the parsing phase — just enough to try to detect ”from __future__ import shared_module”.</p>
<p>Modules found to be shareable are placed in their own MonitorSpace (the underlying tool used by a Monitor) before the Python code in them is executed.  This separates them from the main thread, so I won’t need the main thread’s cooperation to load them.</p>
<p><strong>In your implementation, you use libatomic-ops, essentially adding a new python build/library dependency — what does this buy you over using standard locking primitives?</strong></p>
<p>Scalability.  I can use an atomic read and, so long as the memory doesn’t get modified, all the CPUs/cores will pull it into their own cache.  If I used a lock it would inherently involve a write as well, meaning only one CPU/core would have it cached at a time.</p>
<p>For some applications it also happens to be a great deal lighter than a lock.  It may be both easier to use and faster.</p>
<p><strong>You state on your page about the Dead Lock Fallacy, that “Ultimately, good style and a robust language will produce correct programs, not a language that tries to make it impossible to go wrong.” What language tries to make it impossible to go wrong? Why (again) not just ditch threading and move to, say, Erlang?</strong></p>
<p>Concurrent Pascal would be the great old example — they introduce monitors there, but apply a great deal more restrictions as well. Ultimately though, the language is focused on hard real-time applications, and it shows.  Python and safethread are focused on general purpose applications, so usability is more important.</p>
<p>Erlang’s a pretty similar situation.  It was designed for real time, distributed, fault-tolerant applications.  It wants you to use one-way messages (not the two-way function call).  It copies everything passed in those messages.  Good tradeoffs for its focus, but bad for a general purpose language.</p>
<p><strong>What sorts of CPython bugs have you found delving this deep into the codebase?</strong></p>
<p>Just a few scattered little ones.  My favorite was a refcounting bug involving dicts, but it could only occur using recursive modification or threading — obviously with shareddict I make the latter a little more likely (but only recursive modification is possible with the normal dict).</p>
<p>Although, that’s not including the threading/interpreter state APIs. Most of that code was pretty messy; lots of bugs lurking around.  It was quite satisfying to rip it out.</p>
<p><strong>How have your changes altered the API that C extension writers use? Given that the “bonus” of the GIL is a simple interface for extension writers via the Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS macros — does safethread introduce more complexity?</strong></p>
<p>Most of my changes are cleanup and simplification.  The tp_alloc/tp_free slots are gone — everything uses PyObject_New and PyObject_Del. PyObject_GC_New/Del are gone too.  The old GIL macros are directly replaced by PyState_Suspend()/PyState_Resume().</p>
<p>However, there are new options to take advantage of.  Extensions doing I/O or other blocking operations should use the cancellation API. Modules wishing to be shareable should be audited, then apply Py_TPFLAGS_SHAREABLE/METH_SHARED, as appropriate.  However, if they do that they also need to call PyArg_RequireShareable/PyArg_RequireShareableReturn if there’s the potential to share objects between threads (MonitorSpaces technically).</p>
<p><strong>You don’t support the old threading API right now — but would it be possible to add in backwards-compatible support, or is it simply unfeasible?</strong></p>
<p>For the C API, it’d be easy to retain the old GIL macros.  Other parts may not be so easy.</p>
<p>For the Python API, some are easy, some aren’t possible, and some are just painful.</p>
<p>Adding equivalents to Lock, Semaphore, and Queue is easy.  Easier than the originals in fact.  Getting all the minor details right (such as, if you subclass it) might be harder/impossible.  Lock would not support deadlock detection, but it would be cancelable.</p>
<p>Daemon threads will likely not be supported, but, in my opinion, they’re broken by design anyway.</p>
<p>The painful part is resurrecting the GIL, so these “classic” threads can share arbitrary objects like they always did.  However, I won’t make it so global — they’ll acquire/release the main MonitorSpace instead, so all the new-style threads (created using branch()) will not be slowed down.</p>
<p><strong>Finally, you’ve pointed out that “real threading” does not equal distributed programming, only local concurrency (i.e: support for multiple cores). What do you think Python could do to support distributed computing (providing the GIL-less world comes to fruition)?</strong></p>
<p>At this point, it’s confusing.  Much of the focus is to work around the GIL, to take advantage of multiple cores.  With safethread integrated into Python, I think many of the distributed/multiprocess projects would die off.  What’s left would be the ones that *really* want to be distributed and need multiple boxes, not multiple cores.</p>
<p>In my mind, there are three main characteristics of distributed programming, although a given framework may only use one or two:</p>
<ul>
<li><em>security</em>: you don’t trust the other nodes, they don’t trust you. This often takes the form of sandboxes on a local box or a capability system.</li>
<li><em>fault-tolerance</em>: a hardware failure on one box should only bring down that box, not every other box connected to it.  Upgrading the software of one box at a time should also be possible.</li>
<li><em>latency</em>: asking another node (even on a LAN) can easily be several orders of magnitude slower than reading from your own RAM or, even better, your cache.</li>
</ul>
<p>All these lead to different tradeoffs.  You really <em>need</em> to minimize the communication between nodes by pushing them apart, whereas safethread is only concerned about making it easier to write correct programs.</p>
<p>The bottom line is that safethread lets you do the easy stuff (local concurrency) so that you only need to do the hard stuff (distributed programming) when you really need it.</p>
<h3>Conclusion</h3>
<p>Everyone is welcome to download, contribute, and try out Adam’s patches.  Bug reports, code, and emails are all welcome. There is active discussion on the Python 3000 mailing list about all of this, and more suggestions are welcome.</p>
<p>In all, there is a lot of interest in Adam’s work.  There was a lot of discussion around concurrency, threads, and the GIL at PyCon this year, and with Python 3000 coming down the pipe and the “multicore future” looming, things are getting interesting.</p>
<h3>Related Links</h3>
<ul>
<li><a href="http://code.google.com/p/python-safethread/" target="_blank">python-safethread project </a></li>
<li><a href="http://mail.python.org/pipermail/python-3000/2008-March/012548.html" target="_blank">Project status</a></li>
<li><a href="http://en.wikipedia.org/wiki/Actor_model" target="_blank">Wikipedia: The Actor Model</a></li>
<li><a href="http://ei.cs.vt.edu/~cs5204/sp99/monitor.html" target="_blank">MONITORS in Concurrent Programming</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://jessenoller.com/2009/02/02/an-interview-with-adam-olsen-author-of-safe-threading-completely-different/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Python Threads and the Global Interpreter Lock</title>
		<link>http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/</link>
		<comments>http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/#comments</comments>
		<pubDate>Sun, 01 Feb 2009 20:21:35 +0000</pubDate>
		<dc:creator>jesse</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[python magazine]]></category>

		<guid isPermaLink="false">http://jessenoller.com/?p=447</guid>
		<description><![CDATA[There are a plethora of mechanisms and technologies surrounding concurrent programming -- Python has support for many of them. In this article we will explain, examine, and benchmark Python's threading support, and discuss the much maligned Global Interpreter Lock (GIL).

This is a reprint of a featured article I wrote for Python Magazine that was ...]]></description>
			<content:encoded><![CDATA[<blockquote><p>There are a plethora of mechanisms and technologies surrounding concurrent programming — Python has support for many of them. In this article we will explain, examine, and benchmark Python’s threading support, and discuss the much maligned Global Interpreter Lock (GIL).</p></blockquote>
<p><em>This is a reprint of a featured article I wrote for <a href="http://www.pythonmagazine.com/" target="_blank">Python Magazine</a> that was published in the December 2007 issue. This article assisted in inspiring me to write PEP <a href="http://python.org/dev/peps/pep-0371/" target="_blank">371</a>.</em></p>
<h3>Introduction</h3>
<p>For many years, and especially within the last year, there has been an ongoing discussion about the concepts of concurrent and parallel programming, and how Python as a language might fit into both of these.  The discussion ranges from complaints about the global interpreter lock, to a discussion on the validity of threaded programming in general.</p>
<p>Blog posts, open letters, and articles bemoan the fact that Python and other languages are not truly “concurrent” and cannot scale to tens of cores on the modern processor. They all discuss the myriad programming paradigms and solutions (both old and new), and draw comparisons to other languages.</p>
<p>Of special note, given that this is a Python magazine, is the discussion around the validity of Python within a highly parallel and concurrent environment, due to the current structure of the CPython interpreter, the global interpreter lock, and a lack of “Erlang-like” concurrency.</p>
<p>What is concurrency? That’s an easy thing to answer. Concurrency, when applied to application/program logic, is the simultaneous execution of tasks. For the most part, these tasks interact and pass information to and from each other and the parent. Generally speaking, anything worth doing is, sooner or later, worth doing concurrently.</p>
<p>While concurrency defines the problem (or rather, the theory behind a solution), parallelism defines the implementation. Parallelism, when applied to applications, generally refers to the actual implementation of concurrent programming. Specifically, it refers to the simultaneous execution of tasks in such a way as to take advantage of multiple processors, multiple processor cores, or even multiple machines within a computing grid.</p>
<p>There are, as with all things in computing, a myriad selection of “solutions” or paradigms to address the challenges of both concurrent and parallel computing. These include options ranging from the focus of this article (standard threading or pthreads) to micro-threads, asynchronous programming (a la Twisted), process forking, et cetera.  Some languages, like Erlang, eschew many of these to build the concept of concurrency and communication deep into the language itself, making it a fundamental truth and assumption.</p>
<p>Each one of these paradigms, technologies, or solutions has its own pros, cons, and intricacies that have to be understood before you as a programmer can really choose one to use.  Questions have to be answered: </p>
<ul>
<li>Do you need shared state?</li>
<li>Do you need to scale across machines/clusters?</li>
<li>Are you willing to use message-passing, IPC, or memory mapping?</li>
</ul>
<p>Most people won’t, or don’t have to, answer these questions, as they will simply approach the problem with what is now the near ubiquitous solution to the concurrency “problem”: Threads, and threaded programming.</p>
<h3>A brief description of Threads</h3>
<p>A <em>Thread</em> is simply an agent spawned by the application to perform work independent of the parent process. While the terms Thread and threading have sometimes referred to the concept of spawning (forking) multiple processes, more frequently they refer specifically to a <em>pthread</em>, or a worker which is spawned by the parent process, and which shares that parent’s resources.</p>
<p>Both processes and threads are created within a given programming language and then scheduled to run, either by the interpreter itself (commonly known as “green threads”), or by the operating system (“native threads”). Threads which are scheduled by the operating system are governed by the operating system’s scheduler, which dictates many things.  Among them is the usage and allocation of multiple processor resources for the execution of the child threads.</p>
<p>Now, what’s the difference between a thread and a process which you can create if both are simply workers spawned by the parent and scheduled by the operating system or interpreter? Threads fundamentally differ from processes in that they are <em>light weight</em> and <em>share memory</em>. The term “light weight” refers to the up-front cost of performing the operating system level process creation (and the requirement of passing a large amount of information and state to the spawned process). Sharing memory speaks to one of the benefits of using threads over processes. Namely, threads can share state, objects, and other information with each other and the main thread of execution (this is normally called shared context). Threads, as they live within the space of a single process, share all of the parent’s address space.</p>
<p>The very fact that threads share the same address space is one of their key and most lauded features. When a program spawns twenty threads, all twenty of those threads can have unfettered access to data within themselves, each other, and the parent. This “simplifies” data sharing; everyone has everything.</p>
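<p>A small sketch shows what “everyone has everything” looks like in practice. It uses Python 3 syntax (the listings in this article are Python 2), and the <b>shared</b> dictionary and <b>worker</b> function are illustrative names only. Twenty threads write into one dictionary that lives in the parent, with no copying or message passing involved.</p>

```python
import threading

shared = {}  # created by the main thread, visible to every child


def worker(n):
    # Each thread writes its own key. In CPython a single dict
    # assignment like this is safe without an explicit lock.
    shared[n] = n * n


threads = [threading.Thread(target=worker, args=(n,)) for n in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared), shared[19])
```

<p>Once the threads have been joined, the main thread sees all twenty results directly. The convenience is real, and so is the risk: nothing stops two threads from writing the same key.</p>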
<p>With shared memory you sidestep most, if not all, of the inter-child communications questions and issues. The catch? You sacrifice your ability to easily jump from a single memory space on one machine to many machines (without a lot of work, that is).</p>
<p>Threads, just like processes, are designed to be passed off to the operating system scheduler, which dictates which CPU a given thread or process runs on and when it runs there. This means that not only do you gain shared memory with threads, you can also (in theory) leverage all of the local machine’s resources, including multiple cores and/or CPUs.</p>
<p>Threads are also, as mentioned before, ubiquitous.  Most modern languages, along with some of the old crusty ones like, say, C (I kid!), have support for threads. Java has the concurrency package, Ruby has the Thread class, and Perl, well, Perl has some thing(s). C, C++, etc. all support threads.</p>
<p>Another benefit of threaded programming, which is frequently understated, is organization and design. Take, for example, any system which follows the classical Producer/Consumer model, in which you have a group of <em>producer threads</em> which fill a shared queue with work items and the <em>consumer threads</em> work off of that shared queue, concurrently or asynchronously. Threads are a natural fit for this pattern because they lend themselves to the clear and clean encapsulation of logic for all of the objects, while also leveraging the shared nature of threads themselves via the work queue. This separation/encapsulation of logic (and duties) within threaded applications can frequently make them easier to read, maintain, and understand.</p>
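<p>The Producer/Consumer pattern might look something like the following sketch. It is written for Python 3, where the queue module is named <b>queue</b> (it is <b>Queue</b> in Python 2); the function names, the <b>STOP</b> sentinel, and the thread counts are arbitrary choices made for the example.</p>

```python
import queue
import threading

work_queue = queue.Queue()  # the shared, thread-safe work queue
results = []
results_lock = threading.Lock()
STOP = object()  # sentinel telling a consumer to exit


def producer(start, count):
    for item in range(start, start + count):
        work_queue.put(item)


def consumer():
    while True:
        item = work_queue.get()
        if item is STOP:
            break
        with results_lock:  # protect the shared results list
            results.append(item * 2)


producers = [
    threading.Thread(target=producer, args=(n * 100, 100)) for n in range(2)
]
consumers = [threading.Thread(target=consumer) for _ in range(3)]

for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
for _ in consumers:  # one sentinel per consumer
    work_queue.put(STOP)
for t in consumers:
    t.join()

print(len(results))
```

<p>Each role is a small function with a single duty, and the queue is the only point of contact between them, which is the encapsulation benefit described above.</p>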
<p>Great, right?</p>
<p>Well, not everything is happy-fun-time in Thread land. Many a great programmer has been stymied by the one thing that brought people running to Thread’s dinner-table: Shared Memory. Take, for instance, the synchronization error.</p>
<p>Synchronization errors occur when you have two threads accessing a common mutable chunk of data, whether it’s a function that in turn modifies or calls something else, or something like a shared dictionary where the two attempt to alter the same key at once (more on this later). Synchronization errors are generally hanging out behind the school smoking with… Race conditions! Take the example in Listing 1:</p>

<div class="wp_codebox"><table><tr id="p44748"><td class="code" id="p447code48"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/python</span>
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">threading</span> <span style="color: #ff7700;font-weight:bold;">import</span> Thread
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> myObject<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>._val = <span style="color: #ff4500;">1</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> get<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>._val
    <span style="color: #ff7700;font-weight:bold;">def</span> increment<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>._val += <span style="color: #ff4500;">1</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> t1<span style="color: black;">&#40;</span>ob<span style="color: black;">&#41;</span>:
    ob.<span style="color: black;">increment</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'t1:'</span>, ob.<span style="color: black;">get</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">2</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> t2<span style="color: black;">&#40;</span>ob<span style="color: black;">&#41;</span>:
    ob.<span style="color: black;">increment</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'t2:'</span>, ob.<span style="color: black;">get</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">2</span>
&nbsp;
ob = myObject<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Create two threads modifying the same ob instance</span>
thread1 = Thread<span style="color: black;">&#40;</span>target=t1, args=<span style="color: black;">&#40;</span>ob,<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
thread2 = Thread<span style="color: black;">&#40;</span>target=t2, args=<span style="color: black;">&#40;</span>ob,<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Run the threads</span>
thread1.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
thread2.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
thread1.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
thread2.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>The banality of the code notwithstanding, at least one of the two checks will fail in practice (that is, print False). This is because both threads are incrementing an unlocked, shared value: ”_val” starts at 1, so whichever thread calls ”get()” after both increments have run sees 3, not 2. This is a simple example of a synchronization error: one of the threads beats the other to a shared resource, and the other’s assumption about its state no longer holds. Errors such as this are common.  Alarmingly so, given that when people begin exploring threaded programming, their first instinct is to share everything (whereas many concurrency solutions are referred to as “shared-nothing”).  </p>
<p>This banal example shows a fundamental truth when applied to threaded programming in general: with great shared memory comes great responsibility. It’s so easy to have threads throwing around objects and data that programmers frequently overshare. Threaded programming requires that the programmer be very, very mindful of what data needs to be shared amongst workers, and of how to protect (i.e. lock) that data so that you do not run into the dreaded race condition (as the code above would) or the even more dreadful Deadlock.</p>
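<p>As a sketch of what “protecting” shared data means, here is a counter guarded by a <b>threading.Lock</b>. It uses Python 3 syntax, and the <b>Counter</b> class and <b>hammer</b> function are hypothetical names written for this example rather than a rewrite of Listing 1.</p>

```python
import threading


class Counter:
    def __init__(self):
        self._val = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # only one thread at a time may read-modify-write
            self._val += 1

    def get(self):
        with self._lock:
            return self._val


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [
    threading.Thread(target=hammer, args=(counter, 10000)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.get())
```

<p>The lock guarantees that no increment is lost, so the total is always 40000. It does not make the checks in Listing 1 pass: those also assume that the other thread has not run yet, and a lock alone says nothing about ordering.</p>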
<p>A quick note for those of you unfamiliar with the concept: <em>deadlocks</em> occur when your application seizes up while any number of threads lock-up waiting for the other threads to free required resources, never to return. Deadlocks and synchronization errors in threaded applications are not only painful to debug, they’re also insidious, hard to find, and easy to make.</p>
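<p>The classic way to produce a deadlock is for two threads to take the same two locks in opposite order. The Python 3 sketch below stages exactly that, but uses a timeout on the second acquire so that the program reports the stalemate instead of hanging. All names are illustrative, and the <b>threading.Barrier</b> is there only to force the unlucky interleaving every time.</p>

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
barrier = threading.Barrier(2)
outcome = {}


def worker(name, first, second):
    with first:
        barrier.wait()  # both threads now hold their first lock
        # A plain acquire() here would block forever; the timeout
        # lets us observe the deadlock and back out.
        got = second.acquire(timeout=0.2)
        outcome[name] = got
        barrier.wait()  # keep holding until the other thread has tried too
        if got:
            second.release()


t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()

print(outcome)
```

<p>Neither thread obtains its second lock, because each is waiting on a lock the other refuses to give up. Acquiring locks in one agreed order everywhere is the usual way to rule this out.</p>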
<p>The rise of threaded programming in modern languages really brings concurrent programs to everyone.  They are the commoditization of concurrency and parallelism; everyone can (and sooner or later does) write a program with threads. The modern thread is the de facto concurrent programming solution to many modern programmers. They have truly become ubiquitous.</p>
<p>Everyone who writes applications (or even a simple script) will write an application that does more than one thing at once, whether it’s calculating prime numbers or downloading pictures of cats on the internet.  And sooner or later, they’ll be yelled at by their spouse to come to bed because they’re up at 3 am and have been chasing synchronization issues for the past twelve hours. 3 am is also when deadlocks jump out of the closet and lock up your applications, right after your 3-month-old spits up on your keyboard.</p>
<p><em>Python Thread Support</em></p>
<p>To quote the Python <b>threading</b> module documentation:</p>
<blockquote><p>
“The design of this module is loosely based on Java’s threading model.  However, where Java makes locks and condition variables basic behavior of every object, they are separate objects in Python. Python’s <b>Thread</b> class supports a subset of the behavior of Java’s <b>Thread</b> class.  Currently, there are no priorities, no thread groups, and threads cannot be destroyed, stopped, suspended, resumed, or interrupted. The static methods of Java’s <b>Thread</b>, when implemented, are mapped to module-level functions.”</p></blockquote>
<p>Python’s thread support, obviously, is loosely based off of Java’s (something many a Java-to-Python convert has been happy for, and stymied by).  Python’s thread implementation is very simple at its core, and builds easily on the concept of individualized workers and shared state.  It provides all of the threaded programming primitives: locks, semaphores, and all the normal goodies that come with a good threaded implementation.</p>
<p>Let’s look at the simplest thread example:</p>

<div class="wp_codebox"><table><tr id="p44749"><td class="code" id="p447code49"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">threading</span> <span style="color: #ff7700;font-weight:bold;">import</span> Thread
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> myfunc<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;hello, world!&quot;</span>
&nbsp;
thread1 = Thread<span style="color: black;">&#40;</span>target=myfunc<span style="color: black;">&#41;</span>
thread1.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
thread1.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>There, you’ve got a multi-threaded application in six lines of code.  This little script has two threads: the main thread, and the ”thread1” object we created. Obviously, here, we’re not sharing anything. Heck, we’re hardly doing anything at all! What we are doing is simple: we create a new <b>Thread</b> object and pass it a function to run. We then call ”start()”, which sends the thread off to be executed.  The ”join()” method blocks the main thread until the thread that we’ve called ”join()” on exits; this avoids the generic “poll the thread until it is done” loop one might otherwise construct.  </p>
<p>Python’s library has “two” threading-related modules: <b>thread</b> and <b>threading</b>.  Two is in quotation marks because there’s really only one that most people would care about, and that’s <b>threading</b>. The <b>threading</b> module builds on the low-level primitives provided by the <b>thread</b> module, and there are few, if any, reasons for most developers to use the <b>thread</b> module directly.</p>
<p>Let’s take a moment to revisit the statement that Python’s thread support is a Java-like implementation. Let’s set a basic Python example next to a simple Java example.  First, Java:</p>
<p>MyThread.java:</p>

<div class="wp_codebox"><table><tr id="p44750"><td class="code" id="p447code50"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">com.demo.threads</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> MyThread <span style="color: #000000; font-weight: bold;">implements</span> <a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Arunnable+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span style="color: #003399;">Runnable</span></a> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> run<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Asystem+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span style="color: #003399;">System</span></a>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;I am a thread!&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>MyThreadDemo.java:</p>

<div class="wp_codebox"><table><tr id="p44751"><td class="code" id="p447code51"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">com.demo.threads</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> MyThreadDemo <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> main<span style="color: #009900;">&#40;</span> <a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Astring+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span style="color: #003399;">String</span></a><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> args <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        MyThread foobar <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> MyThread<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #003399;">Thread</span> thread1 <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <a href="http://www.google.com/search?hl=en&amp;q=allinurl%3Athread+java.sun.com&amp;btnI=I%27m%20Feeling%20Lucky"><span style="color: #003399;">Thread</span></a><span style="color: #009900;">&#40;</span>foobar<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        thread1.<span style="color: #006633;">start</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Now, let’s do the same thing, with Python:</p>

<div class="wp_codebox"><table><tr id="p44752"><td class="code" id="p447code52"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">threading</span> <span style="color: #ff7700;font-weight:bold;">import</span> Thread
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> MyThread<span style="color: black;">&#40;</span>Thread<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> run<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;I am a thread!&quot;</span>
&nbsp;
foobar = MyThread<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
foobar.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
foobar.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>As you can see, the two are not that far off (although the Python example is shorter).  Each <b>MyThread</b> class builds on a base type: the Java version implements the <b>Runnable</b> interface, while the Python version extends the <b>Thread</b> superclass.  These base types provide the basic interface that makes the object a valid “thread object”. In both cases, we can skip defining a constructor, because the default one is enough (but we could write our own if we wanted to).</p>
<p>Both <b>MyThread</b> objects have a common method, though: ”run()”. When making an object a thread, this is the only method which really needs to be defined. Once a valid ”run()” method has been defined, the object is threadable. A thread object may have any number of methods, callbacks, decorators, etc. We have, in wandering around the Python Cheeseshop, seen some modules simply extend <b>Thread</b> so that an object can support threaded implementations out of the box.  The implementation that’s provided may not be threaded itself.</p>
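<p>Subclassing is not the only way to get a ”run()” method. As a minimal sketch (the names ”greet”, ”results” and ”worker” are made up for illustration), <b>threading.Thread</b> also accepts a ”target” callable, which its default ”run()” calls for you:</p>

```python
import threading

results = []  # hypothetical collector, filled in by the worker thread


def greet(name):
    # Runs inside the new thread; takes the place of a run() override.
    results.append("I am a thread named %s!" % name)


worker = threading.Thread(target=greet, args=("foobar",))
worker.start()
worker.join()  # wait for the thread to finish before reading results
print(results[0])
```

<p>This form tends to suit small jobs; a subclass is handy once the thread needs its own state or helper methods.</p>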
<p>Now, as mentioned before, synchronization errors plague threaded programs.  Unless access to a shared resource has proper controls (locks) applied to it, multiple threads accessing the same resource will contend for that resource. Contention leads to deadlocks, deadlocks lead to suffering, and all that jazz.</p>
<p>Let’s take a look at another simple example commonly used to explain threading, the Bank example, in Listing 2. This is a simple thing really; the <b>Bank</b> is a shared object which is manipulated simultaneously by the threads we spawn. In this example, we create a <b>Bank</b> with 100 accounts, each holding a 1000 “dollar” balance. This means that the <b>Bank</b> should always contain exactly $100,000.  Transfers only move money between accounts, so the total should never drop below, or rise above, what was in the accounts to begin with.</p>
<p>Listing 2:</p>

<div class="wp_codebox"><table><tr id="p44753"><td class="code" id="p447code53"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">threading</span> <span style="color: #ff7700;font-weight:bold;">import</span> Thread
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">operator</span> <span style="color: #ff7700;font-weight:bold;">import</span> add
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">random</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Bank<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, naccounts, ibalance<span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>._naccounts = naccounts
        <span style="color: #008000;">self</span>._ibalance = ibalance
&nbsp;
        <span style="color: #008000;">self</span>.<span style="color: black;">accounts</span> = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> n <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>._naccounts<span style="color: black;">&#41;</span>:
            <span style="color: #008000;">self</span>.<span style="color: black;">accounts</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>._ibalance<span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> size<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">accounts</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> getTotalBalance<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">reduce</span><span style="color: black;">&#40;</span>add, <span style="color: #008000;">self</span>.<span style="color: black;">accounts</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> transfer<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, name, afrom, ato, amount<span style="color: black;">&#41;</span>:
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">self</span>.<span style="color: black;">accounts</span><span style="color: black;">&#91;</span>afrom<span style="color: black;">&#93;</span> <span style="color: #66cc66;">&lt;</span> amount: <span style="color: #ff7700;font-weight:bold;">return</span>
&nbsp;
        <span style="color: #008000;">self</span>.<span style="color: black;">accounts</span><span style="color: black;">&#91;</span>afrom<span style="color: black;">&#93;</span> -= amount
        <span style="color: #008000;">self</span>.<span style="color: black;">accounts</span><span style="color: black;">&#91;</span>ato<span style="color: black;">&#93;</span> += amount
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;%-9s %8.2f from %2d to %2d Balance: %10.2f&quot;</span> <span style="color: #66cc66;">%</span> \
            <span style="color: black;">&#40;</span>name, amount, afrom, ato, <span style="color: #008000;">self</span>.<span style="color: black;">getTotalBalance</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> transfer<span style="color: black;">&#40;</span>Thread<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, bank, afrom, maxamt<span style="color: black;">&#41;</span>:
        Thread.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>._bank = bank
        <span style="color: #008000;">self</span>._afrom = afrom
        <span style="color: #008000;">self</span>._maxamt = maxamt
    <span style="color: #ff7700;font-weight:bold;">def</span> run<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">3000</span><span style="color: black;">&#41;</span>:
            ato = <span style="color: #dc143c;">random</span>.<span style="color: black;">choice</span><span style="color: black;">&#40;</span><span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>._bank.<span style="color: black;">size</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
            amount = <span style="color: #008000;">round</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>._maxamt <span style="color: #66cc66;">*</span> <span style="color: #dc143c;">random</span>.<span style="color: #dc143c;">random</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>, <span style="color: #ff4500;">2</span><span style="color: black;">&#41;</span>
            <span style="color: #008000;">self</span>._bank.<span style="color: black;">transfer</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">getName</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>, <span style="color: #008000;">self</span>._afrom, ato, amount<span style="color: black;">&#41;</span>
&nbsp;
naccounts = <span style="color: #ff4500;">100</span>
initial_balance = <span style="color: #ff4500;">1000</span>
&nbsp;
b = Bank<span style="color: black;">&#40;</span>naccounts, initial_balance<span style="color: black;">&#41;</span>
&nbsp;
threads = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, naccounts<span style="color: black;">&#41;</span>:
    threads.<span style="color: black;">append</span><span style="color: black;">&#40;</span>transfer<span style="color: black;">&#40;</span>b, i, <span style="color: #ff4500;">100</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    threads<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span>.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, naccounts<span style="color: black;">&#41;</span>:
    threads<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>In the normal example, the ”for i in range(0, 3000):” would be a ”while True:”, but we don’t need forever, baby. Go ahead, run it.  You’ll notice something within a few thousand transactions: the <b>Bank</b> balance goes awry. It may not always happen, though.  I’ve run this successfully four times in a row now with zero corruption. However, the fifth time is the charm:</p>
<pre>
...
Thread-88     1.26 from 87 to 41 Balance:  100000.00
Thread-88     0.65 from 87 to 23 Balance:  100000.00
Thread-88     0.47 from 87 to  8 Balance:  100000.00
Thread-88     0.04 from 87 to 28 Balance:  100000.00
Thread-83    30.14 from 82 to  9 Balance:   99948.90
Thread-83    35.70 from 82 to 67 Balance:   99948.90
Thread-80     0.11 from 79 to 98 Balance:   99948.90
Thread-83    30.82 from 82 to 89 Balance:   99948.90
Thread-83    40.51 from 82 to 95 Balance:   99948.90
Thread-83     5.20 from 82 to 52 Balance:   99948.90
...
</pre>
<p>That’s the beauty of synchronization errors and race conditions. They won’t always happen.  They may not even happen fifty percent of the time. But they <i>will</i> happen. The good news? The fix is a few lines:</p>

<div class="wp_codebox"><table><tr id="p44754"><td class="code" id="p447code54"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">threading</span> <span style="color: #ff7700;font-weight:bold;">import</span> Thread, Lock
...
<span style="color: black;">lock</span> = Lock<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">class</span> Bank<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, naccounts, ibalance<span style="color: black;">&#41;</span>:
...
    <span style="color: #ff7700;font-weight:bold;">def</span> transfer<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, name, afrom, ato, amount<span style="color: black;">&#41;</span>:
        lock.<span style="color: black;">acquire</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">try</span>:
            <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">self</span>.<span style="color: black;">accounts</span><span style="color: black;">&#91;</span>afrom<span style="color: black;">&#93;</span> <span style="color: #66cc66;">&lt;</span> amount: <span style="color: #ff7700;font-weight:bold;">return</span>
            <span style="color: #008000;">self</span>.<span style="color: black;">accounts</span><span style="color: black;">&#91;</span>afrom<span style="color: black;">&#93;</span> -= amount
            <span style="color: #008000;">self</span>.<span style="color: black;">accounts</span><span style="color: black;">&#91;</span>ato<span style="color: black;">&#93;</span> += amount
        <span style="color: #ff7700;font-weight:bold;">finally</span>:
            lock.<span style="color: black;">release</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>There. Problem fixed! See, Python, much like Java, allows the broken version of this example to have one thread performing the account decrement while another performs the increment. Both threads are in the same code block of the shared object at once, and because an update like ”-=” is a read followed by a write, a second thread can read an account’s balance before the first thread has written its new value back. One of the two updates is then lost, and the total drifts.</p>
<p>Wait, what? What did we do? I’ve added a <b>Lock</b>, one of those nice fundamentals that threading support needs. “Wrapping” access to a resource (whether it is a variable, method or some other object) in a lock, as I’ve done above, makes use of that resource (in this case, the accounts being modified) thread safe, which is to say multiple threads can be running, but only one can be modifying that particular resource at a time. In order to run the protected code, a thread first has to get the lock.</p>
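<p>The same acquire/release pattern can be seen in isolation with a shared counter. This is only a sketch, with hypothetical names (”counter”, ”counter_lock”, ”add_many”) that are not part of Listing 2:</p>

```python
import threading

counter = 0
counter_lock = threading.Lock()  # guards every update to counter


def add_many(times):
    global counter
    for _ in range(times):
        counter_lock.acquire()
        try:
            counter += 1  # read, add, write back: done while holding the lock
        finally:
            counter_lock.release()  # always released, even if the update fails


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

<p>With the lock in place, the four threads’ 10,000 increments each always add up to 40,000.</p>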
<p>Locks and their management are the fix for the problem threading introduces when accessing shared memory and objects. But don’t think all’s well that ends well; lock management can also drive you bonkers.</p>
<p>So, Python has all of the nice bits of threading, right? Well, one thing a Java programmer might be missing is the magic of the ”synchronized” keyword. In Java, the ”synchronized” keyword, applied to a method or a block of code, turns execution of that method or block into an instantly thread safe action.</p>
<p>For example, the Java version of the transfer code:</p>

<div class="wp_codebox"><table><tr id="p44755"><td class="code" id="p447code55"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">synchronized</span> <span style="color: #000066; font-weight: bold;">void</span> transfer<span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> afrom, 
                                  <span style="color: #000066; font-weight: bold;">int</span> ato, 
                                  <span style="color: #000066; font-weight: bold;">double</span> amount<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	...
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>The ”synchronized” keyword basically manages the locks and conditions automagically.  When the method is entered, the lock is acquired. When it is exited, the lock is released. The good news for Python is that with decorators we can at least fudge out a synchronized keyword, so instead of peppering our code with lock calls, we can do this:</p>

<div class="wp_codebox"><table><tr id="p44756"><td class="code" id="p447code56"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">__future__</span> <span style="color: #ff7700;font-weight:bold;">import</span> with_statement
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">threading</span> <span style="color: #ff7700;font-weight:bold;">import</span> Lock
<span style="color: #ff7700;font-weight:bold;">def</span> synchronized<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    the_lock = Lock<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> fwrap<span style="color: black;">&#40;</span>function<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">def</span> newFunction<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>args, <span style="color: #66cc66;">**</span>kw<span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">with</span> the_lock:
                <span style="color: #ff7700;font-weight:bold;">return</span> function<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>args, <span style="color: #66cc66;">**</span>kw<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> newFunction
    <span style="color: #ff7700;font-weight:bold;">return</span> fwrap
&nbsp;
...
    @synchronized<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> transfer<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, name, afrom, ato, amount<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">self</span>.<span style="color: black;">accounts</span><span style="color: black;">&#91;</span>afrom<span style="color: black;">&#93;</span> <span style="color: #66cc66;">&lt;</span> amount: <span style="color: #ff7700;font-weight:bold;">return</span>
...</pre></td></tr></table></div>

<p>Now, in this code example we used a bit of Python 2.5: the ”with” statement, which allows you to condense some amount of ”try”/”except”/”finally” code via context managers. Since the inner workings of ”with” are out of scope, take a look at PEP 343 (<a href="http://docs.python.org/whatsnew/pep-343.html" target="_blank">http://docs.python.org/whatsnew/pep-343.html</a>).</p>
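<p>To make that concrete, here is a small sketch (”balance”, ”withdraw_try_finally”, ”withdraw_with” and ”failing_withdraw” are hypothetical names) showing that a <b>Lock</b> used in a ”with” block behaves like the explicit ”acquire()”/”try”/”finally” form, including when an exception is raised. On Python 2.5 the ”from __future__ import with_statement” line would be needed at the top of the file; from 2.6 on it is not:</p>

```python
import threading

lock = threading.Lock()
balance = [100]  # hypothetical shared state


def withdraw_try_finally(amount):
    # The long-hand form: acquire, do the work, always release.
    lock.acquire()
    try:
        balance[0] -= amount
    finally:
        lock.release()


def withdraw_with(amount):
    # The condensed form: the lock is acquired on entry, released on exit.
    with lock:
        balance[0] -= amount


def failing_withdraw():
    # Even if the body raises, leaving the with block releases the lock.
    with lock:
        raise ValueError("simulated failure while holding the lock")


withdraw_try_finally(30)
withdraw_with(20)
try:
    failing_withdraw()
except ValueError:
    pass

print("balance: %d, lock still held: %s" % (balance[0], lock.locked()))
```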
<p>For easy recipes with implementations of synchronized decorators see:</p>
<ul>
<li><a href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/533135" target="_blank">http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/533135</a></li>
<li><a href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502291" target="_blank">http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502291</a></li>
</ul>
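<p>One possible variation, sketched here under our own assumptions rather than taken from either recipe: Java’s ”synchronized” methods lock on the object instance, and that lock is reentrant. A decorator can approximate this with a per-instance <b>RLock</b>. The ”_lock” attribute name, the ”worker” function and this trimmed-down ”Bank” are illustrative only:</p>

```python
import functools
import threading


def synchronized(method):
    """Serialize calls to `method` per instance, using self._lock."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


class Bank(object):
    def __init__(self, naccounts, ibalance):
        # Reentrant, so one synchronized method may call another.
        self._lock = threading.RLock()
        self.accounts = [ibalance] * naccounts

    @synchronized
    def total(self):
        return sum(self.accounts)

    @synchronized
    def transfer(self, afrom, ato, amount):
        # The balance check and both updates happen under the same lock.
        if self.accounts[afrom] < amount:
            return False
        self.accounts[afrom] -= amount
        self.accounts[ato] += amount
        return True


def worker(bank, afrom):
    for i in range(1000):
        bank.transfer(afrom, (afrom + i) % len(bank.accounts), 1)


bank = Bank(10, 1000)
threads = [threading.Thread(target=worker, args=(bank, n)) for n in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(bank.total())
```

<p>Because each <b>Bank</b> carries its own lock, two separate banks never block each other, which is not true of a decorator that closes over a single shared lock.</p>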
<p>So, now that you’ve got all of the basic lock support, and magical decorators, and you’ve been warned to carefully think about what to share, all’s fun and games and you’re going to run off and promptly write an application with 1000 threads and do cool stuff, right? Well, there’s something else you need to know about: the Global Interpreter Lock.</p>
<h3>The Global Interpreter Lock</h3>
<p>“Nevertheless, you’re right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities.” Guido van Rossum, <a href="http://mail.python.org/pipermail/python-3000/2007-May/007414.html" target="_blank">http://mail.python.org/pipermail/python-3000/2007-May/007414.html</a></p>
<p>This is the part of the article where we take some air out of your threaded sails. It’s unfortunate this responsibility must be mine, but let us not dwell on it. First let me describe what the GIL is.  </p>
<p>The GIL is an interpreter-level lock.  This lock prevents execution of multiple threads at once in the Python interpreter. Each thread that wants to run must wait for the GIL to be released by the other thread, which means your multi-threaded Python application is essentially single threaded, right? Yes. Not exactly. Sort of.</p>
<p>CPython uses what’s called “operating system” threads under the covers, which is to say each time a request to make a new thread is made, the interpreter actually calls into the operating system’s libraries and kernel to generate a new thread. This is the same as Java, for example.  So in memory you really do have multiple threads and normally the operating system controls which thread is scheduled to run. On a multiple processor machine, this means you could have many threads spread across multiple processors, all happily chugging away doing work.</p>
<p>However, while CPython does use operating system threads (in theory allowing multiple threads to execute within the interpreter simultaneously), the interpreter also forces a thread to acquire the GIL before it can access the interpreter and stack and modify Python objects in memory all willy-nilly.  The latter point is why the GIL exists: the GIL prevents simultaneous access to Python objects by multiple threads. But this does not save you (as illustrated by the Bank example) from being a lock-sensitive creature; you don’t get a free ride. The GIL is there to protect the interpreter’s memory, not your sanity.</p>
<p>The GIL also keeps garbage collection (the reason you don’t have to worry about memory management, you bum) working.  Python’s garbage collection (deallocating unused objects to free memory) relies on reference counting: all references to a given object (integer, string or ”YourCat(object)”) are tracked, and when the number of references reaches zero, the object is deleted. The GIL prevents one thread from decrementing an object’s reference count to zero, and letting the object go into the ether, while another thread is still working with that object.  Remember, only one thread can access a Python object at a time.</p>
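<p>Reference counts can be observed from Python. A quick sketch, assuming CPython (”sys.getrefcount()” is implementation-specific, and the absolute numbers it reports may include a temporary reference held by the call itself, so only the differences are meaningful here). The ”owners” list is a made-up stand-in for other code holding on to the object:</p>

```python
import sys


class YourCat(object):
    pass


cat = YourCat()
before = sys.getrefcount(cat)

owners = [cat, cat]  # the list now holds two more references to the object
after = sys.getrefcount(cat)

del owners  # dropping the list gives both references back
released = sys.getrefcount(cat)

print("added: %d, left over after del: %d" % (after - before, released - before))
```

<p>On CPython the first difference should be 2 and the second 0. Every one of those increments and decrements is an update to shared state inside the interpreter, and that is the bookkeeping the GIL protects.</p>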
<p>In reality, the CPython GIL is designed with an eye towards a simpler interpreter implementation and single threaded execution speed. It makes the maintenance of the interpreter (and, by extension, the writing of extension modules) easier by removing the need to worry about many memory management and concurrency issues that otherwise might be problematic to maintainers and extension authors. It keeps the reference interpreter simple. It is not, ultimately, an “end user” feature unless you’re writing C code for Python.</p>
<p>Python has had threading support, and the GIL, since as far back as version 1.5, so it’s not new. In 1999 Greg Stein created a patch set for the interpreter that removed the GIL, but added granular locking around sensitive interpreter operations. This patch set had the direct effect of speeding up threaded execution, but made single threaded execution two times slower.</p>
<p>So you may be wondering, “if we have the GIL, and a thread must own it to execute within the interpreter, what decides if the GIL should be released?” Delicious byte code instructions. When a Python application is executed, it is compiled down to byte code, the actual instructions that the interpreter uses for execution of the application.  Normally, byte code files end with a name like “.pyc” or “.pyo”. A given line of a Python application might be a single byte code, while others, such as an import statement, may ultimately expand into many byte code instructions for the interpreter.</p>
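<p>The standard <b>dis</b> module shows the byte code behind a line of Python. As a sketch (the ”withdraw” function is a stand-in for the decrement in Listing 2, and ”dis.get_instructions()” assumes Python 3.4 or later), a single ”-=” statement turns into several instructions, so the read and the write-back are separate steps rather than one atomic operation:</p>

```python
import dis


def withdraw(accounts, afrom, amount):
    # One line of source, several byte code instructions.
    accounts[afrom] -= amount


# Print the human-readable listing.
dis.dis(withdraw)

# Collect the instruction names; the exact names vary between Python versions.
opnames = [ins.opname for ins in dis.get_instructions(withdraw)]
print("%d instructions for one statement" % len(opnames))
```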
<p>That all being said, the CPython interpreter, when working with pure Python code (more on this in a moment), will force the GIL to be released every hundred byte code instructions. This means that if you have a complex line of code, like a complex math function, that in reality acts as a single byte code, the GIL will not be released for the period that that statement takes to run.</p>
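<p>That interval is exposed through the <b>sys</b> module. A hedged sketch: on Python 2 the setting is ”sys.getcheckinterval()”, counted in byte code instructions, while Python 3.2 and later replaced it with a time-based ”sys.getswitchinterval()”. The code below simply reports whichever one the running interpreter provides:</p>

```python
import sys

if hasattr(sys, "getswitchinterval"):
    # Python 3.2+: how long a thread may hold the GIL before being asked
    # to hand it over, in seconds.
    interval = sys.getswitchinterval()
    print("switch interval: %s seconds" % interval)
else:
    # Python 2: how many byte code instructions run between checks.
    interval = sys.getcheckinterval()
    print("check interval: %s byte code instructions" % interval)
```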
<p>There is an exception though: C modules! C extension modules (and built in C modules) can be built in such a way that they release the GIL voluntarily and do their own magic. Take for instance the <b>time</b> module (”timemodule.c” in the Python source tree).  The ”sleep()” function looks something like this:</p>

<div class="wp_codebox"><table><tr id="p44757"><td class="code" id="p447code57"><pre class="c" style="font-family:monospace;">
    ...
    <span style="color: #202020;">Py_BEGIN_ALLOW_THREADS</span>
        sleep<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span><span style="color: #009900;">&#41;</span>secs<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    Py_END_ALLOW_THREADS
    ....</pre></td></tr></table></div>

<p>In a C extension, the “Py_BEGIN_ALLOW_THREADS” and “Py_END_ALLOW_THREADS” macros signal the interpreter and basically state “hey, I’m entering some blocking operation, here’s the GIL back” and “hey, I’m returning, I need the GIL”. This means that anything in your application that uses a blocking I/O function (network/socket manipulation, file manipulation) or a thread-safe C extension (most of the built-in ones are) can “bypass” the GIL, which gets you closer to having multiple threads running concurrently.</p>
<p>Consider, for a moment, the “timemodule.c” code we pasted above. It means that if you have a threaded application and want the GIL to be released regularly by your threads, you can call “time.sleep(.0001)”, or sleep for some other tiny amount, and the GIL will be magically released so that your other thread(s) can run. Most application developers wouldn’t like this solution, but it is a common “work around” for the GIL limitation.</p>
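<p>A quick way to convince yourself that a blocking call hands the GIL back is to let several threads sleep at the same time and measure the wall-clock time. This is a minimal sketch, assuming Python 3; the helper names are hypothetical and not part of the benchmark script used later in this article.</p>

```python
import threading
import time


def sleeper(seconds):
    # time.sleep() releases the GIL while it blocks
    time.sleep(seconds)


def run_sleepers(count, seconds):
    """Start `count` sleeping threads and return the total wall-clock time."""
    threads = [threading.Thread(target=sleeper, args=(seconds,))
               for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_sleepers(4, 0.2)
    print("4 threads sleeping 0.2s each took %.3f seconds" % elapsed)
```

<p>If the sleeps were serialized by the GIL, four threads would need roughly 0.8 seconds. Because each thread gives the GIL up while it sleeps, the total should land much closer to a single 0.2 second sleep.</p>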
<p>There are other macros and a lot more details about the C API and the GIL. The newer functions “PyGILState_Ensure()” and “PyGILState_Release()” do all of the low-level state and GIL manipulation for you. We recommend reading section 8.1 of the Python/C API Reference Manual: <a href="http://docs.python.org/api/threads.html" target="_blank">http://docs.python.org/api/threads.html</a></p>
<p>From a programming standpoint, the GIL is equivalent to wrapping all of your code in a “synchronized” block, as in Java (without the memory safety). No two threads can run at once; they can only seem to via GIL acquisition/releasing tricks.</p>
<p>There are other ways to accelerate the GIL manipulation or avoid it:</p>
<p>– call “time.sleep()”<br />
– tune the check interval with “sys.setcheckinterval()”<br />
– run Python in optimized mode<br />
– dump process-intensive tasks into C extensions<br />
– use the <b>subprocess</b> module to execute commands</p>
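<p>A note for readers on newer interpreters: “sys.setcheckinterval()” is the Python 2 API. Python 3.2 and later replaced the instruction-count check with a time-based switch interval, exposed as “sys.getswitchinterval()” and “sys.setswitchinterval()”. The sketch below assumes Python 3, and the helper name is made up for illustration; it temporarily changes the interval and restores it afterwards.</p>

```python
import sys


def with_switch_interval(interval, func):
    """Run func() under a temporary thread switch interval, then restore it."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(interval)
    try:
        return func()
    finally:
        # Always put the interpreter-wide setting back
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    print("default switch interval:", sys.getswitchinterval())
    inside = with_switch_interval(0.001, sys.getswitchinterval)
    print("inside the helper:", inside)
```

<p>The setting is interpreter-wide, which is why the helper restores the previous value in a “finally” block. Whether a shorter interval helps or hurts depends entirely on the workload, so measure before keeping it.</p>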
<p>The fact is, the GIL does prevent you as a programmer from using multiple CPUs simultaneously. Python as a language, however, <i>does not</i>. If the CPython interpreter had the GIL removed, the operating system’s pthread system would be free to run everything in parallel. The GIL does not prevent a process from running on a different processor of a machine. It simply allows only one thread at a time to run within the interpreter.</p>
<p>The real question you have to ask yourself is: does the GIL actually affect you and your application? Is it really harming you or is it simply a convenient excuse for people to dismiss Python? Let’s examine code and numbers.</p>
<h3>Benchmarking Python Threads</h3>
<p>Let’s get our hands dirty with more code. First, a word on these benchmarks: benchmarks are handy excuses for people to get offended, or to tweak them to make an argument. I intend neither with these numbers; run the code yourself and draw your own conclusions.</p>
<p>All tests are being run on a MacBook Pro, 2.33 GHz Intel Core 2 Duo with 3 GB of RAM and a 7200 RPM hard drive, running Leopard and CPython 2.5.1 from <a href="http://www.python.org" target="_blank">http://www.python.org</a>. For the process tests, we’re using the <b>processing</b> module, previously covered in this magazine: <a href="http://pypi.python.org/pypi/processing/" target="_blank">http://pypi.python.org/pypi/processing/</a></p>
<p>Each of the following tests calls a given function once per timed run and repeats that run one hundred times; I then display the best of the 100 runs. I cycle between a non-threaded, iteration-based call, a threaded call, and finally a processing module (fork-based) call. I repeat each test for an increasing number of calls/threads/processes: 1, 2, 4 and finally 8.</p>
<p>Why are we adding process results into this? The answer is simple: we know in advance that the GIL is going to penalize our threaded execution, and using processes sidesteps the GIL entirely, allowing us to see how fast execution <i>could</i> be.</p>
<p>For non-threaded execution, to keep things fair, we simply call the defined function sequentially the same number of times as threads/processes we would otherwise use. All examples use new-style classes to further even the playing field, although we skip explicit “__init__()” setup as it’s not needed.</p>
<p>To keep things simple, I’ve delegated all of the timings to Python’s <b>timeit</b> module. The <b>timeit</b> module is designed to benchmark pieces of Python code, generally single statements. In our case, however, it provides some nice facilities for running a given function a set number of times and collecting the timings, from which we take the fastest run.</p>
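<p>The “best of N” idea boils down to a few lines. This is a minimal sketch, assuming Python 3, where a <b>timeit</b> “Timer” accepts a callable directly; the helper name “best_of()” and the stand-in workload are made up for illustration.</p>

```python
import timeit


def function_to_run():
    # Stand-in workload; any callable can be timed the same way
    sum(range(1000))


def best_of(func, repeat=5, number=1):
    """Time `repeat` runs of `number` calls each and return the fastest run."""
    timer = timeit.Timer(func)
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    print("best of 5: %.6f seconds" % best_of(function_to_run))
```

<p>Taking the minimum rather than the average is a deliberate choice: slower runs are usually caused by other activity on the machine, so the fastest run is the closest thing to a clean measurement.</p>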
<p>You can see the timing script in Listing 3. The script accepts one argument, the name of a module to search for “function_to_run()”, so for each test we just save the example function to a file. It then iterates over the tests and displays the results. This setup allows us to swap out the test function easily.</p>
<p>Listing 3:</p>

<div class="wp_codebox"><table><tr id="p44758"><td class="code" id="p447code58"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/python</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">threading</span> <span style="color: #ff7700;font-weight:bold;">import</span> Thread
<span style="color: #ff7700;font-weight:bold;">from</span> processing <span style="color: #ff7700;font-weight:bold;">import</span> Process
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> threads_object<span style="color: black;">&#40;</span>Thread<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> run<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        function_to_run<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> nothreads_object<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> run<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        function_to_run<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> process_object<span style="color: black;">&#40;</span>Process<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> run<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        function_to_run<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> non_threaded<span style="color: black;">&#40;</span>num_iter<span style="color: black;">&#41;</span>:
    funcs = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #008000;">int</span><span style="color: black;">&#40;</span>num_iter<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        funcs.<span style="color: black;">append</span><span style="color: black;">&#40;</span>nothreads_object<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> funcs:
        i.<span style="color: black;">run</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> threaded<span style="color: black;">&#40;</span>num_threads<span style="color: black;">&#41;</span>:
    funcs = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #008000;">int</span><span style="color: black;">&#40;</span>num_threads<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        funcs.<span style="color: black;">append</span><span style="color: black;">&#40;</span>threads_object<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> funcs:
        i.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> funcs:
        i.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> processed<span style="color: black;">&#40;</span>num_processes<span style="color: black;">&#41;</span>:
    funcs = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #008000;">int</span><span style="color: black;">&#40;</span>num_processes<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        funcs.<span style="color: black;">append</span><span style="color: black;">&#40;</span>process_object<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> funcs:
        i.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> funcs:
        i.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> show_results<span style="color: black;">&#40;</span>func_name, results<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;%-23s %4.6f seconds&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>func_name, results<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">&quot;__main__&quot;</span>:
    <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
    <span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">timeit</span> <span style="color: #ff7700;font-weight:bold;">import</span> Timer
&nbsp;
    repeat = <span style="color: #ff4500;">100</span>
    number = <span style="color: #ff4500;">1</span>
&nbsp;
    num_threads = <span style="color: black;">&#91;</span> <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">8</span> <span style="color: black;">&#93;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">&lt;</span> <span style="color: #ff4500;">2</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'Usage: %s module_name'</span> <span style="color: #66cc66;">%</span> <span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'  where module_name contains a function_to_run function'</span>
        <span style="color: #dc143c;">sys</span>.<span style="color: black;">exit</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>
    module_name = <span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> module_name.<span style="color: black;">endswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'.py'</span><span style="color: black;">&#41;</span>:
        module_name = module_name<span style="color: black;">&#91;</span>:-<span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'Importing %s'</span> <span style="color: #66cc66;">%</span> module_name
    m = <span style="color: #008000;">__import__</span><span style="color: black;">&#40;</span>module_name<span style="color: black;">&#41;</span>
    function_to_run = m.<span style="color: black;">function_to_run</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'Starting tests'</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> num_threads:
        t = Timer<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;non_threaded(%s)&quot;</span> <span style="color: #66cc66;">%</span> i, <span style="color: #483d8b;">&quot;from __main__ import non_threaded&quot;</span><span style="color: black;">&#41;</span>
        best_result = <span style="color: #008000;">min</span><span style="color: black;">&#40;</span>t.<span style="color: black;">repeat</span><span style="color: black;">&#40;</span>repeat=repeat, number=number<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        show_results<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;non_threaded (%s iters)&quot;</span> <span style="color: #66cc66;">%</span> i, best_result<span style="color: black;">&#41;</span>
&nbsp;
        t = Timer<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;threaded(%s)&quot;</span> <span style="color: #66cc66;">%</span> i, <span style="color: #483d8b;">&quot;from __main__ import threaded&quot;</span><span style="color: black;">&#41;</span>
        best_result = <span style="color: #008000;">min</span><span style="color: black;">&#40;</span>t.<span style="color: black;">repeat</span><span style="color: black;">&#40;</span>repeat=repeat, number=number<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        show_results<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;threaded (%s threads)&quot;</span> <span style="color: #66cc66;">%</span> i, best_result<span style="color: black;">&#41;</span>
&nbsp;
        t = Timer<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;processed(%s)&quot;</span> <span style="color: #66cc66;">%</span> i, <span style="color: #483d8b;">&quot;from __main__ import processed&quot;</span><span style="color: black;">&#41;</span>
        best_result = <span style="color: #008000;">min</span><span style="color: black;">&#40;</span>t.<span style="color: black;">repeat</span><span style="color: black;">&#40;</span>repeat=repeat, number=number<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        show_results<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;processes (%s procs)&quot;</span> <span style="color: #66cc66;">%</span> i, best_result<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>,
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'Iterations complete'</span></pre></td></tr></table></div>

<p>Test one establishes some baseline numbers by executing an empty function. This will show us the overhead associated with each of the mechanisms I am testing as applied to object creation and execution.</p>

<div class="wp_codebox"><table><tr id="p44759"><td class="code" id="p447code59"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> function_to_run<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">pass</span></pre></td></tr></table></div>

<p>Results of the code above:</p>
<pre>
non_threaded (1 iters)  0.000003 seconds
threaded (1 threads)    0.010256 seconds
processes (1 procs)     0.004803 seconds

non_threaded (2 iters)  0.000007 seconds
threaded (2 threads)    0.020478 seconds
processes (2 procs)     0.012630 seconds

non_threaded (4 iters)  0.000010 seconds
threaded (4 threads)    0.040831 seconds
processes (4 procs)     0.010525 seconds

non_threaded (8 iters)  0.000017 seconds
threaded (8 threads)    0.080949 seconds
processes (8 procs)     0.017513 seconds
</pre>
<p>So, we can see that we take a hit just to add the thread and process spawning to the calls. This is expected; besides the fact that Python is optimized for single-threaded performance, the simple act of creating the threads and subprocesses adds an up-front cost. Look at the first group of results: the threaded call has a much higher cost than the non-threaded version. Also of interest is the fact that the cost grows linearly with the number of threads added: 8 threads take 0.080949 seconds, 4 threads take 0.040831 seconds, and so on.</p>
<p>Keep in mind that the point of adding threads is not to speed up initialization, but rather to add concurrency to the application. In a less contrived example, we might create a pool of threads once and reuse the workers, allowing us to split up a large data set and run the same function over different parts (a.k.a. the producer/consumer model). So while starting fresh threads for every call is not the norm for concurrent applications, these tests are designed to be simple.</p>
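<p>For comparison, here is roughly what the pool-of-workers pattern looks like. It is a sketch under stated assumptions: it targets Python 3 (where the module is named <b>queue</b>), and the function names and the squaring workload are invented for the example. It is not part of the benchmark script.</p>

```python
import queue
import threading


def worker(tasks, results):
    # Consumer: pull items until a None sentinel arrives
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put(item * item)


def square_all(numbers, num_workers=4):
    """Square every number in a list using a small pool of worker threads."""
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    # Producer: hand out the work, then one sentinel per worker
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(None)
    for w in workers:
        w.join()
    # Results arrive in completion order, so sort them for a stable answer
    return sorted(results.get() for _ in numbers)


if __name__ == "__main__":
    print(square_all([3, 1, 2]))
```

<p>The threads are created once and each one handles many items, so the start-up cost measured in the empty-function test is paid per worker instead of per task.</p>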
<p>One common application of threaded (and non-threaded) programming is to do number crunching.  Let’s take a simple, brute force approach to doing Fibonacci number crunching, noting of course that we’re not sharing state here.  I am just trying to have two tasks generate a set number of Fibonacci numbers.</p>

<div class="wp_codebox"><table><tr id="p44760"><td class="code" id="p447code60"><pre class="python" style="font-family:monospace;">
<span style="color: #ff7700;font-weight:bold;">def</span> function_to_run<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
  a, b = <span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1</span>
  <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">100000</span><span style="color: black;">&#41;</span>:
    a, b = b, a + b</pre></td></tr></table></div>

<p>Results of the code above:</p>
<pre>
non_threaded (1 iters)  0.276594 seconds
threaded (1 threads)    0.280199 seconds
processes (1 procs)     0.290740 seconds

non_threaded (2 iters)  0.559094 seconds
threaded (2 threads)    0.564981 seconds
processes (2 procs)     0.299791 seconds

non_threaded (4 iters)  1.117339 seconds
threaded (4 threads)    1.133981 seconds
processes (4 procs)     0.580096 seconds

non_threaded (8 iters)  2.235245 seconds
threaded (8 threads)    2.275226 seconds
processes (8 procs)     1.159978 seconds
</pre>
<p>As you can see from this data, adding the additional threads doesn’t buy us anything. You might expect the threaded examples to run in parallel, but the threaded runs take about as long as (in fact, slightly longer than) the non-threaded ones. Adding threads in this case actively harms us. The function executed is pure Python, and due to the overhead of creating the threads and the GIL, the threaded runs could never beat the non-threaded runs, and they fall well behind the process-based runs once more than one task is involved. Again, remember that the GIL only allows one thread to be active in the interpreter at any given time.</p>
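<p>If you would rather reproduce this comparison without the full benchmark script, the following stripped-down sketch times the same Fibonacci loop sequentially and in threads. It assumes Python 3 on a standard build with the GIL enabled; the helper names are hypothetical, and your numbers will differ from the ones above.</p>

```python
import threading
import time


def fib_loop(n=100000):
    # Same brute-force loop as the test function above
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def time_sequential(count, n):
    """Run fib_loop(n) `count` times back to back and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(count):
        fib_loop(n)
    return time.perf_counter() - start


def time_threaded(count, n):
    """Run fib_loop(n) in `count` threads and return the elapsed time."""
    threads = [threading.Thread(target=fib_loop, args=(n,))
               for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for count in (1, 2, 4):
        print("%d runs: sequential %.3fs, threaded %.3fs"
              % (count, time_sequential(count, 20000), time_threaded(count, 20000)))
```

<p>On a GIL-enabled interpreter the two columns should stay close to each other as the count grows, which is the same pattern the table above shows.</p>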
<p>Now, let’s do an I/O-bound task, like reading 1000 1-kilobyte chunks off of “/dev/urandom”!</p>

<div class="wp_codebox"><table><tr id="p44761"><td class="code" id="p447code61"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> function_to_run<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    fh = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;/dev/urandom&quot;</span>, <span style="color: #483d8b;">&quot;rb&quot;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1000</span><span style="color: black;">&#41;</span>:
        fh.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1024</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Results of the code above:</p>
<pre>
non_threaded (1 iters)  0.125532 seconds
threaded (1 threads)    0.125908 seconds
processes (1 procs)     0.140314 seconds

non_threaded (2 iters)  0.251784 seconds
threaded (2 threads)    0.250818 seconds
processes (2 procs)     0.261338 seconds

non_threaded (4 iters)  0.503835 seconds
threaded (4 threads)    0.501558 seconds
processes (4 procs)     0.511969 seconds

non_threaded (8 iters)  1.006956 seconds
threaded (8 threads)    1.003003 seconds
processes (8 procs)     1.009011 seconds
</pre>
<p>We’re starting to see threads pass the single-threaded approach with the file I/O task, but not by much. However, they are at least neck-and-neck with the single-threaded implementation and faster than the processing example. The latter is an interesting point, too: it means that if you can “dodge” the GIL you can potentially hit process speeds.</p>
<p>Keep in mind that you would not really use threads like the benchmark script we’ve put together does. Generally speaking, you’d be appending things to a queue, taking them off, and doing other shared-state tasks.  Having a bunch of threads off running the same function, while useful, is not a common use-case for a concurrent program, unless you’re splitting up and processing large data sets.</p>
<p>For a quick final example let’s look at the results when using the <b>socket</b> module. This is the module that all network I/O goes through; it’s written in C, and it’s thread safe. To minimize network latency issues we will connect to another computer’s Apache web server on the LAN (not optimized for load), and we’ll use <b>urllib</b> instead of the raw <b>socket</b> library, since <b>urllib</b> uses the <b>socket</b> library under the covers. Grabbing URLs is a more common use case than just connecting to a socket over and over. I will also lower the number of requests since, generally speaking, jackhammering your web server makes your web server the bottleneck. Given that this one is not tuned, we will keep it simple. All that is being pulled down from Apache is the default welcome page.</p>

<div class="wp_codebox"><table><tr id="p44762"><td class="code" id="p447code62"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> function_to_run<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">urllib</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">10</span><span style="color: black;">&#41;</span>:
        f = <span style="color: #dc143c;">urllib</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;http://10.0.1.197&quot;</span><span style="color: black;">&#41;</span>
        f.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>Results of the code above:</p>
<pre>
non_threaded (1 iters)  0.123033 seconds
threaded (1 threads)    0.121244 seconds
processes (1 procs)     0.141433 seconds

non_threaded (2 iters)  0.250751 seconds
threaded (2 threads)    0.223357 seconds
processes (2 procs)     0.242443 seconds

non_threaded (4 iters)  0.486189 seconds
threaded (4 threads)    0.438107 seconds
processes (4 procs)     0.448466 seconds

non_threaded (8 iters)  0.986121 seconds
threaded (8 threads)    0.881546 seconds
processes (8 procs)     0.859714 seconds
</pre>
<p>Take a look at the last two sets of numbers, minus the processing examples:</p>
<pre>
non_threaded (4 iters)  0.486189 seconds
threaded (4 threads)    0.438107 seconds

non_threaded (8 iters)  0.986121 seconds
threaded (8 threads)    0.881546 seconds
</pre>
<p>As you can see, doing I/O does, in fact, release the GIL. The threaded examples are clearly faster than the single-threaded execution. Given that most applications spend a large amount of their time in I/O, the GIL does not prevent users from creating multi-threaded applications that can act in a concurrent manner and add speed to their applications.</p>
<p>Does the GIL block those people who are working in pure Python from truly taking advantage of multiple cores? Simply put: Yes, it does.  While threads themselves are a language construct, the interpreter is the gatekeeper to the mapping between threads and the OS. This is why Jython and IronPython have no GIL.  It was simply not needed and left out of the implementation of the interpreter.</p>
<p>Obviously, based on the numbers above, switching to processes side-steps the entire GIL issue, allowing all of the children to run concurrently.  It’s something to think about and it’s been pointed out quite a bit.</p>
<h3>In Summary</h3>
<p>Threaded programming is the concurrency solution for the “common” man, but the problems one runs into when delving into threaded programming are not for the faint of heart, and they are not easy to overcome.  Threading is the first solution to the concurrency problem that many people run to when they first get the urge to begin doing tasks in parallel.</p>
<p>Python itself has good threading support, including all of the locking primitives, queues, events and semaphores.  That’s everything Java and many other languages have, including some higher-level “cool” thread things.  Can CPython take advantage of multiple threads for concurrency?  Yes, with caveats. The caveats applied hamper a particular segment of application developers for sure, but for most of us working in high I/O environments, CPython’s thread system with the GIL works out fine.  Even in those environments though, threads may not be the fastest option.  </p>
<p>The GIL is something that’s been debated, and continues to be debated amongst many.  Some call it a feature, some call it a mistake. Some people will be quick to say that threaded programming is too difficult for a large swath of the population, and that, in and of itself, is a true statement.</p>
<p>An important point to remember: The GIL is an interpreter issue. This means that, again, other interpreters, such as Jython and IronPython, do not suffer the “penalty” of the GIL. In the same vein, there are a few people out there currently working with the Python code base to experiment with the removal of the GIL in the CPython interpreter.</p>
<p>Guido (the BDFL) has already said he is open to accepting a patch set to the CPython tree that could optionally enable or disable the GIL or, if some enterprising individual wanted to, one that implements the interpreter in such a way as to remove the GIL entirely without sacrificing single-threaded performance.</p>
<p>There continues to be a large swath of people who state that threads are not a true solution to the concurrency problem, despite many hundreds of thousands of threaded applications currently in production. These people yearn for something cleaner, less prone to the dark and sordid synchronization problems that too much shared state brings to the table.</p>
<p>Threading is ubiquitous, but it is only a single solution for concurrency, and we hope that this article helps you think about its pitfalls, its potential and its state in the Python language.</p>
<p>For more thoughts on the GIL, see Guido’s blog (read the comments): <a href="http://www.artima.com/weblogs/index.jsp?blogger=guido" target="_blank">http://www.artima.com/weblogs/index.jsp?blogger=guido</a></p>
<p>For another excellent comparison of threads/processes/etc., see the recent effbot posting on the Tim Bray wide finder project: <a href="http://effbot.org/zone/wide-finder.htm" target="_blank">http://effbot.org/zone/wide-finder.htm</a></p>
<p>Note: for a producer/consumer threading example, see the _test() function in the threading.py module.</p>
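<p>For readers who want something shorter to start from, here is an independent producer/consumer sketch in Python 3. It is not a copy of the threading.py code; it assumes the standard queue module (named Queue in Python 2), and the SENTINEL marker and function names are made up for the example.</p>

```python
import queue
import threading

SENTINEL = None  # hypothetical marker telling the consumer to stop


def producer(q, items):
    # Push every item, then signal that nothing more is coming
    for item in items:
        q.put(item)
    q.put(SENTINEL)


def consumer(q, results):
    # Pull items until the sentinel arrives, doubling each one as the "work"
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        results.append(item * 2)


def run_pipeline(items):
    # A small bound makes the producer block whenever the consumer falls behind
    q = queue.Queue(maxsize=2)
    results = []
    threads = [
        threading.Thread(target=producer, args=(q, items)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

<p>The queue does its own locking, so neither thread needs an explicit lock. Using None as the sentinel does mean that None cannot be a real work item in this sketch.</p>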
]]></content:encoded>
			<wfw:commentRss>http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
	</channel>
</rss>

