
How does pickle work?

June 18th, 2007 | Posted in Programming, Python

Via Brett Cannon, I had the pleasure of reading Alexandre Vassalotti's blog post "Pickle: An Interesting Stack Language". It's really an excellent read. I never quite grokked pickle (or its brother, cPickle), and the code in his post actually taught me about something else too: the "code" module. To quote:

The code module provides facilities to implement read-eval-print loops in Python. Two classes and convenience functions are included which can be used to build applications which provide an interactive interpreter prompt.
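The code module is still in the standard library. Here is a minimal sketch, written for Python 3 rather than the 2.x of this era, that drives code.InteractiveConsole programmatically instead of from a terminal. The shout function and the greeting variable are made up for illustration.

```python
import code
import contextlib
import io

# A console whose namespace starts with one hypothetical variable.
console = code.InteractiveConsole(locals={"greeting": "quack"})

out = io.StringIO()
with contextlib.redirect_stdout(out):
    # push() returns True while the console is waiting for more input,
    # exactly like the "..." continuation prompt in the real REPL.
    needs_more = console.push("def shout(s):")
    console.push("    return s.upper()")
    console.push("")                 # blank line ends the def block
    console.push("shout(greeting)")  # expression results are echoed via sys.displayhook

print(needs_more)      # True
print(out.getvalue())  # 'QUACK'
```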

So much to learn I have. It really is a great post. I've only ever used pickle for passing an object (or an object in a set state) either from one run of an application to the next (i.e. a cache) or from one machine to another (via multiple copy-like functions; yes, I know about Pyro).
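Vassalotti's point is that a pickle is really a small program for a stack-based virtual machine. A quick way to see that for yourself (a Python 3 sketch, with a made-up cache dict) is pickletools.dis(), which prints the opcode stream that pickle.loads() executes. Protocol 0 is used here because it is the original, mostly readable format.

```python
import pickle
import pickletools

state = {"name": "cache", "hits": 3}     # hypothetical cached state

data = pickle.dumps(state, protocol=0)   # serialize to the protocol-0 opcode stream
pickletools.dis(data)                    # disassemble: MARK, DICT/SETITEM-style ops, STOP...

restored = pickle.loads(data)            # run the "program" to rebuild the object
print(restored == state)                 # True: equal contents...
print(restored is state)                 # False: ...but a brand-new object

# The last opcode of every pickle is STOP, which halts the VM and returns the top of stack.
opcodes = [op.name for op, arg, pos in pickletools.genops(data)]
print(opcodes[-1])                       # STOP
```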

The Dog Days of… OH GOD BEES.

June 18th, 2007 | Posted in Personal

Yeah - two weeks since my last post. The little journal of goals I keep in TextMate has two carry-over goals:

@personal
- Work out every night
- 1 blog post/day (real content)
- Count to 10 before replying to anyone, yourself included. (I kid!)

I have been the conductor on the failure train for both of these. The primary reason is the bees I referenced in the title, and by bees I mean "the last two weeks of a release". At work we've been grinding on a major upgrade to our existing product, and the last few weeks have been that painful continuous repetition that comes with all major releases: test, find bug, fix bug, test, verify bug, check bug queue, repeat.

With all software releases, it's that last painful tradeoff of bug vs. release that tends to sting the most: do you delay the release and fix it now, or do you bump it to a later patch release? And fixing it now means recursing back into test-ville.

I've had a weird analogy locked in my head lately: thinking of bugs as bees that want to sting you. One hurts for a minute and then goes away; a lot at once just makes you pass out. What really kills you is a batch of, say, four big ones that tag you with some regularity. You just start to get angry and sullen and you want to punch the bees. In this case, though, they're bugs, and you can't punch bugs.

I'll punch every bee in the face! - Dane Cook

Besides the stress of the release (ah, but that stress is gone, right? RIGHT?), the pregnancy of the Noller continues. These past few weeks have swung between banal and "exciting", and of course by exciting I mean "I have started researching blood pressure medication, or bleeding, as a technique for relief".

Mainly, my wonderful wife is a little under one month away from the "projected" due date. In reality, we've crossed over into "oh crap, are you going into labor?" territory. It's all very exciting. I think we're both ready to stop being "pregnant" and to start being "sleepless" instead. It would take a lot of worry out of our lives.

Well, with any luck I should be a bit better this week. Right now I've got around 150 tagged items in NetNewsWire to follow up on, and a bunch of notes to trudge through.

Python’s import mechanism in pseudocode.

June 5th, 2007 | 3 Comments | Posted in Programming, Python

Brett Cannon's gone and made something incredibly useful: he mocked up the Python import system in simple pseudocode. This is great, as it actually answers some questions I had on Sunday when I was boggling at some import-isms I was seeing. To quote:

At this point I have written out the code for how import goes about looking for a module. This covers the use of sys.meta_path, sys.path, sys.path_importer_cache, and sys.path_hooks. I have not covered how the bytecode/source dance works or extension modules. Writing up those two will show why Python makes so many stat calls when it does an import.

From "Now you too can know how import (roughly) works!"

I for one am looking forward to seeing the rest of this work.

As an aside: did you know that module preference follows sys.path order? For example, say I had /usr/lib/python2.4/foo.py and later I had /home/jesse/foo.py, and I added /home/jesse with sys.path.append('/home/jesse'). The /usr/lib/python2.4 copy of foo.py would be seen by the loader first, and the later version would not be loaded.

I knew this, but I didn't know[1] it. The same thing happens when you "import os" and later "import os" again: the module isn't reloaded, and the original import is the one in effect.
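A quick Python 3 sketch of why the second import is a no-op: the first import stores the module object in sys.modules, and later import statements just hand back that cached object. Re-running a module's code has to be asked for explicitly (importlib.reload() in Python 3; Python 2 had a builtin reload()). colorsys is used below only because it is a tiny, stateless stdlib module.

```python
import importlib
import sys

import os
first = os

import os                         # found in sys.modules; os.py is NOT executed again
print(os is first)                # True: same module object
print(sys.modules["os"] is os)    # True: the cache lives in sys.modules

# An explicit reload re-executes the module's code inside the existing module object.
import colorsys
reloaded = importlib.reload(colorsys)
print(reloaded is colorsys)       # True
```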

In order to load modules that you know conflict with something in the stdlib (say, a working popen2), you should always use sys.path.insert(0, ...).
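Here is a self-contained Python 3 sketch of both points: with sys.path.append() the earlier directory wins, and with sys.path.insert(0, ...) your copy wins. The module name shadowdemo and the two temp directories are hypothetical stand-ins for foo.py, /usr/lib/python2.4 and /home/jesse.

```python
import importlib
import os
import sys
import tempfile

def make_copy(label):
    """Write a hypothetical shadowdemo.py into a fresh temp directory."""
    path = tempfile.mkdtemp()
    with open(os.path.join(path, "shadowdemo.py"), "w") as f:
        f.write("WHERE = %r\n" % label)
    return path

system_dir = make_copy("system")   # plays the role of /usr/lib/python2.4
home_dir = make_copy("home")       # plays the role of /home/jesse

sys.path.append(system_dir)
sys.path.append(home_dir)          # appended later, so searched later
importlib.invalidate_caches()
import shadowdemo
appended_result = shadowdemo.WHERE # 'system': first match on sys.path wins

del sys.modules["shadowdemo"]      # forget the cached module
sys.path.insert(0, home_dir)       # now home_dir is searched before everything else
importlib.invalidate_caches()
import shadowdemo
inserted_result = shadowdemo.WHERE # 'home'

print(appended_result, inserted_result)
```

Note the del sys.modules step: inserting at the front of sys.path only helps for modules that haven't already been imported in the process.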

  1. I'm blonde, I have moments

PyMOTW: os (Part 2) - and popen2 isn’t thread safe.

June 3rd, 2007 | 2 Comments | Posted in Programming, Python

Doug's followed up last week's OS module post with another tidbit around pipe creation.

Personally, I would avoid os.popen* completely and go with the newer subprocess module for pipe control. The subprocess module cleans up a lot of the rough edges around pipe/subprocess creation found in the os.popen and popen2 modules.
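For comparison, here is roughly what a two-way pipe looks like with subprocess (a Python 3 sketch). The child command, a one-line Python program that upper-cases its stdin, is just a stand-in for whatever you would normally spawn.

```python
import subprocess
import sys

# Start a child with both stdin and stdout connected to pipes.
child = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# communicate() writes the input, closes stdin, reads all output and waits,
# which avoids the classic deadlock of filling one pipe while blocking on the other.
out, _ = child.communicate(b"quack\n")
print(out)               # b'QUACK\n'
print(child.returncode)  # 0
```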

Doug's post reminded me about something I wanted to blog about a while ago: certain versions (up to 2.4.4) of subprocess and popen2.* are not thread-safe. See Python bug 1183780, "Popen4 wait() fails sporadically with threads". This chewed us up for a while at work (we do a lot of heavily threaded popen2 spawning[1]), and we were seeing this error:

  File "/usr/lib/python2.4/popen2.py", line 94, in wait
    pid, sts = os.waitpid(self.pid, 0)
OSError: [Errno 10] No child processes

At an alarming rate. I pulled down the latest Python Subversion tree and tracked down the changelist/version of popen2 that had the fix for us, and with some site.addsitedir() magic[2] we now replace the broken 2.4.x version with our own patched version that works.
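The exact "magic" isn't shown here, so this is only a sketch of what site.addsitedir() itself does, in Python 3, with throwaway temp directories standing in for a real site directory. It appends the directory, plus any directories listed in its .pth files, to the end of sys.path. On its own that means a patched module placed there is searched after the stdlib copy, which is why you still need something that puts your directory first.

```python
import os
import site
import sys
import tempfile

site_dir = tempfile.mkdtemp()    # hypothetical site directory
extra_dir = tempfile.mkdtemp()   # a directory named inside a .pth file

# A .pth file lists one directory per line; site processes it when it scans site_dir.
with open(os.path.join(site_dir, "patched.pth"), "w") as f:
    f.write(extra_dir + "\n")

site.addsitedir(site_dir)

# Both directories were appended: searched *after* everything already on sys.path.
print(sys.path[-2:] == [site_dir, extra_dir])   # True
```

Lines in a .pth file that start with "import" are executed rather than added to the path, which is one place such a front-of-path tweak can live.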

This is something a lot of people should watch out for, especially Linux users on older versions of 2.4. If you think you are hitting this, sync the Python Subversion tree, grab the updated popen2.py for your Python version, and either overwrite the one in the stdlib or make sure its directory comes first in sys.path via sitecustomize.py or sys.path.insert(0, 'yourpathhere').

Edit to add: just so everyone can see an example, this experiment was done using Fedora Core 6's default Python installation. I can download the example script and exacerbate the problem like this:

[jesse@lol ~]# python
Python 2.4.4 (#1, Oct 23 2006, 13:58:00)
[GCC 4.1.1 20061011 (Red Hat 4.1.1-30)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
[jesse@lol ~]# python popen_bug.py -n 100
Exception in thread Thread-11:
Traceback (most recent call last):
  File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "popen_bug.py", line 43, in run
    pipe.wait()
  File "/usr/lib/python2.4/popen2.py", line 94, in wait
    pid, sts = os.waitpid(self.pid, 0)
OSError: [Errno 10] No child processes

I've tested the patched version up to 1000 threads.
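popen_bug.py itself isn't reproduced here. As a rough, hypothetical analogue in Python 3, this is the shape of such a stress test using subprocess: many threads, each spawning a short-lived child and waiting on it. N is an arbitrary, deliberately small thread count.

```python
import subprocess
import sys
import threading

N = 20  # hypothetical thread count; the post's script was run with -n 100

def spawn_and_wait(results, index):
    # Each thread owns its own child process and waits only on that child.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    results[index] = child.wait()

results = [None] * N
threads = [threading.Thread(target=spawn_and_wait, args=(results, i)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With a thread-safe implementation every wait() returns the exit code (0)
# instead of failing with "No child processes".
print(results.count(0), "of", N, "children reaped cleanly")
```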

  1. we haven't had time to switch to subprocess - ah, legacy code
  2. by the way, the site module's documentation in the library reference sucks.

Python Cookbook : LRU cache decorator

June 3rd, 2007 | Posted in Programming, Python


Looks interesting - from the description:

One-line decorator call adds caching to functions with hashable arguments and no keyword arguments. When the maximum size is reached, the least recently used entry is discarded -- appropriate for long-running processes which cannot allow caches to grow without bound. Includes built-in performance instrumentation.
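To make the description concrete, here is a minimal LRU cache decorator built on collections.OrderedDict. This is not the Cookbook recipe's code, just a Python 3 sketch of the same idea, and the hits/misses counters are a simplified stand-in for its instrumentation. Python 3.2+ also ships functools.lru_cache.

```python
from collections import OrderedDict
from functools import wraps

def lru_cached(maxsize=100):
    """Hypothetical LRU decorator: positional, hashable args only (no kwargs)."""
    def decorator(func):
        cache = OrderedDict()

        @wraps(func)
        def wrapper(*args):
            if args in cache:
                cache.move_to_end(args)      # mark as most recently used
                wrapper.hits += 1
                return cache[args]
            wrapper.misses += 1
            result = func(*args)
            cache[args] = result
            if len(cache) > maxsize:
                cache.popitem(last=False)    # evict the least recently used entry
            return result

        wrapper.hits = wrapper.misses = 0
        wrapper.cache = cache
        return wrapper
    return decorator

@lru_cached(maxsize=2)
def square(n):
    return n * n

square(2); square(3); square(2)   # the second square(2) is a hit; 2 becomes most recent
square(4)                         # cache is full, so 3 (least recently used) is evicted
print(list(square.cache))         # [(2,), (4,)]
print(square.hits, square.misses) # 1 3
```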

(Via overview by linuxer (on programming.reddit.com).)

Schrödinger’s Cat In A Box

June 2nd, 2007 | Posted in Uncategorized

[Image: Schrödinger lolcat]

Via Joey deVilla.

I have to admit, I laughed.

Schrödinger’s Type (is a namespace a box?)

June 1st, 2007 | 3 Comments | Posted in Programming, Python

I took the title for today's foray into duck/latent typing from the (in)famous Schrödinger's Cat thought experiment, in which a cat is placed in a box with a radioactive isotope and one cannot observe the state (type) of the cat without irrevocably altering that state. It felt like a fittingly ironic view of Python's lack of "contractual enforcement" of types except at runtime.

My "type(Duck)’ing: On Duck vs. Static Typing" post received some minor attention, but as with all things of an obsessive nature, I wanted to follow up and delve more into the Python(ic) aspects of typing as I understand them[1].

Henry Story, the author of the article that triggered my last post, made several comments. The last of them stuck out the most for me:

...snip "But in fact these languages don’t do this. they just look to see that there is a method that is named “Quack”. If there is it gets called as if it was clear that was it means was the sound. But why could a dog not have a “quack” method that meant to kill a cat? There would be no interface broken here. Just a clash of words, and we have those a lot in the english language. Things such as “bank” (the place you deposit your money at) and “bank” of a river… We disambiguate english because we take context into account, just as a programmer makes sure that when he gives objects to a method that will duck type on “quack” he makes sure never to give the method objects where “quack” means something else. Notice how this has trouble scaling though.

Anyway, I would be glad to be shown to be wrong. I just looked up a book on Ruby, and that confirmed my thinking. Do you have a pointer to a piece of code that could resolve the issue?"

As others pointed out, semantics and context matter when programming. There is simply no way around context being relevant: much as in spoken language[2], context and intent give relevance and inflection. Yes, a word can mean many different things. Hell, some people make stuff up.


Again ignoring the fact that Henry is trying to address the semantic web with the concept of a URI/hierarchy-based "typing enforcement scheme", I figured I would delve into, well, Python's faculties in this area (hint: namespaces and usage give context, and from context you can derive type). But first, a digression into namespaces.

I could speak to a person and say "How was your Day?" In their crazy moon-speak, maybe they redefined "Day" to mean "Underpants". So I just asked them how their underpants are doing; while amusing, in most places I would get slapped or fired. This is how internet flamewars break out: someone has a broken understanding of the word/world, or they lack context and/or relevance. Who is in fundamental violation of the contract of language: me, or the person who redefined the interface (word)?

Welcome to Static Typing: in the static world we do not trust that other programmers, people, scripts, applications or, for that matter, ourselves understand intent or context[3].

Duck typing allows for fudge[4]. Sure, crazy McCrazerton thinks that Day means pants, Duck means Dog, and all sorts of wrong things. But what if his definitions weren't explicitly "wrong"? What if they were "close enough"?

By "close enough" I mean that, given the context of what I was saying, he could figure out I wasn't addressing his private parts but was inquiring as to his current state. Sure, his word has all sorts of weight behind it. When he says "Day" or "Dog", the word can bring all sorts of interesting interfaces with it, but who cares what he tacked onto the damned word? To me, if it's "Day", then I can at least have an idea of what the hell is going on.

If it walks like a duck, quacks like a duck and has wings, then it could be a Dog in a Duck costume. Who cares?
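In code, the Dog-in-a-Duck-costume idea looks something like this Python 3 sketch. Duck, DogInDuckCostume and makeQuackingNoises() are illustrative names: the function never asks what it was handed, only whether it can quack().

```python
class Duck(object):
    def quack(self):
        return "Quack!"

class DogInDuckCostume(object):
    # No relation to Duck at all; it simply provides a quack() method too.
    def quack(self):
        return "Quack! (muffled woof)"

def makeQuackingNoises(animal):
    # No type check: anything with a callable quack() is welcome.
    return animal.quack()

noises = [makeQuackingNoises(a) for a in (Duck(), DogInDuckCostume())]
print(noises)   # ['Quack!', 'Quack! (muffled woof)']
```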

In our case, the Type is the Cat, the Namespace is the Box, and the isotope in the box is the inferred context of the type when we "open the box".

Code time! Yay!

At Python's very core is the concept of the namespace. That's an important thing, as almost everything within Python lives in a namespace and has scope and "privacy"[5] inherent in that design.
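The "privacy" in quotes refers to name mangling of double-underscore names (footnote 5). A small Python 3 sketch, with a made-up Duck class: a __feathers attribute is simply stored under a class-qualified name in the instance's namespace, so it is hidden by convention rather than enforced.

```python
class Duck(object):
    def __init__(self):
        self.__feathers = 42       # stored on the instance as _Duck__feathers

    def feathers(self):
        return self.__feathers     # mangled the same way inside the class body

d = Duck()
print(vars(d))                     # {'_Duck__feathers': 42}: the instance namespace
print(d.feathers())                # 42
print(hasattr(d, "__feathers"))    # False: no mangling outside the class
print(d._Duck__feathers)           # 42: still reachable if you really insist
```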

For instance, when you perform an import, the package loader walks into the target package's namespace and works on finding the target. For example:

import animal.species.dog.quack

This means that the interpreter walks into animal, then species, then dog, and snags the quack.py module contained there. It then compiles the module's source to bytecode (unless an up-to-date .pyc already exists), runs it, and binds the name of the top-level package, animal, in the current global namespace. For example, given this module "structure":

woot:~/tmp/ jesse$ mkdir -p animal/species/dog
woot:~/tmp/ jesse$ cd animal/
woot:~/tmp/animal jesse$ touch __init__.py
woot:~/tmp/animal jesse$ cd species/
woot:~/tmp/animal/species jesse$ touch __init__.py
woot:~/tmp/animal/species jesse$ cd dog/
woot:~/tmp/animal/species/dog jesse$ touch __init__.py
woot:~/tmp/animal/species/dog jesse$ touch quack.py
woot:~/tmp/animal/species/dog jesse$ vi quack.py

And within quack.py, all I put is:

 
def quack():
	return "Woof"

I pop into the interpreter and do this:

woot:~/tmp/duck jesse$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import animal.species.dog.quack
>>> globals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, 'animal': <module 'animal' from '...'>}
>>> quack
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'quack' is not defined
>>> animal.species.dog.quack
<module 'animal.species.dog.quack' from '...'>
>>> animal.species.dog.quack()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable
>>> animal.species.dog.quack.quack()
'Woof'
>>>

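As you can see, I told the interpreter to import[6] the module I wanted from dog, quack, by its full dotted "path"/name. Only animal ends up in globals(), so I have to spell out animal.species.dog.quack.quack() to reach the function. Of course, you can also do things like this: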

>>> from animal.species.dog.quack import quack
>>> quack()
'Woof'
>>>
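To reproduce the whole experiment without a shell session, here is a Python 3 sketch that builds the same animal/species/dog package in a temporary directory (the temp-dir location is the only assumption) and runs the same imports.

```python
import importlib
import os
import sys
import tempfile

# Recreate animal/species/dog/{__init__.py, quack.py} from the shell session.
root = tempfile.mkdtemp()
dog_dir = os.path.join(root, "animal", "species", "dog")
os.makedirs(dog_dir)
for sub in ("animal", os.path.join("animal", "species"), os.path.join("animal", "species", "dog")):
    open(os.path.join(root, sub, "__init__.py"), "w").close()   # empty __init__.py
with open(os.path.join(dog_dir, "quack.py"), "w") as f:
    f.write('def quack():\n    return "Woof"\n')

sys.path.insert(0, root)
importlib.invalidate_caches()

import animal.species.dog.quack        # binds only the top-level name "animal"
names_after_import = ("animal" in globals(), "quack" in globals())
print(names_after_import)              # (True, False)
print(animal.species.dog.quack.quack())   # Woof

from animal.species.dog.quack import quack
print(quack())                         # Woof

# dog.quack resolves here only because the quack submodule was already imported above;
# importing a package does not import its submodules.
from animal.species import dog
print(dog.quack.quack())               # Woof
```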

But given that import statements are in the first few lines of your script[7], or immediately near the code which references them, the scope and intent of quack within this session/application is clear. Most of the time people will pull in the top-level package, or the package right above the foo.py they wish to reference; in my case, that would be dog. I would then reference dog.quack.quack() when I needed to call the quack function in quack.py (with empty __init__.py files, that only works once the quack submodule itself has been imported somewhere, since importing a package doesn't import its submodules).

Now, import tricks, relative imports, etc. are all interesting, and they give some of the best context you could want about the initial intent of a method, class or object, but import renaming is another fantastic thing:

>>> from animal.species.dog.quack import quack as dogQuack

I've just saved myself some heartburn. I don't like calling foo.bar.baz.yourmom; I like calling function() or method() without the import/namespace foreplay. But I have the decency to rename the import to dogQuack, so that when I reference it, I am always reminded about whose quack I'm quacking[8].
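The same trick earns its keep when two modules use one word for different things, much like Henry's "bank" example. In this Python 3 sketch, json and pickle both export a function called dumps; aliasing each on import keeps its context visible at every call site.

```python
# Two stdlib functions share the name "dumps" but mean different things.
from json import dumps as jsonDumps
from pickle import dumps as pickleDumps

record = {"hi": "mom"}
as_text = jsonDumps(record)      # JSON text (a str)
as_bytes = pickleDumps(record)   # a pickle (bytes in Python 3)

print(as_text)                   # {"hi": "mom"}
print(type(as_bytes).__name__)   # bytes
```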

You should read the import section in PEP 8. No, seriously.

So now that we've gone and rambled on about the context of where a Class/Object or Method might come from, or be inferred from, we can move on to real things, like types!

Guido van Rossum: In Python, you have an argument passed to a method. You don't know what your argument is. You're assuming that it supports the readline method, so you call readline. Now, it could be that the object doesn't support the readline method.

Bill Venners: And then I'll get an exception.

Guido van Rossum: You'll get an exception, which is probably OK. If this is a mainline piece of code and something could possibly be passed to you that doesn't have a readline method, you'll discover that early on during testing. Just as much as in a typed language when you have an interface and you know you're getting something that has the right interface but doesn't implement the right thing, or it throws an unexpected exception. You'll hopefully find that during testing.

In addition in Python, because there aren't fixed protocols, something else can be passed that also supports readline and doesn't happen to be a file, but does exactly what you need. All you need at that point is something that returns lines.

This quote hints at one of the key things about Python, straight from the Zen of Python: errors should never pass silently. More on that later.
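Guido's readline example translates almost directly into code. In this Python 3 sketch, first_line() and LineSource are hypothetical. The function assumes only a readline() method: a real file-like object works, an unrelated object that "returns lines" works, and something without readline() fails loudly with an AttributeError instead of silently.

```python
import io

def first_line(source):
    # The only assumption: source has a readline() method.
    return source.readline()

class LineSource(object):
    """Not a file at all; it just returns lines (hypothetical)."""
    def __init__(self, lines):
        self._lines = iter(lines)

    def readline(self):
        return next(self._lines, "")   # "" signals end of input, like a file

from_file = first_line(io.StringIO("hello\nworld\n"))
from_duck = first_line(LineSource(["quack\n"]))
print(repr(from_file), repr(from_duck))   # 'hello\n' 'quack\n'

error = None
try:
    first_line(42)                        # ints have no readline()
except AttributeError as exc:
    error = exc
print("AttributeError:", error)
```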

Python has some basic object types: you know, your run-of-the-mill int(), float(), str(). One of the first things you learn in Python is what each one means. For the sake of this discussion, we'll focus on dict() (dictionary), Python's hash/map type.

If we have an object, let's say baz, and we make it a dict, we can politely inquire as to its type:

>>> baz = {}
>>> type(baz)
<type 'dict'>
>>>

The built-in type() function returns an object representing the type of the object you passed to it: not the object itself, and not a string representation of the object's type. If we wanted to be banal about this, we could do this:

>>> baz = {} # assume someone passed us baz
>>> foo = {} # we make foo into an empty dict...
>>> type(baz) == type(foo)
True
>>>

And yes, this sort of check works if one actually has something in it:

>>> foo = {'hi':'mom'}
>>> type(baz) == type(foo) # yes I know, isinstance() - I'll get to that.
True
>>>

But if we compare the two objects directly (instead of the type objects returned by type()) we see that they are different:

>>> foo == baz
False
>>>

So, back to the topic of ducks: what makes our dict quack()? The methods it supports. Again, if it looks like a dict, smells like a dict and works like a dict, then it is a dict. Right[9]?

Let's look at the methods a dict has, shall we?

>>> foo = {}
>>> dir(foo)
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__str__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
>>>

Now, I want to make my own object, say, MyDuck[10]:

>>> class MyDuck(dict):
...     def __init__(self):
...         pass
...
>>> dir(MyDuck)
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__dict__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__str__', '__weakref__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']

Behold: MyDuck supports all of the same attributes as dict(), and this brings us full circle.
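This is also where the type() equality check from earlier and isinstance() part ways. A short Python 3 sketch reusing the MyDuck class: type() compares the exact class, isinstance() accepts subclasses, and a hasattr() probe is the duck-typing way of asking "can you do this?".

```python
class MyDuck(dict):
    def __init__(self):
        pass                    # dict.__new__ has already created an empty dict

duck = MyDuck()
same_type = type(duck) == type({})     # False: the exact type is MyDuck
is_a_dict = isinstance(duck, dict)     # True: MyDuck is a dict subclass
quacks_like_dict = all(hasattr(duck, name) for name in ("keys", "get", "__getitem__"))

duck["sound"] = "quack"                # and it behaves like a dict
print(same_type, is_a_dict, quacks_like_dict, duck.get("sound"))
```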

If I create a Quack object in animal.species.duck or animal.species.duck.quack, then you can be sure that what you are looking at is a duck's Quack: you have the context, and you can look at the attributes of the Quack() to ensure that it is, in fact, Quack.quack-able.

If some genius makes an animal.species.dog Quack object, then if you want the duck's quack, why are you looking at a dog to supply your much-needed quack? Maybe you want the special dog quack that uses the duck's quack:

>>> class MyDog(MyDuck):
...     def __init__(self):
...         pass
...
>>> dir(MyDog)
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__dict__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__str__', '__weakref__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
>>>

We now have MyDog, which has all of the attributes of MyDuck and all of the attributes of a dict object. This is fantastic, as we can remove/override/extend things from MyDuck or dict inside of MyDog and move on with our lives.

What if the methods clash, you say? What if both MyDuck and MyDog have the .quack() interface, but in the case of MyDog a woof is returned instead of a quack? The answer: why are you using Dog's quack when you know it doesn't return what you want, i.e. a duck's quack?

But what if you don't know that MyDog's quack is different from MyDuck's[11]? Easy: when you foolishly pass MyDog() into your script's makeQuackingNoises() function and check the returned 'quack' value, you're going to get an exception, and exceptions should not pass silently in the night.
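A Python 3 sketch of that scenario. One hedge: Python itself won't raise here, since MyDog.quack() returns a perfectly valid string. The exception comes from code (or a test) that checks the value, so this hypothetical makeQuackingNoises() does the check explicitly.

```python
class MyDuck(dict):
    def quack(self):
        return "Quack"

class MyDog(MyDuck):
    def quack(self):               # same interface, different meaning
        return "Woof"

def makeQuackingNoises(animal):
    """Hypothetical: needs a duck's quack and refuses to continue silently."""
    noise = animal.quack()
    if noise != "Quack":
        raise ValueError("expected a duck's quack, got %r" % (noise,))
    return noise

duck_noise = makeQuackingNoises(MyDuck())
print(duck_noise)                  # Quack

dog_error = None
try:
    makeQuackingNoises(MyDog())
except ValueError as exc:
    dog_error = exc
print("ValueError:", dog_error)    # ValueError: expected a duck's quack, got 'Woof'
```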

But you say: "A compiler would have told me this ahead of time!" To which my reply would be: a compiler protects you from the most basic version of human error, the typo. Whether the typo is intentional (you changed the method without watching what someone was passing you) or unintentional (you meant Dog but typed Duck), the compiler can't protect you from more serious "pilot errors" (you pass in something that works, sort of). Ergo: testing!


I know Python is not perfect[12], and as I have said before, there is something nice about static contractual enforcement; that's why I like Python 3000's ABCs (or even the Roles implementation).
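Python 3000's ABCs landed as the abc module (PEP 3119). Here is a small Python 3 sketch of the "more solid fudge" they allow, with hypothetical class names: a class that claims to be a Quacker but forgets quack() cannot even be instantiated, and an unrelated class can still opt in via register().

```python
import abc

class Quacker(abc.ABC):
    """Hypothetical interface: anything claiming to quack must implement quack()."""
    @abc.abstractmethod
    def quack(self):
        raise NotImplementedError

class Duck(Quacker):
    def quack(self):
        return "Quack"

class LazyDog(Quacker):
    pass                            # forgot to implement quack()

lazy_error = None
try:
    LazyDog()                       # abstract method left unimplemented
except TypeError as exc:
    lazy_error = exc
print("TypeError:", lazy_error)

class Robot(object):                # does not inherit from Quacker
    def quack(self):
        return "BEEP QUACK"

Quacker.register(Robot)             # declare it a virtual subclass (no method check is done)
print(Duck().quack(), isinstance(Robot(), Quacker))   # Quack True
```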

Also, yes, method signature enforcement can get confusing sometimes: I had to chase down a bug I checked in yesterday that was the direct result of me playing fast and loose with the rules around method arguments[13].

I'd also like to point out Collin Winter's typecheck module, which you can add in as well.
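For a flavour of opt-in runtime enforcement, here is a hypothetical decorator in Python 3. It is not typecheck's API, just a sketch using inspect.signature() and function annotations. It checks the call signature first, then isinstance()-checks each annotated argument. It only handles plain classes as annotations.

```python
import functools
import inspect

def enforce_annotations(func):
    """Hypothetical decorator: isinstance-check arguments against their annotations."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)     # TypeError if the call doesn't fit the signature
        for name, value in bound.arguments.items():
            expected = sig.parameters[name].annotation
            if expected is not inspect.Parameter.empty and not isinstance(value, expected):
                raise TypeError("%s must be %s, got %s"
                                % (name, expected.__name__, type(value).__name__))
        return func(*args, **kwargs)
    return wrapper

@enforce_annotations
def feed(duck: dict, crumbs: int) -> int:
    return crumbs * 2

print(feed({}, 3))                  # 6

feed_error = None
try:
    feed({}, "lots")
except TypeError as exc:
    feed_error = exc
print("TypeError:", feed_error)     # TypeError: crumbs must be int, got str
```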

Going back to the original points, however: duck typing is flexible, powerful and yes, like all things that are both, it can be dangerous. But namespaces provide your context, inspecting objects at runtime is easy, and you can enforce the required contractual obligations as much as you need or want to.

Python's power comes from its flexibility and, believe it or not, duck typing. Without it, we would lack some of the grace and power that comes with the language. A dynamic language should be exactly that: dynamic. Static type enforcement/interfaces have benefits, but they only protect you against the simple bugs a compiler is smart enough to catch.

For more thoughts/information on this, also see "isinstance() considered harmful", as well as the links in my previous post.

  1. Disclaimer: I am not always correct; I too am constantly learning and adapting. What may seem clear now may change as technology changes or as I learn more.
  2. If you want more on Language vs. Programming, speak to "r0ml" Lefkowitz.
  3. I have a feeling this is where the verbosity critique of Java comes into play.
  4. With the implementation of ABCs, you can make the fudge more solid.
  5. See __methodName.
  6. See also: Importing Python modules.
  7. See PEP 8.
  8. Duck analogy: officially beaten to death.
  9. See Library Reference for Mapping Types.
  10. Yes, I like BouncyCase classes and methods, leave me alone.
  11. Why are you coding?
  12. See: python pitfalls, python warts, python gotchas.
  13. See: Method signature checking decorators.