CheeseShopping: python-application, mglob and pickleshare

by jesse


Well, it's been a bit since I've done one of these run-throughs. One of the RSS feeds I watch (out of 120) is for the Python CheeseShop - this is where a lot of very interesting modules are uploaded by community authors, some of more interest than others. When looking at modules in the CheeseShop, I always keep an eye out for code examples - finding a particularly interesting implementation of something (say, the debug/memory.py module in python-application) always helps me improve my code, applications, and so on.

I like to check out (albeit briefly) and write down notes about modules of interest that I see - I have a backlog of around forty modules I have notes on. This morning, I saw three that piqued my interest. (Note: I started writing this earlier this week - I'm only just now finishing it.)

As a side note: many of these modules can be installed via easy_install.py - I tend not to randomly install modules (and pollute my path!), preferring instead to grab the tarball and poke around in a sandbox/workingenv.py style environment.

First up is python-application (v 1.0.9) which is, to quote:

This package is a collection of modules that are useful when building python applications. Their purpose is to eliminate the need to divert resources into implementing the small tasks that every application needs to do in order to run successfully and focus instead on the application logic itself.

I snagged this, and there are some excellent code examples and useful tidbits in the package. I don't know if I would use the entire thing in a given application, but of particular note was this snippet from application/debug/memory.py:

import gc
def memory_dump():
    print "\nGARBAGE:"
    gc.collect()
    print "\nGARBAGE OBJECTS:"
    for x in gc.garbage:
        s = str(x)
        if len(s) > 80:
            s = s[:77] + '...'
        print "%s\n  %s" % (type(x), s)
gc.enable()
gc.collect() ## Ignore collectable garbage up to this point
gc.set_debug(gc.DEBUG_LEAK)

The module is well documented - all you have to do is import * from the module and then call memory_dump() later. The datatypes.py module in the configuration directory was also a very nice example, as was the top-level process.py module. I'd suggest taking a look at the package just to learn more - everyone has their own mise en place or toolbox, so to speak; we all have our own little bits of code we carry from application to application. This package is a good example of the simple, useful things we always end up doing (and I learned a few tricks too).
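If you're curious what the gc debug flags actually surface, here's a minimal sketch (in modern-Python syntax, with a hypothetical Node class just to manufacture a cycle). I use gc.DEBUG_SAVEALL - one of the flags bundled into DEBUG_LEAK - since it stashes collected garbage in gc.garbage for inspection without the stderr noise:

```python
import gc

# DEBUG_SAVEALL (one of the flags bundled into DEBUG_LEAK) tells the
# collector to keep everything it frees in gc.garbage for inspection.
gc.set_debug(gc.DEBUG_SAVEALL)

class Node(object):
    """Hypothetical class used only to manufacture a cycle."""
    pass

a, b = Node(), Node()
a.other, b.other = b, a  # a reference cycle refcounting alone can't free
del a, b                 # now only the cycle keeps them alive

gc.collect()             # the collector finds the cycle...
for obj in gc.garbage:   # ...and its members land in gc.garbage
    print(type(obj))

gc.set_debug(0)          # turn debug collection back off
```

Same idea as memory_dump() above, just boiled down to the one trick that makes it tick.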

As a side note, my toolbox code has evolved so rapidly and so drastically from when I first started hacking python that I sometimes wonder if one day my basic if __name__ == "__main__": setup will gain sentience. I'm glad that most programmers like to share, though - if we guarded our tools as closely as BBQ masters guard their rubs and sauces, we'd be in trouble.

Update: the author of both pickleshare and mglob added a comment to this post outlining some good information, including the fact that both tools are/will be in IPython. Also, mglob's syntax was intentional - "it’s optimized for brevity and convenience" (which is why I found it cryptic) - and for this particular tool, Path would not have helped him.

The next one is pickleshare (v0.3), to quote:

PickleShare - a small 'shelve' like datastore with concurrency support Like shelve, a PickleShareDB object acts like a normal dictionary. Unlike shelve, many processes can access the database simultaneously. Changing a value in database is immediately visible to other processes accessing the same database. Concurrency is possible because the values are stored in separate files. Hence the "database" is a directory where all files are governed by PickleShare.

Another quote from the readme:

Version note: this is an early beta version of the module. It has been tested (and works) in both Linux and Windows. This will probably end up as the interactive persistence system for IPython 0.7.2+, to make inter-ipython-session data sharing possible in real time.

This is an interesting module - shared objects/databases in a concurrent system run the risk of deadlocks, data-syncing issues, and so on. This module aims to bypass all that with a simple file-based workaround. In my (admittedly small) testing it seems to get the job done just fine - and the fact that the "database" is written to disk (and is therefore readable without the pickleshare module itself, and persists across application runs) is quite nice.

Cracking open the pickleshare.py module itself showed some very interesting code (again, teaching me more tricks) - for more enlightenment, read the test() method ((I've gotten into the habit of reading tests to determine functionality more and more lately while learning Java/TDD)). This is a very interesting module, and the class style - PickleShareDB(UserDict.DictMixin) - was very useful.
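To make the file-per-key idea concrete, here's a toy sketch of the technique in modern Python. FileDict is entirely hypothetical - it's not pickleshare's actual class, just the same pattern: subclass a dict mixin, supply the primitive operations, and get the rest of the dict interface for free (collections.abc.MutableMapping is the modern descendant of UserDict.DictMixin):

```python
import os
import pickle
from collections.abc import MutableMapping

class FileDict(MutableMapping):
    """Toy sketch of pickleshare's trick: every key gets its own pickle
    file in a directory, so separate processes can share the store and
    see each other's writes without locking one big database file.
    (FileDict is hypothetical, not pickleshare's actual class.)"""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key)

    def __setitem__(self, key, value):
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            raise KeyError(key)

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key)

    def __iter__(self):
        # one file per key, so the directory listing *is* the key set
        return iter(os.listdir(self.root))

    def __len__(self):
        return len(os.listdir(self.root))
```

Two FileDict objects pointed at the same directory behave like the same dictionary - the filesystem does the sharing, which is the whole concurrency story.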

I'd like to play with this module + the processing module.

And finally, mglob (v0.4), which is:

Usable as stand-alone utility (for xargs, backticks etc.), or as a globbing library for own python programs. Globbing the sys.argv is something that almost every Windows script has to perform manually, and this module is here to help with that task. Also Unix users will benefit from enhanced features such as recursion, exclusion, and directory omission.

I put this in my little ~/toolbox binary dir as soon as I started playing with it - as a command line utility, it's insanely useful (yes, I know there are other tools out there like this). The command line syntax is sort of counter-intuitive at first. For example, I wanted to find all of my mp3s:

woot:~/Desktop/Downloads/tmp/mglob-0.4 jesse$ python mglob.py rec:/Users/=*.mp3

The syntax is rec: (recursive), /Users/ (the directory to traverse), and =*.mp3 (the files to find). This is of course explained in the script's help output. Using this in a python application is also sort of cryptic, but matches the command line:

from mglob import expand
expand("rec:/Users/=*.mp3")

You get a full list back as the result of the glob - this would be a serious problem for massive recursive globs (the option to have it yield results as a generator would be nice). I like the command-line usage, but for pure code-globbing, things like Jason Orendorff's Path module and other alternatives just feel better API-wise ((Jason's site seems down; I put up a copy of path.py I had here)).
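For what it's worth, the generator version of the rec: case is only a few lines with os.walk. To be clear, iexpand here is purely a hypothetical sketch of what I mean, not part of mglob's API:

```python
import fnmatch
import os

def iexpand(spec):
    """Lazily expand a "rec:<dir>=<pattern>" spec, yielding matching
    paths one at a time instead of building the whole list in memory.
    (A hypothetical sketch - iexpand is not part of mglob itself, and
    it only handles the one rec: form discussed above.)"""
    if not spec.startswith("rec:"):
        raise ValueError("only rec:<dir>=<pattern> specs are handled here")
    root, _, pattern = spec[len("rec:"):].partition("=")
    for dirpath, dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)
```

You'd consume it like any generator - for path in iexpand("rec:/Users/=*.mp3"): ... - and never hold more than one match at a time.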