Chewing on Import

by jesse


Brett Cannon's recent posts on rethinking import (here and here) got me thinking about a package I saw a few weeks ago called URLImport. On its about page, he mentions PEP 302, which covers (his words):

Basically, python supports what is called a path hook, which enables you to hook a specific path item to an import handler of your choice (an url importer in this case). The PEP mentioned also gives details on the importer protocol, a protocol which all importers must comply to (by defining find_module() and load_module(), among other details).
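To make the quoted protocol a bit more concrete, here is a minimal sketch of a path hook written against today's importlib API. PEP 302's find_module()/load_module() pair has since been deprecated in favour of find_spec()/exec_module(), so the sketch uses the newer names. The `dict://` scheme, `FAKE_REPO`, and the `greeting` module are hypothetical stand-ins for a real backend such as a URL or SCM importer.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory "repository": module name -> source code.
FAKE_REPO = {
    "greeting": "MESSAGE = 'hello from the hook'\n",
}

PREFIX = "dict://"  # hypothetical path-entry scheme claimed by our hook


class DictFinder(importlib.abc.PathEntryFinder):
    """Path entry finder that serves modules out of FAKE_REPO."""

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def find_spec(self, fullname, target=None):
        if fullname in FAKE_REPO:
            return importlib.util.spec_from_loader(fullname, DictLoader(fullname))
        return None  # not ours: let the import system keep searching


class DictLoader(importlib.abc.Loader):
    """Loader that executes the stored source in the new module's namespace."""

    def __init__(self, fullname):
        self.fullname = fullname

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        exec(FAKE_REPO[self.fullname], module.__dict__)


def dict_path_hook(path_entry):
    # A path hook must raise ImportError for entries it does not handle,
    # so the next hook in sys.path_hooks gets a chance.
    if path_entry.startswith(PREFIX):
        return DictFinder(path_entry)
    raise ImportError("not a dict:// path entry")


sys.path_hooks.insert(0, dict_path_hook)
sys.path.append(PREFIX + "repo")
sys.path_importer_cache.clear()  # force path entries to be re-examined

import greeting

print(greeting.MESSAGE)
```

The hook only claims path entries with its own prefix; every other entry on sys.path is still handled by the standard hooks.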

Why is this interesting? Well, the two are really unrelated: I've been pondering the work I've been doing recently at Archivas, building out a Python library that tests and frameworks can hook into. Nothing special in the grand scheme of things; it's a bunch of code that has to be shared.

Internally, like all companies, we have an SCM (Source Code Management) system, and we produce product builds rapidly and frequently as the code changes (again, nothing new). Part of the build system runs distutils to build a tarball of the shared package for consumption.

The problem is a chicken-and-egg one: while our builds and code are distributed to large clusters as part of a normal product deployment, the shared library is meant for and targeted at the clients, not the actual nodes. This means we have to go "out of band" to grab the distutils package and install it on the test clients.

I plugged that into the framework, and it works well: when you have many branches, each with different code for the shared framework, you always need to install the library built from that particular product branch.
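The out-of-band "download, verify, install" step might look roughly like the sketch below. It assumes the build server publishes a gzipped source tarball at some URL along with its SHA-256 checksum; `fetch_and_install` and its parameters are hypothetical, and "install" here only means unpacking the tree and prepending it to sys.path rather than running the package's own installer.

```python
import hashlib
import io
import os
import sys
import tarfile
import urllib.request


def fetch_and_install(url, expected_sha256, dest_dir):
    """Hypothetical helper: download a .tar.gz, verify it, unpack it into
    dest_dir and put its top-level directory on sys.path."""
    # Read the whole tarball into memory (fine for a small shared library).
    with urllib.request.urlopen(url) as response:
        payload = response.read()

    # Verify the checksum before touching the filesystem.
    digest = hashlib.sha256(payload).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch for {url}: got {digest}")

    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as archive:
        # Assumes an sdist-style layout with one top-level directory.
        top_level = archive.getnames()[0].split("/")[0]
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons can reject unsafe members (absolute paths, "..").
            archive.extractall(dest_dir, filter="data")
        else:
            archive.extractall(dest_dir)

    package_root = os.path.join(dest_dir, top_level)
    if package_root not in sys.path:
        sys.path.insert(0, package_root)
    return package_root
```

A real setup would also pick the URL based on the product branch being tested, which is exactly the bookkeeping described above.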

This works "good enough" - test clients are always guaranteed to have the latest drop of the shared library which is directly tied to the build of the product. But Brett's posts and the URLImport package got me pondering: Is there a Better Way?

Well, there could be. If you can design a custom import handler, there's no reason you can't build one that ties directly into your SCM (please don't hit me). That way tests and applications can integrate tightly with your environment, and instead of a pull-to-client system like this:

```python
# ... download latest build, verify, install
from foo.bar import baz
```

You could have:

```python
import sys

try:
    # Only works if an scm:// path hook has been registered.
    sys.path.append('scm://head/branch/dir')
    from foo.bar import baz
except ImportError:
    # ... download latest build, verify, install
    from foo.bar import baz
```

This way, the first import attempt pulls everything from the SCM; the second only happens if that import fails, which keeps the program portable to an extent, since you can always fall back to a local install. The serious drawback? Any time your code branches (assume the test code branches with it), your code has to change, or it has to be written to take this into account from the get-go (i.e., assume the caller passes in the branch).
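One way to let the caller pass in the branch, sketched under some assumptions: instead of fetching files over the wire, the hypothetical hook below maps an `scm://<branch>/<subdir>` path entry onto a local mirror directory (assumed to be kept current by an SCM client such as Subversion) and delegates to importlib's standard FileFinder. `make_scm_path_hook`, `scm_path_entry`, and the mirror layout are illustrative names, not part of any SCM's Python bindings.

```python
import importlib.machinery
import os

SCM_PREFIX = "scm://"


def make_scm_path_hook(checkout_root):
    """Return a path hook mapping 'scm://<branch>/<subdir>' entries onto
    directories under checkout_root (a hypothetical local SCM mirror)."""
    loader_details = (
        importlib.machinery.SourceFileLoader,
        importlib.machinery.SOURCE_SUFFIXES,
    )

    def scm_path_hook(path_entry):
        if not path_entry.startswith(SCM_PREFIX):
            raise ImportError("not an scm:// path entry", path=path_entry)
        local_dir = os.path.join(checkout_root, path_entry[len(SCM_PREFIX):])
        if not os.path.isdir(local_dir):
            # Declining lets the import fall through to the other sys.path
            # entries, so a local install can still satisfy it.
            raise ImportError("branch not available in mirror", path=path_entry)
        # Reuse the standard finder for plain .py source files.
        return importlib.machinery.FileFinder(local_dir, loader_details)

    return scm_path_hook


def scm_path_entry(branch, subdir):
    """Build the path entry from a caller-supplied branch instead of
    hard-coding one."""
    return f"{SCM_PREFIX}{branch}/{subdir}"
```

Wiring it up would mean `sys.path_hooks.insert(0, make_scm_path_hook(mirror_root))`, appending `scm_path_entry(branch, "dir")` to sys.path, and clearing `sys.path_importer_cache`. Because the hook raises ImportError for branches it can't find, a try/except ImportError fallback to a local install keeps working unchanged.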

I guess the benefit to this is that programs will always have the latest version of the dependencies/libraries they need, with little overhead inside the calling application. Changes to shared libraries can go live the second they are checked into the SCM, rather than waiting for a full build-test-publish loop.

It's not groundbreaking, just a thought. Of course, the QA guy inside of me cringes at this, given you'd be pulling in potentially untested code. Then again, the libraries and modules we're talking about are the actual test code.

SCM Python modules:
http://trentm.com/projects/px/
http://subversion.tigris.org/ (Subversion comes with Python bindings)

It's all navel-gazing. For now I'll stick to wrapping my head around the various vagaries of import.