Trapped in python package; send food.

So, I (and many others) have lamented packaging issues in Python. Some people are focused on integrating with vendor systems (such as apt (.deb) and yum (.rpm)), while others are concerned with distutils/setuptools/etc.

Still others (like me, and maybe I’m alone) are trapped in a tween-state. We’re partially using vendor systems, and partially using self-compiled versions of python.

The cardinal “rule” has been not to “touch” the vendor-specific installations of python (this includes you, Linux). For example, on OS/X, any time you run easy_install or pip you install into the global site-packages directory. The same applies on linux, both when you run those tools and when you run apt-get install/yum install. Things go into that global, shared directory.

This sucks. Here’s why:

  • Versions. Some applications depend on very specific versions of libraries. This is because the maintainers of the libraries they depend on are bad, and break backwards compatibility.
  • site-packages becomes a toilet. Before my near OCD levels of cleanliness, I checked my system’s site-packages directory – I think all told I had about 250 different .eggs/packages/modules/etc all littered in there. And .pth files, and half-exploded things with metadata directories. And I think I found a squirrel in there.
  • “globally” installing things like nose, pip and setuptools puts the binary scripts in /usr, /usr/local and so on. This again causes those directories to become a toilet.
  • In some cases, upgrading something outside of your vendor packages (say, something pre-installed into RedHat’s python version) can in fact break and side-effect the system as a whole.
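A rough way to see how bad a given site-packages directory has become is to tally what is sitting in it. This is only a sketch: the function name tally_site_packages and the four categories are made up for illustration, and it only looks at the top level of the directory you hand it.

```python
import collections
import os


def tally_site_packages(directory):
    """Count top-level entries in a directory by kind.

    The categories (egg, pth, metadata, other) are an illustrative
    grouping, not anything defined by distutils or setuptools.
    """
    counts = collections.Counter()
    for name in os.listdir(directory):
        if name.endswith(".egg"):
            counts["egg"] += 1
        elif name.endswith(".pth"):
            counts["pth"] += 1
        elif name.endswith((".egg-info", ".dist-info")):
            counts["metadata"] += 1
        else:
            counts["other"] += 1
    return counts
```

Calling it with the path of the global site-packages directory gives a quick picture of how many eggs, .pth files and metadata directories have piled up there.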

So, I guess you could say “system-level site-packages considered harmful”. Once I realized the horrible error of my ways, I switched to virtualenv/virtualenvwrapper. This works great for me. But at least on OS/X – something was lacking.

That something was dependencies needed to compile something like readline into python. I could install the readline egg from pypi and just “work around it”. Or I could install macports (which is broken in many ways) and install the readline development libraries in there.

Unfortunately, macports also side effects your system in undesirable ways. Suddenly you’re linking to things you don’t realize, you’ve got things compiled in you don’t need/want, and so on.

So, what’s a guy supposed to do?

Well, since I’m not afraid of compiling things, I built a mini-macports for myself. I made a directory (named “slash”) in my home directory, and compiled things like readline into it. I then point the python compiles to that directory and move on with my life (I love you, --prefix). After compiling/installing PIL, Readline, etc into this directory as well as a pile of python versions, and slapping virtualenv on top of it I was feeling pretty good. I get only what I need, and virtualenv keeps things out of the global directories.
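The "slash" approach boils down to running the usual configure/make/make install with --prefix pointed at the private directory. Here is a minimal sketch of that plan in Python. The helper names prefix_build_plan and run_plan are hypothetical, and setting CPPFLAGS/LDFLAGS so that later builds find the headers and libraries already under the prefix is my assumption about how the pieces get wired together, not something spelled out above.

```python
import os
import subprocess


def prefix_build_plan(prefix, extra_configure_args=()):
    """Return (env, commands) for a configure/make/make install into prefix."""
    prefix = os.path.abspath(os.path.expanduser(prefix))
    env = dict(os.environ)
    # Assumption: point the compiler and linker at whatever was already
    # installed under the prefix (e.g. readline) before building python.
    env["CPPFLAGS"] = ("-I%s/include %s" % (prefix, env.get("CPPFLAGS", ""))).strip()
    env["LDFLAGS"] = ("-L%s/lib %s" % (prefix, env.get("LDFLAGS", ""))).strip()
    commands = [
        ["./configure", "--prefix=" + prefix] + list(extra_configure_args),
        ["make"],
        ["make", "install"],
    ]
    return env, commands


def run_plan(source_dir, env, commands):
    """Run each command in source_dir, stopping at the first failure."""
    for command in commands:
        subprocess.check_call(command, cwd=source_dir, env=env)
```

Something like run_plan("readline-src", *prefix_build_plan("~/slash")) would build one unpacked source tree into the private root; the source directory name here is a placeholder.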

Well. Minus the fact that it’s huge, non portable and it’s sort of a pain in the ass.

Then, I got an itch – I wanted to build a “python megapack” – I lovingly named it python-kitchensink. My goal was to repeat what I did above, and then offer it as a download for people who want to avoid this pain themselves on OS/X.

Easy enough. Minus one nit.

You can’t tar the damned thing up. I don’t know if it’s a side effect of distutils/setuptools, but scripts installed into this root had their #! lines hard coded to the exact path of the interpreter. This means if you went through all this compilation, and then installed easy_install, and say you did this in “/Users/jesse/myslash”, easy_install would get “#!/Users/jesse/myslash/bin/python2.6” hard coded into it.
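To see how many scripts in a tree are affected, you can scan its bin directory for #! lines that point back into the build prefix. This is a sketch under my own assumptions: read_shebang and hardcoded_shebangs are hypothetical helper names, and only the first line of each regular file is inspected.

```python
import os


def read_shebang(path):
    """Return the interpreter path from a script's #! line, or None."""
    with open(path, "rb") as handle:
        first = handle.readline(256)
    if not first.startswith(b"#!"):
        return None
    parts = first[2:].decode("utf-8", "replace").split()
    return parts[0] if parts else None


def hardcoded_shebangs(bin_dir, prefix):
    """Map script name -> interpreter for scripts whose #! lives under prefix."""
    prefix = prefix.rstrip("/") + "/"
    found = {}
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if not os.path.isfile(path):
            continue
        interpreter = read_shebang(path)
        if interpreter and interpreter.startswith(prefix):
            found[name] = interpreter
    return found
```

Every entry it returns is a script that stops working once the tree is unpacked anywhere other than the original prefix.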

Instead of kitchensink, I should have named it “jesse cusses a lot”.

So, back to square one. Or rather “think about this in the back of my mind, forget about it and then change to a new job”.

Forgetting about trying to do this for OS/X, I end up needing to do something eerily similar on Fedora Core. Now, compilation of python with all the bells and whistles on Fedora is simple – “yum install xxx-devel” and then just run the compile.

The goal was to make a fully-featured python 2.6 install on FC10, and then bootstrap the user(s) into a virtualenv so that nothing got plopped into the global directories.

Well – minus the fact fedora core 10 ships with python 2.5. And tools like virtualenv/etc from the yum repos lag behind the versions I want/need. Damnit. Do I stick to RPMs? Do I bootstrap it enough to “just work” and then pip install the rest? What about python2.6? Where are my pants?

There’s another catch: it has to work on *first boot* and there’s no network on that first boot.

So, forgetting my experiences with compiling all this stuff myself on OS/X, that’s what I do at first. I install all the devel packages, build an RPM which consumes a tarball I create, and add it to a local repo, and throw it in the kickstart file which spews out the images.

Oh but wait. The hardcoded #!’s come back and bite me in the ass. The build server compiles things in a temporary directory, and then installs easy_install and all of the other tools into the --prefix’ed python install. That temp directory is named something like “–TMPxx1341234DFLKJ1341234.xxx.hahaha”. Soooo, I get “#!/–TMPxx1341234DFLKJ1341234.xxx.hahaha/bin/python”. That’s about as useful as a beehive in my toilet.

Easy fix though: just make sure the buildserver doesn’t have anything in the eventual location of the installed version from the rpm (/opt/lazercats (ok, not really)) and just compile everything there.

Success, and win. Heck, I even get it to bootstrap virtualenvs for the users. Then I find out I’ve increased the image size by 40 or so megabytes. This immediately wipes the grin off my face and makes me realize I have again, failed. You see, I can’t freely increase the image size like that.

I need python 2.6. So, step one is to swap to fc11. Ok, good. I also want to avoid using the lag-behind vendor packages except for the bare minimum footprint I need to bootstrap the environment. This means modifying the kickstart packages list like this (note: I also cannot install a compiler, which is needed for a lot of packages):

# Python utilities
# python-lxml is == 2mb
python-lxml
python-setuptools
python-crypto
python-paramiko
python-pycurl
# Needed for virtualenv < 1.0 mb
python-devel
python-setuptools-devel

Why on earth is python-devel needed for virtualenv? Why python-setuptools-devel? Whyyyy??!
Ok, so I'm only going to be stuck with upstream versions of lxml, setuptools (which hasn't revved since the earth cooled) and a few others. Fine.

I then jump into the kickstart file and pop in:

%post --nochroot
cp python-dependencies.txt $INSTALL_ROOT/root/python-dependencies.txt
%post
%include post.txt
%end

In post.txt:

# Python environment setup

# Temporarily make DNS work
echo "nameserver 10.1.1.10" >/etc/resolv.conf

# Python environment setup
( cd /root
    /usr/bin/easy_install virtualenv
    /usr/bin/easy_install virtualenvwrapper
    /usr/bin/virtualenv /opt/thatthing
    /opt/thatthing/bin/easy_install pip
    /opt/thatthing/bin/pip -E /opt/thatthing install -r /root/python-dependencies.txt
    rm -rf build/ python-dependencies.txt
    echo "export WORKON_HOME=/opt" >>/home/jnoller/.bash_profile
    echo "source /usr/bin/virtualenvwrapper_bashrc" >>/home/jnoller/.bash_profile
)
rm -f /etc/resolv.conf

# End Python setup

The python-dependencies.txt is a pip requirements file and looks like this:

# use pip install -r


# http://code.google.com/p/boto/
boto

# http://docs.fabfile.org/0.9/
fabric

# http://ipython.scipy.org/moin/
ipython

# http://tools.assembla.com/yolk
yolk

# http://code.google.com/p/httplib2/
httplib2

# http://ipaddr-py.googlecode.com

http://ipaddr-py.googlecode.com/files/ipaddr-1.1.1.tar.gz

Note, I also can't plop svn, hg, git, etc in here - so packages that aren't on the cheeseshop, or aren't packaged right, are a no-go.
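Since the image has no svn, hg or git, it helps to catch requirements that would need one before the kickstart run does. A minimal sketch, assuming pip's usual VCS URL prefixes (svn+, hg+, git+, bzr+); the function name split_requirements is hypothetical.

```python
# pip's version-control requirement URLs start with one of these prefixes
VCS_PREFIXES = ("svn+", "hg+", "git+", "bzr+")


def split_requirements(lines):
    """Split requirement lines into (plain, vcs), skipping blanks and comments."""
    plain, vcs = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        target = line
        # Editable installs carry the URL after the -e/--editable flag
        for flag in ("-e ", "--editable "):
            if line.startswith(flag):
                target = line[len(flag):].strip()
        if target.startswith(VCS_PREFIXES):
            vcs.append(line)
        else:
            plain.append(line)
    return plain, vcs
```

Anything that lands in the second list needs a version-control client at install time, so it has to be replaced with a released tarball or dropped.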

The trick here is that the %post commands in the kickstart environment run in a chroot of the OS being created. This means, once the new image is loaded (say, in EC2) I can ssh in, and hit "workon thatthing". In reality, the WORKON dir should be elsewhere, but I'm going to let users override that. As it is, the "one true python" version is the one in /opt - no one (even me) gets to touch the system version of python.

I now have a python environment, available on first boot, isolated from the OS-provided one. I can spawn infinitely more virtualenvs and play all day long. The few global things I have are easy_install and some libraries which I hope I don't need to rev myself.

I still haven't licked the OS/X part. I'm probably just going to have to compile the barest possible environment in something like /opt/python-ks and go from there. Given I'd need to compile all of the dependencies into it (such as readline) I may just end up writing a big script to grab all the bits and then compile it into a location the user provides. The nice thing is that once I bootstrap python and virtualenv into the basic tree, I can use pip bundles/requirements files to pull in the rest.

All told, I sit here looking at the mess I've slogged through - and then I realize the entire python-packaging discussion on python-dev just exposes a whole 'nother can of worms. Versioning in a single site-packages directory, how app developers conflict with OS vendors, etc. It's a mess. OS Vendors lag behind developer released versions, and come to depend on what's installed there (have you ever broken yum on a Fedora box? I have.).

I hope Tarek gets a chance to clean a lot of this up - and while I'm against "everything and the kitchen sink" in the stdlib - having some method/API of building out "an official-like" virtualenv setup (maybe making virtualenv's life easier) would be nice.

Edit to add: I realize that hardcoding the shebang line is desirable in many cases, the obvious reason is that you need to be pointed at the interpreter which has your dependencies/libraries in it. Not having a clear way of altering that behavior (other than a "clever" sed script) is unfortunate.
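For the record, the "clever" sed script amounts to something like the sketch below, written in Python instead. It is an after-the-fact workaround, not a distutils feature: rewrite_shebang is a hypothetical helper, it only touches files whose first line starts with #! and mentions python, and it drops any interpreter arguments that were on the old line.

```python
def rewrite_shebang(path, new_interpreter):
    """Point a python script's #! line at new_interpreter.

    Returns True if the file was changed. Files without a #! line, or
    whose #! line does not mention python, are left alone.
    """
    with open(path, "rb") as handle:
        if handle.read(2) != b"#!":
            return False
        first = handle.readline()
        rest = handle.read()
    if b"python" not in first:
        return False
    new_first = new_interpreter.encode("utf-8") + b"\n"
    if first == new_first:
        return False
    # Rewriting in place keeps the file's existing permission bits
    with open(path, "wb") as handle:
        handle.write(b"#!" + new_first + rest)
    return True
```

Run over every file in the relocated tree's bin directory, with new_interpreter set to the python binary at the new location, this makes a moved install usable again.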

See this followup as well

  • kteague
    Twiddling with the shebang, that's a Distutils thing:

    http://docs.python.org/distutils/setupscript.ht...

    "The only clever feature is that if the first line of the script starts with #! and contains the word “python”, the Distutils will adjust the first line to refer to the current interpreter location." Heh, one man's "clever feature" is another man's headache. The --executable option allows you to override this behaviour though.

    But the other option is to declare a set of repeatable steps so that you can compile from source on fresh machines. Buildout is the uber-tool for this job! Here's a decent starter for compiling a Python interpreter using Buildout (http://bluedynamics.com/articles/jens/build-pyt...). And generally, for anything that is hairy or a PITA to compile on OS X, you can Google for "buildout hard-to-build-thing" and someone has usually put together a recipe for building it.
  • Perhaps I'm missing something, but have you given buildout (http://www.buildout.org/) a shot? It's used to good effect in the Plone community, where it helps manage complex dependencies including compiled software, and works quite nicely once you've learned the basics of creating buildout.cfg files.
  • If you only need to use Python 2.6 (or better) you can use PEP370 instead of virtualenv:

    all I do is:
    PYTHONUSERBASE=$HOME/my-python
    pip.py install --install-option="--user" PythonPackage

    There's no copy of the interpreter, no setuptools required, the shebang is still "#!/usr/bin/python" everything is installed in $HOME/my-python/bin and $HOME/my-python/lib/python2.6/site-packages

    If I switch to python3(.1) I don't have to do anything special since my custom packages for 3.1 will be installed in $HOME/my-python/lib/python3.1/site-packages (except that the /bin might be conflicting).
  • I missed that in 370, I thought it only handled the site-packages bit - I'm surprised it handles the bin scripts properly - they should not be installed in /bin. I'll double check my stuff to check that, and the proper shebang line (cause so far, setuptools/distutils isn't playing nice like that).
  • Francesco
    Good post, Jesse. I feel your pain.

    Francesco
  • Giuseppe
    I am working on a piece of software that works like virtualenv but also allows the installation of non-python source packages. (http://pypi.python.org/pypi/bpt) Of course you can install your favorite version of python in it. It also includes a modified version of pip, so installing python packages is as easy as easy_install. For normal tarballs with configure/make it guesses the build commands like checkinstall does.

    You may want to have a look. These days I have no time to develop it actively, but I use it every day (on Mac, but it works on linux as well) and in different situations (for example, it is part of the build system of an application I am working on, which for stability needs frozen versions of dependencies instead of the ones provided by the distribution). The nice thing is that the directory with all the installed files is relocatable: it can be moved (for example to another machine with the same architecture) and it still works.
  • Howdy Giuseppe - I actually looked at bpt - it looks like a good idea, but not quite "ready enough". I think it's a good start. How did you trick setuptools/distutils not to hard code the shebang line so the packages are portable across machines/directories?

    Personally, from the little bit I poked at it, I'd probably want to patch it a bit, but (and this is a personal thing) it's GPLed, and I avoid patching GPL stuff for various reasons.
  • Giuseppe
    I agree it's not ready enough, unfortunately I have not much time to work on it and as it is now it is enough for my needs (except for automatic dependency resolution/downloading which I'd really like to implement as soon as possible). Besides there are some design choices that I'll probably change (for example use python scripts instead of bash for the bpt-rules files).

    Distutils/setuptools are a big problem, they do too much magic which is impossible to control. I don't avoid the shebang rewriting. The most robust solution I found for relocatable boxes is to build python inside the box. This way distutils will rewrite the shebang with an absolute path /tmp/sandbox_<...>/bin/python which is valid if the box is relocated (all bpt is based on this trick). I have used it successfully to distribute complete applications on a computing cluster without having root access (they were debian sarge with python 2.3!)

    What would you like to patch? You could make some suggestions instead and I could try to implement them, if they can be useful.
    About the license, I chose GPL just because of my ignorance about licenses. I thought it was permissive enough for python software, since it allows using the software as a library even for commercial applications. I would be very open to switching to a more permissive license if there are good reasons to do that.
  • I went the bash script route too, given it's faster/easier than a series of check_calls from python. As for the boxes being portable - they're not, because of exactly what you mention - the hardcoded shebang line. If you move a box from /tmp/sandbox_<...>/bin/python to say, /home/jnoller/sandbox_<...>/bin/python - those hardcoded shebangs break. And I agree with compiling python *into* the sandbox, that's a trick a lot of us use to have many, not-system-wide installs of Python running.

    I'd have to resurrect my notes (I don't know what I did with them) on the patching, but things like dependency resolution, virtualenv support and swapping to python scripts come to mind.

    As for licenses, that's really a matter of personal taste. I avoid it, and stick with Apache License 2.0 (compatible with the python software license) for most things. The more permissive licenses don't invoke additional clauses, meaning I can import bpt in my app, without making my app's license change (wherein the GPL would force the combined app to be GPL).

    The licensing thing was part of a debate recently, without digging too much into my own reasons, see:

    http://farmdev.com/thoughts/80/why-you-should-n...
    http://zedshaw.com/blog/2009-07-13.html
    http://www.b-list.org/weblog/2009/jul/14/licens...
    http://jacobian.org/writing/gpl-questions/
  • Giuseppe
    Bash is without a doubt the easiest way to do it, but I'd like to try a scons/waf approach instead of a series of check_calls, i.e. abstracting some commands (configure, make, etc...) to python declarations. This could result in more platform independence (maybe it could even support Windows). The new system would not necessarily replace the bash one: bpt is made to support different ways of installing packages inside a box, it just provides a virtual filesystem based on symlinks. "build", "autobuild", etc... are just commands implemented on top of it.

    The "virtual filesystem" /tmp/box_<...> trick works this way: when you run the env script, a symbolic link is created from your current box location to /tmp/box_<...> where <...> is an id unique to the box. So if the box is relocated the link is created to point to the current box directory (actually if you move the box you will have to remove the link by hand. Better safe than sorry).
    This trick makes python work if it is installed in the box, because /tmp/box_<...>/bin/python will point to the correct binary regardless where you have put your actual box.

    Thanks for the links about licenses, I'll have a look ASAP.
  • I like the way you think ;)

    Ugh to the hoops you have to jump through to symlink things. That's gross, I know why, but ugh.
  • Giuseppe
    I forgot: bootstrapping is also kind of easy using the API. These are the two files I use on a (toy) project I am developing:

    http://pastebin.com/f71e884ba to bootstrap a box and install python in it
    http://pastebin.com/fe1f06fa to install non-python dependencies (python dependencies use a pip requirements file).

    This is something I'd really like to automate, but I need to figure out some problems before starting to code...
  • Giuseppe
    Thanks :)

    I completely agree about the symlink. Actually if there are better solutions, the code that does that is limited to few lines that are easily changed. However, I could not think of anything better.

    I mean, on linux there would be FUSE, or some tricks with LD_PRELOAD like fakechroot does. But they would need FUSE/etc to be installed on the guest machine, while it is important for me to have as few dependencies as possible on that side. Besides, Mac compatibility is very important for me :) (I know, mac-fuse and everything, but it is a huge jump in complexity).