Trapped in python package; send food.

July 17th, 2009 § 14 comments

So, I (and many others) have lamented packaging issues in Python. Some people are focused on integrating with vendor systems (such as apt (.deb) and yum (.rpm)), while others are concerned with distutils/setuptools/etc.

Still oth­ers (like me, and maybe I’m alone) are trapped in a tween-state. We’re par­tially using ven­dor sys­tems, and par­tially using self-compiled ver­sions of python.

The cardinal “rule” has been not to “touch” the vendor-specific installations of python (this includes you, Linux). For example, on OS/X, any time you run easy_install or pip you install into the global site-packages directory. The same applies when you do the same on linux, and when you run apt-get install or yum install. Things go into that global, shared directory.

This sucks. Here’s why:

  • Ver­sions. Some appli­ca­tions depend on very spe­cific ver­sions of libraries. This is because the main­tain­ers of the libraries they depend on are bad, and break back­wards compatibility.
  • site-packages becomes a toi­let. Before my near OCD lev­els of clean­li­ness, I checked my system’s site-packages direc­tory — I think all told I had about 250 dif­fer­ent .eggs/packages/modules/etc all lit­tered in there. And .pth files, and half-exploded things with meta­data direc­to­ries. And I think I found a squir­rel in there.
  • “globally” installing things like nose, pip and setuptools puts the binary scripts in /usr, /usr/local and so on. This again causes those directories to become a toilet.
  • In some cases, upgrading something outside of your vendor packages (say, something pre-installed into RedHat’s python version) can, in fact, break and side-effect the system as a whole.
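
If you want to see how bad your own site-packages has gotten, here is a rough sketch that counts what is in there by kind. The function name summarize_site_packages and the buckets it uses are made up for illustration; it only looks at file names, so treat the numbers as a ballpark.

```python
import collections
import os
import sysconfig


def summarize_site_packages(path):
    """Count the entries in a site-packages directory by rough kind."""
    counts = collections.Counter()
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if name.endswith(".pth"):
            counts["pth files"] += 1
        elif name.endswith((".egg", ".egg-info", ".egg-link", ".dist-info")):
            counts["eggs and metadata"] += 1
        elif os.path.isdir(full):
            counts["packages"] += 1
        elif name.endswith(".py"):
            counts["modules"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # "purelib" is where this interpreter installs pure-Python packages.
    site_packages = sysconfig.get_paths()["purelib"]
    print(site_packages)
    if os.path.isdir(site_packages):
        for kind, count in sorted(summarize_site_packages(site_packages).items()):
            print(f"{count:5d}  {kind}")
```

Run it with the system python and then again from inside a virtualenv to compare the two directories.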

So, I guess you could say “system-level site-packages con­sid­ered harm­ful”. Once I real­ized the hor­ri­ble error of my ways, I switched to vir­tualenv/vir­tualen­vwrap­per. This works great for me. But at least on OS/X — some­thing was lacking.

That some­thing was depen­den­cies needed to com­pile some­thing like read­line into python. I could install the read­line egg from pypi and just “work around it”. Or I could install mac­ports (which is bro­ken in many ways) and install the read­line devel­op­ment libraries in there.

Unfor­tu­nately, mac­ports also side effects your sys­tem in unde­sir­able ways. Sud­denly you’re link­ing to things you don’t real­ize, you’ve got things com­piled in you don’t need/want, and so on.

So, what’s a guy sup­posed to do?

Well, since I’m not afraid of compiling things, I built a mini-macports for myself. I made a directory (named “slash”) in my home directory, and compiled things like readline into it. I then point the python compiles to that directory and move on with my life (I love you, --prefix). After compiling/installing PIL, Readline, etc into this directory as well as a pile of python versions, and slapping virtualenv on top of it I was feeling pretty good. I get only what I need, and virtualenv keeps things out of the global directories.

Well. Minus the fact that it’s huge, non portable and it’s sort of a pain in the ass.

Then, I got an itch — I wanted to build a “python mega­pack” — I lov­ingly named it python-kitchensink. My goal was to repeat what I did above, and then offer it as a down­load for peo­ple who want to avoid this pain them­selves on OS/X.

Easy enough. Minus one nit.

You can’t tar the damned thing up. I don’t know if it’s a side effect of distutils/setuptools, but scripts being installed into this root were having the #! lines hard coded to the exact path of the interpreter. This means if you went through all this compilation, and then installed easy_install (and say you did this in “/Users/jesse/myslash”), easy_install would get “#!/Users/jesse/myslash/bin/python2.6” hard coded into it.
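
To see which scripts in a tree are pinned like this, something along these lines would do. It is a minimal sketch, not part of any packaging tool: hardcoded_shebangs is a hypothetical helper that reads the first line of every file in a bin directory and reports the ones whose interpreter lives under a given prefix.

```python
import os


def hardcoded_shebangs(bin_dir, prefix):
    """Map script name -> interpreter path for scripts pinned under `prefix`."""
    pinned = {}
    root = prefix.rstrip("/") + "/"
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if not os.path.isfile(path):
            continue
        # Read only the first line, in binary, so compiled files do not blow up.
        with open(path, "rb") as handle:
            first_line = handle.readline(4096)
        if not first_line.startswith(b"#!"):
            continue
        parts = first_line[2:].split()
        if not parts:
            continue
        interpreter = parts[0].decode("utf-8", "replace")
        if interpreter.startswith(root):
            pinned[name] = interpreter
    return pinned
```

Calling hardcoded_shebangs("/Users/jesse/myslash/bin", "/Users/jesse/myslash") would list every script that stops working the moment the directory is moved or untarred somewhere else.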

Instead of kitchensink, I should have named it “jesse cusses a lot”.

So, back to square one. Or rather “think about this in the back of my mind, for­get about it and then change to a new job”.

For­get­ting about try­ing to do this for OS/X, I end up need­ing to do some­thing eerily sim­i­lar on Fedora Core. Now, com­pi­la­tion of python with all the bells and whis­tles on Fedora is sim­ple — “yum install xxx-devel” and then just run the compile.

The goal was to make a fully-featured python 2.6 install on FC10, and then boot­strap the user(s) into a vir­tualenv so that noth­ing got plopped into the global directories.

Well — minus the fact fedora core 10 ships with python 2.5. And tools like virtualenv/etc from the yum repos lag behind the ver­sions I want/need. Damnit. Do I stick to RPMs? Do I boot­strap it enough to “just work” and then pip install the rest? What about python2.6? Where are my pants?

There’s another catch: it has to work on *first boot* and there’s no net­work on that first boot.

So, for­get­ting my expe­ri­ences with com­pil­ing all this stuff myself on OS/X, that’s what I do at first. I install all the devel pack­ages, build an RPM which con­sumes a tar­ball I cre­ate, and add it to a local repo, and throw it in the kick­start file which spews out the images.

Oh but wait. The hardcoded #!'s come back and bite me in the ass. The build server compiles things in a temporary directory, and then installs easy_install and all of the other tools into the --prefix’ed python install. That temp directory is named something like “–TMPxx1341234DFLKJ1341234.xxx.hahaha”. Soooo, I get “#!/–TMPxx1341234DFLKJ1341234.xxx.hahaha/bin/python”. That’s about as useful as a beehive in my toilet.

Easy fix though: just make sure the build­server doesn’t have any­thing in the even­tual loca­tion of the installed ver­sion from the rpm (/opt/lazercats (ok, not really)) and just com­pile every­thing there.

Suc­cess, and win. Heck, I even get it to boot­strap vir­tualenvs for the users. Then I find out I’ve increased the image size by 40 or so megabytes. This imme­di­ately wipes the grin off my face and makes me real­ize I have again, failed. You see, I can’t freely increase the image size like that.

I need python 2.6. So, step one is to swap to fc11. Ok, good. I also want to avoid using the lag-behind vendor packages except for the bare minimum footprint I need to bootstrap the environment. This means modifying the kickstart packages list like this (note: I also cannot install a compiler, which is needed for a lot of packages):

# Python utilities
# python-lxml is == 2mb
python-lxml
python-setuptools
python-crypto
python-paramiko
python-pycurl
# Needed for virtualenv < 1.0 mb
python-devel
python-setuptools-devel

Why on earth is python-devel needed for vir­tualenv? Why python-setuptools-devel? Whyyyy??!
Ok, so I’m only going to be stuck with upstream ver­sions of lxml, setup­tools (which hasn’t revved since the earth cooled) and a few oth­ers. Fine.

I then jump into the kickstart file and pop in:

%post --nochroot
cp python-dependencies.txt $INSTALL_ROOT/root/python-dependencies.txt
%post
%include post.txt
%end

In post.txt:

# Python environment setup

# Temporarily make DNS work
echo "nameserver 10.1.1.10" >/etc/resolv.conf

# Bootstrap virtualenv, then install everything else into /opt/thatthing
( cd /root
    /usr/bin/easy_install virtualenv
    /usr/bin/easy_install virtualenvwrapper
    /usr/bin/virtualenv /opt/thatthing
    /opt/thatthing/bin/easy_install pip
    /opt/thatthing/bin/pip -E /opt/thatthing install -r /root/python-dependencies.txt
    rm -rf build/ python-dependencies
    echo "export WORKON_HOME=/opt" >>/home/jnoller/.bash_profile
    echo "source /usr/bin/virtualenvwrapper_bashrc" >>/home/jnoller/.bash_profile
)
rm -f /etc/resolv.conf

# End Python setup

The python-dependencies.txt is a pip require­ments file and looks like this:

# use pip install -r


# http://code.google.com/p/boto/
boto

# http://docs.fabfile.org/0.9/
fabric

# http://ipython.scipy.org/moin/
ipython

# http://tools.assembla.com/yolk
yolk

# http://code.google.com/p/httplib2/
httplib2

# http://ipaddr-py.googlecode.com

http://ipaddr-py.googlecode.com/files/ipaddr-1.1.1.tar.gz

Note, I also can’t plop svn, hg, git, etc in here, so packages that are not on the cheeseshop or not packaged right are a no-go.
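
The file format itself is simple enough to read by hand. As a sketch only (this is not pip's parser, and real requirements files allow version specifiers and options that this ignores), the hypothetical read_requirements below splits a file like the one above into plain package names and direct URLs:

```python
def read_requirements(text):
    """Split requirements text into (package names, direct URLs)."""
    names, urls = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and comment lines carry no requirement.
        if not line or line.startswith("#"):
            continue
        if "://" in line:
            urls.append(line)
        else:
            names.append(line)
    return names, urls


if __name__ == "__main__":
    sample = """\
# http://code.google.com/p/boto/
boto

# http://ipaddr-py.googlecode.com

http://ipaddr-py.googlecode.com/files/ipaddr-1.1.1.tar.gz
"""
    names, urls = read_requirements(sample)
    print("from the index:", names)
    print("direct downloads:", urls)
```

The names are the ones that have to resolve on the cheeseshop; the URLs are fetched as-is, which is why a plain tarball link works here while a version-control checkout does not.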

The trick here is that the %post com­mands in the kick­start envi­ron­ment run in a chroot of the OS being cre­ated. This means, once the new image is loaded (say, in EC2) I can ssh in, and hit “workon thatthing”. In real­ity, the WORKON dir should be else­where, but I’m going to let users over­ride that. As it is, the “one true python” ver­sion is the one in /opt — no one (even me) gets to touch the sys­tem ver­sion of python.

I now have a python envi­ron­ment, avail­able on first boot, iso­lated from the OS-provided one. I can spawn infi­nitely more vir­tualenvs and play all day long. The few global things I have are easy_install and some libraries which I hope I don’t need to rev myself.
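
A quick way to confirm that a given interpreter really is the isolated one and not the system python: compare its prefix against the prefix of the installation it was created from. This is a small sketch for current Pythons; in_virtualenv is a name picked for illustration. Older virtualenv releases record the original prefix in sys.real_prefix, while newer ones (and the stdlib venv module) use sys.base_prefix.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # Old virtualenv sets sys.real_prefix; venv and newer virtualenv set
    # sys.base_prefix. Outside any environment both fall back to sys.prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:     ", sys.prefix)
    print("isolated:   ", in_virtualenv())
```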

I still haven’t licked the OS/X part. I’m prob­a­bly just going to have to com­pile the barest pos­si­ble envi­ron­ment in some­thing like /opt/python-ks and go from there. Given I’d need to com­pile all of the depen­den­cies into it (such as read­line) I may just end up writ­ing a big script to grab all the bits and then com­pile it into a loca­tion the user pro­vides. The nice thing is that once I boot­strap python and vir­tualenv into the basic tree, I can use pip bundles/requirements files to pull in the rest.

All told, I sit here look­ing at the mess I’ve slogged through — and then I real­ize the entire python-packaging dis­cus­sion on python-dev just exposes a whole ‘nother can of worms. Ver­sion­ing in a sin­gle site-packages direc­tory, how app devel­op­ers con­flict with OS ven­dors, etc. It’s a mess. OS Ven­dors lag behind devel­oper released ver­sions, and come to depend on what’s installed there (have you ever bro­ken yum on a Fedora box? I have.).

I hope Tarek gets a chance to clean a lot of this up — and while I’m against “every­thing and the kitchen sink” in the stdlib — hav­ing some method/API of build­ing out “an official-like” vir­tualenv setup (maybe mak­ing virtualenv’s life eas­ier) would be nice.

Edit to add: I real­ize that hard­cod­ing the she­bang line is desir­able in many cases, the obvi­ous rea­son is that you need to be pointed at the inter­preter which has your dependencies/libraries in it. Not hav­ing a clear way of alter­ing that behav­ior (other than a “clever” sed script) is unfortunate.
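
For the record, the “clever” workaround does not have to be sed. Here is a minimal sketch of the same idea in Python, assuming you know both the old and the new interpreter path; rewrite_shebang is a hypothetical helper, not something distutils or setuptools provides.

```python
def rewrite_shebang(path, old_interpreter, new_interpreter):
    """Swap the interpreter on the #! line of `path`; return True if it changed."""
    with open(path, "rb") as handle:
        lines = handle.readlines()
    if not lines or not lines[0].startswith(b"#!"):
        return False
    old = old_interpreter.encode()
    new = new_interpreter.encode()
    shebang = lines[0][2:].strip()
    # Only touch scripts pinned to exactly the old interpreter,
    # optionally followed by interpreter arguments.
    if shebang != old and not shebang.startswith(old + b" "):
        return False
    lines[0] = b"#!" + new + shebang[len(old):] + b"\n"
    # Rewriting in place keeps the file's existing permissions.
    with open(path, "wb") as handle:
        handle.writelines(lines)
    return True
```

Run over every file in the bin directory after unpacking, this repoints the scripts at wherever the tree ended up. It still leaves the scripts tied to one absolute path, just the right one.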

See this fol­lowup as well

  • Giuseppe

    I am working on a piece of software that works like virtualenv but also allows the installation of non-python source packages (http://pypi.python.org/pypi/bpt). Of course you can install your favorite version of python in it. It also includes a modified version of pip, so installing python packages is as easy as easy_install. For normal tarballs with configure/make it guesses the build commands like checkinstall does.

    You may want to have a look. These days I have no time to develop it actively, but I use it every day (on Mac, but it works on linux as well) and in different situations (for example, it is part of the build system of an application I am working on, which for stability needs frozen versions of dependencies instead of the ones provided by the distribution). The nice thing is that the directory with all the installed files is relocatable: it can be moved (for example to another machine with the same architecture) and it still works.

  • Francesco

    Good post, Jesse. I feel your pain.

    Francesco

  • http://jessenoller.com jnoller

    Howdy Giuseppe, I actually looked at bpt. It looks like a good idea, but not quite “ready enough”. I think it’s a good start. How did you trick setuptools/distutils not to hard code the shebang line so the packages are portable across machines/directories?

    Per­son­ally, from the lit­tle bit I poked at it, I’d prob­a­bly want to patch it a bit, but (and this is a per­sonal thing) it’s GPLed, and I avoid patch­ing GPL stuff for var­i­ous reasons.

  • Giuseppe

    I agree it’s not ready enough, unfor­tu­nately I have not much time to work on it and as it is now it is enough for my needs (except for auto­matic depen­dency resolution/downloading which I’d really like to imple­ment as soon as pos­si­ble). Besides there are some design choices that I’ll prob­a­bly change (for exam­ple use python scripts instead of bash for the bpt-rules files).

    Distutils/setuptools are a big prob­lem, they do too much magic which is impos­si­ble to con­trol. I don’t avoid the she­bang rewrit­ing. The most robust solu­tion I found for relo­cat­able boxes is to build python inside the box. This way dis­tu­tils will rewrite the she­bang with an absolute path /tmp/sandbox_<…>/bin/python which is valid if the box is relo­cated (all bpt is based on this trick). I have used it suc­cess­fully to dis­trib­ute com­plete appli­ca­tions on a com­put­ing clus­ter with­out hav­ing root access (they were debian sarge with python 2.3!)

    What would you like to patch? You could make some sug­ges­tions instead and I could try to imple­ment them, if they can be use­ful.
    About the license, I chose GPL just because of my ignorance about licenses. I thought that it is permissive enough for python software, since it allows using the software as a library even for commercial applications. I would be very open to switching to a more permissive license if there are good reasons to do that.

  • http://jessenoller.com jnoller

    I went the bash script route too, given it’s faster/easier than a series of check_calls from python. As for the boxes being portable: they’re not, because of exactly what you mention, the hardcoded shebang line. If you move a box from /tmp/sandbox_<…>/bin/python to say, /home/jnoller/sandbox_<…>/bin/python, those hardcoded shebangs break. And I agree with compiling python *into* the sandbox, that’s a trick a lot of us use to have many, not-system-wide installs of Python running.

    I’d have to res­ur­rect my notes (I don’t know what I did with them) on the patch­ing, but things like depen­dency res­o­lu­tion, vir­tualenv sup­port and swap­ping to python scripts comes to mind.

    As for licenses, that’s really a mat­ter of per­sonal taste. I avoid it, and stick with Apache License 2.0 (com­pat­i­ble with the python soft­ware license) for most things. The more per­mis­sive licenses don’t invoke addi­tional clauses, mean­ing I can import bpt in my app, with­out mak­ing my app’s license change (wherein the GPL would force the com­bined app to be GPL).

    The licens­ing thing was part of a debate recently, with­out dig­ging too much into my own rea­sons, see:

    http://farmdev.com/thoughts/80/why-you-should-n…
    http://zedshaw.com/blog/2009–07-13.html
    http://www.b-list.org/weblog/2009/jul/14/licens…
    http://jacobian.org/writing/gpl-questions/

  • Giuseppe

    Bash is without a doubt the easiest way to do it, but I’d like to try a scons/waf approach instead of a series of check_calls, i.e. abstracting some commands (configure, make, etc…) to python declarations. This could result in more platform independence (maybe it could even support windows). The new system would not necessarily replace the bash one: bpt is made to support different ways of installing packages inside a box, it just provides a virtual filesystem based on symlinks. “build”, “autobuild”, etc… are just commands implemented on top of it.

    The “vir­tual filesys­tem” /tmp/box_<…> trick works this way: when you run the env script, a sym­bolic link is cre­ated from your cur­rent box loca­tion to /tmp/box_<…> where <…> is an id unique to the box. So if the box is relo­cated the link is cre­ated to point to the cur­rent box direc­tory (actu­ally if you move the box you will have to remove the link by hand. Bet­ter safe than sorry).
    This trick makes python work if it is installed in the box, because /tmp/box_<…>/bin/python will point to the cor­rect binary regard­less where you have put your actual box.

    Thanks for the links about licenses, I’ll have a look ASAP.

  • http://jessenoller.com jnoller

    I like the way you think ;)

    Ugh to the hoops you have to jump through to sym­link things. That’s gross, I know why, but ugh.

  • Giuseppe

    Thanks :)

    I com­pletely agree about the sym­link. Actu­ally if there are bet­ter solu­tions, the code that does that is lim­ited to few lines that are eas­ily changed. How­ever, I could not think of any­thing better.

    I mean, on linux there would be FUSE, or some tricks with LD_PRELOAD like fakech­root does. But they would need FUSE/etc to be installed on the guest machine, while it is impor­tant for me to have as few depen­den­cies as pos­si­ble on that side. Besides, Mac com­pat­i­bil­ity is very impor­tant for me :) (I know, mac-fuse and every­thing, but it is a huge jump in complexity).

  • Giuseppe

    I forgot: bootstrapping is also kind of easy using the API. These are the two files I use on a (toy) project I am developing:

    http://pastebin.com/f71e884ba to bootstrap a box and install python in it
    http://pastebin.com/fe1f06fa to install non-python dependencies (python dependencies use a pip requirements file).

    This is some­thing I’d really like to auto­mate, but I need to fig­ure out some prob­lems before start­ing to code…

  • http://softver.org.mk/damjan Dam­jan

    If you only need to use Python 2.6 (or bet­ter) you can use PEP370 instead of virtualenv:

    all I do is:
    PYTHONUSERBASE=$HOME/my-python
    pip.py install --install-option="--user" PythonPackage

    There’s no copy of the inter­preter, no setup­tools required, the she­bang is still “#!/usr/bin/python” every­thing is installed in $HOME/my-python/bin and $HOME/my-python/lib/python2.6/site-packages

    If I switch to python3(.1) I don’t have to do any­thing spe­cial since my cus­tom pack­ages for 3.1 will be installed in $HOME/my-python/lib/python3.1/site-packages (except that the /bin might be conflicting).

  • http://jessenoller.com jnoller

    I missed that in 370, I thought it only handled the site-packages bit. I’m surprised it handles the bin scripts properly; they should not be installed in /bin. I’ll double check my stuff to check that, and the proper shebang line (cause so far, setuptools/distutils isn’t playing nice like that).

  • http://twitter.com/scw Shaun Wal­bridge

    Per­haps I’m miss­ing some­thing, but have you given build­out (http://www.buildout.org/) a shot? It’s used to good effect in the Plone com­mu­nity, where it helps man­age com­plex depen­den­cies includ­ing com­piled soft­ware, and works quite nicely once you’ve learned the basics of cre­at­ing buildout.cfg files.

  • kteague

    Twid­dling with the she­bang, that’s a Dis­tu­tils thing:

    http://docs.python.org/distutils/setupscript.ht…

    “The only clever feature is that if the first line of the script starts with #! and contains the word “python”, the Distutils will adjust the first line to refer to the current interpreter location.” Heh, one man’s “clever feature” is another man’s headache. The --executable option allows you to override this behaviour though.

    But the other option is to declare a set of repeatable steps so that you can compile from source on fresh machines. Buildout is the uber-tool for this job! Here’s a decent starter for compiling a Python interpreter using Buildout (http://bluedynamics.com/articles/jens/build-pyt…). And generally, if anything is hairy or a PITA to compile on OS X, you can Google for “buildout hard-to-build-thing” and someone has usually put together a recipe for building it.

You are currently reading Trapped in python package; send food. at jessenoller.com.