So, I (and many others) have lamented packaging issues in Python. Some people are focused on integrating with vendor systems (such as apt (.deb) and yum (rpm)) - while others are concerned with disutils/setuptools/etc. Still others (like me, and maybe I'm alone) are trapped in a tween-state. We're partially using vendor systems, and partially using self-compiled versions of python.
The cardinal "rule" has been not to "touch" the vendor-specific installations of python (this includes you, Linux). For example, on OS/X - any time you run easy_install or pip you install into the global site-packages directory. The same applies when you do the same on linux, and when you run apt-get install/yum-install. Things go into that global, shared directory.
This sucks. Here's why:
- Versions. Some applications depend on very specific versions of libraries. This is because the maintainers of the libraries they depend on are bad, and break backwards compatibility.
- site-packages becomes a toilet. Before my near OCD levels of cleanliness, I checked my system's site-packages directory - I think all told I had about 250 different .eggs/packages/modules/etc all littered in there. And .pth files, and half-exploded things with metadata directories. And I think I found a squirrel in there.
- "globally" installing things like nose, pip and setuptools put the binary scripts in /usr, /usr/local and so on. This again causes those directories to become a toilet.
- In some cases, upgrading something outside of your vendor packages - say, something pre-installed into RedHat's python version can in fact, break and side-effect the system as a whole.
So, I guess you could say "system-level site-packages considered harmful". Once I realized the horrible error of my ways, I switched to virtualenv/virtualenvwrapper. This works great for me. But at least on OS/X - something was lacking.
That something was dependencies needed to compile something like readline into python. I could install the readline egg from pypi and just "work around it". Or I could install macports (which is broken in many ways) and install the readline development libraries in there.
Unfortunately, macports also side effects your system in undesirable ways. Suddenly you're linking to things you don't realize, you've got things compiled in you don't need/want, and so on.
So, what's a guy supposed to do?
Well, since I'm not afraid of compiling things, I built a mini-macports for myself. I made a directory (named "slash") in my home directory, and compiled things like readline into it. I then point the python compiles to that directory and move on with my life (I love you, --prefix). After compiling/installing PIL, Readline, etc into this directory as well as a pile of python versions, and slapping virtualenv on top of it I was feeling pretty good. I get only what I need, and virtualenv keeps things out of the global directories.
Well. Minus the fact that it's huge, non portable and it's sort of a pain in the ass.
Then, I got an itch - I wanted to build a "python megapack" - I lovingly named it python-kitchensink. My goal was to repeat what I did above, and then offer it as a download for people who want to avoid this pain themselves on OS/X.
Easy enough. Minus one nit.
You can't tar the damned thing up. I don't know if it's a side effect of disutils/setuptools, but scripts being installed into this root, were having the #! lines hard coded to the exact path of the interpreter. This means if you went through all this compilation, and then installed easy_install - and say you did this in "/Users/jesse/myslash" - easy_install would get "#!/Users/jesse/myslash/bin/python2.6" hard coded into it.
Instead of kitchensink, I should have named it "jesse cusses a lot".
So, back to square one. Or rather "think about this in the back of my mind, forget about it and then change to a new job".
Forgetting about trying to do this for OS/X, I end up needing to do something eerily similar on Fedora Core. Now, compilation of python with all the bells and whistles on Fedora is simple - "yum install xxx-devel" and then just run the compile.
The goal was to make a fully-featured python 2.6 install on FC10, and then bootstrap the user(s) into a virtualenv so that nothing got plopped into the global directories.
Well - minus the fact fedora core 10 ships with python 2.5. And tools like virtualenv/etc from the yum repos lag behind the versions I want/need. Damnit. Do I stick to RPMs? Do I bootstrap it enough to "just work" and then pip install the rest? What about python2.6? Where are my pants?
There's another catch: it has to work on *first boot* and there's no network on that first boot.
So, forgetting my experiences with compiling all this stuff myself on OS/X, that's what I do at first. I install all the devel packages, build an RPM which consumes a tarball I create, and add it to a local repo, and throw it in the kickstart file which spews out the images.
Oh but wait. The hardcoded #!'s come back and bite me in the ass. The build server compiles things in a temporary directory, and then installs easy_install and all of the other tools into the --prefix'ed python install. That temp directory is named something like "--TMPxx1341234DFLKJ1341234.xxx.hahaha". Soooo, I get "#!/--TMPxx1341234DFLKJ1341234.xxx.hahaha/bin/python". That's about as useful as a beehive in my toilet.
Easy fix though: just make sure the buildserver doesn't have anything in the eventual location of the installed version from the rpm (/opt/lazercats (ok, not really)) and just compile everything there.
Success, and win. Heck, I even get it to bootstrap virtualenvs for the users. Then I find out I've increased the image size by 40 or so megabytes. This immediately wipes the grin off my face and makes me realize I have again, failed. You see, I can't freely increase the image size like that.
I need python 2.6. So, step one is to swap to fc11. Ok, good. I also want to avoid using the lag-behind vendors packages except for the bare minimum footprint I need to bootstrap the environment. This means modifying the kickstart packages list like this (note: I also can not install a compiler - which is needed for a lot of packages):
# Python utilities # python-lxml is == 2mb python-lxml python-setuptools python-crypto python-paramiko python-pycurl # Needed for virtualenv < 1.0 mb python-devel python-setuptools-devel
Why on earth is python-devel needed for virtualenv? Why python-setuptools-devel? Whyyyy??! Ok, so I'm only going to be stuck with upstream versions of lxml, setuptools (which hasn't revved since the earth cooled) and a few others. Fine.
I then jump into kickstart file and pop in:
%post --nochroot cp python-dependencies.txt $INSTALL_ROOT/root/python-dependencies.txt %post %include post.txt %end
# Python environment setup # Temporarily make DNS work echo "nameserver 10.1.1.10" >/etc/resolv.conf # Python environment setup ( cd /root /usr/bin/easy_install virtualenv /usr/bin/easy_install virtualenvwrapper /usr/bin/virtualenv /opt/thatthing /opt/foobar/bin/easy_install pip /opt/foobar/bin/pip -E /opt/thatthing install -r /root/python-dependencies.txt rm -rf build/ python-dependencies echo "export WORKON_HOME=/opt" >>/home/jnoller/.bash_profile echo "source /usr/bin/virtualenvwrapper_bashrc" >>/home/jnoller/.bash_profile ) rm -f /etc/resolv.conf # End Python setup
The python-dependencies.txt is a pip requirements file and looks like this:
# use pip install -r
# http://code.google.com/p/boto/ boto # http://docs.fabfile.org/0.9/ fabric # http://ipython.scipy.org/moin/ ipython # http://tools.assembla.com/yolk yolk # http://code.google.com/p/httplib2/ httplib2 # http://ipaddr-py.googlecode.com http://ipaddr-py.googlecode.com/files/ipaddr-1.1.1.tar.gz
Note, I can't also plop svn, hg, git, etc in here - so packages not on the cheeseshop in or packaged right are a no-go.
The trick here is that the %post commands in the kickstart environment run in a chroot of the OS being created. This means, once the new image is loaded (say, in EC2) I can ssh in, and hit "workon thatthing". In reality, the WORKON dir should be elsewhere, but I'm going to let users override that. As it is, the "one true python" version is the one in /opt - no one (even me) gets to touch the system version of python.
I now have a python environment, available on first boot, isolated from the OS-provided one. I can spawn infinitely more virtualenvs and play all day long. The few global things I have are easy_install and some libraries which I hope I don't need to rev myself.
I still haven't licked the OS/X part. I'm probably just going to have to compile the barest possible environment in something like /opt/python-ks and go from there. Given I'd need to compile all of the dependencies into it (such as readline) I may just end up writing a big script to grab all the bits and then compile it into a location the user provides. The nice thing is that once I bootstrap python and virtualenv into the basic tree, I can use pip bundles/requirements files to pull in the rest.
All told, I sit here looking at the mess I've slogged through - and then I realize the entire python-packaging discussion on python-dev just exposes a whole 'nother can of worms. Versioning in a single site-packages directory, how app developers conflict with OS vendors, etc. It's a mess. OS Vendors lag behind developer released versions, and come to depend on what's installed there (have you ever broken yum on a Fedora box? I have.).
I hope Tarek gets a chance to clean a lot of this up - and while I'm against "everything and the kitchen sink" in the stdlib - having some method/API of building out "an official-like" virtualenv setup (maybe making virtualenv's life easier) would be nice.
Edit to add: I realize that hardcoding the shebang line is desirable in many cases, the obvious reason is that you need to be pointed at the interpreter which has your dependencies/libraries in it. Not having a clear way of altering that behavior (other than a "clever" sed script) is unfortunate.
See this followup as well