PEP 370 - Per user site-packages, and environment stew

by jesse


So, following up on my hard-hitting rant about packaging a portable python version (without hardcoded shebang lines) for OS/X, and later cutting over to a kickstart-based virtualenv setup, I thought I'd dig into PEP 370 "a bit", as someone pointed out to me that it might just cure some of the heartburn. I put "a bit" in quotes for a reason - PEP 370 itself was probably one of the simplest discussions around a feature on python-dev. It came in on the 2.6-and-forward boat last year. It's also only about 2-3 pages long, depending on your font size.

The idea is this - when you run python2.6/3.0 (from now on, I'm sticking with 2.6) you will get a ~/.local directory (for those "not in the know" - ~ is your home directory, e.g. /Users/jesse on OS/X).

This directory is laid out like this:

.local/
    bin/
    lib/
        pythonX.X (wherein X.X is the version number)
            site-packages
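
If you'd rather ask the interpreter than trust my ASCII art, here is a minimal sketch that prints where a given python thinks its per-user directories live. The function name user_site_layout is my own invention; site.getuserbase() and site.getusersitepackages() arrived in 2.7, so on 2.6 itself you would read site.USER_BASE and site.USER_SITE instead. The exact paths vary by platform and build, so treat the comments as the common Unix case, not a guarantee.

```python
import site
import sys


def user_site_layout():
    """Return the per-user base and site-packages paths for this interpreter."""
    return {
        "version": "%d.%d" % sys.version_info[:2],
        # ~/.local on most Unix builds
        "user_base": site.getuserbase(),
        # usually <user_base>/lib/pythonX.X/site-packages on Unix
        "user_site": site.getusersitepackages(),
        # False or None when the user site is disabled for this interpreter
        "enabled": site.ENABLE_USER_SITE,
    }


if __name__ == "__main__":
    for key, value in sorted(user_site_layout().items()):
        print("%-10s %s" % (key, value))
```

Run it once per installed interpreter and you can see each one pointing at its own lib/pythonX.X tree under the same base.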

Distutils was modified to support the --user argument. This means you can run "python setup.py install --user" and your .local directory will get populated with the delicious nougat payload of the app.

pip supports this just fine, for example:

zim:~ jesse$ /Library/Frameworks/Python.framework/Versions/2.6/bin/pip install \
--install-option="--user" yolk

Downloading/unpacking yolk
  Downloading yolk-0.4.1.tar.gz (80Kb): 80Kb downloaded
  Running setup.py egg_info for package yolk
Installing collected packages: setuptools, yolk
  Running setup.py install for yolk
    Installing yolk script to /Users/jesse/.local/bin
Successfully installed yolk

Hooray! Look! Files!

zim:~ jesse$ ls -lr .local/
total 0
drwxr-xr-x@ 6 jesse  jesse  204 Mar 31 18:35 lib
drwxrwxr-x  3 jesse  jesse  102 Jul 18 22:09 bin
zim:~ jesse$ ls -lr .local/lib/python2.6/site-packages/
total 0
drwxrwxr-x   9 jesse  jesse  306 Jul 18 22:09 yolk-0.4.1-py2.6.egg-info
drwxrwxr-x  17 jesse  jesse  578 Jul 18 22:09 yolk
zim:~ jesse$ ls -lr .local/bin/
total 8
-rwxr-xr-x  1 jesse  jesse  323 Jul 18 22:09 yolk
zim:~ jesse$ 

Yes, this means yolk is now installed into my local directory - not the global directory. I can also add .local/bin to my PATH and gain access to the yolk binary. This is a huge step forward. Oh, wait. There's only one yolk binary:

zim:~ jesse$ cat .local/bin/yolk 
#!/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.app/Contents/MacOS/Python
# EASY-INSTALL-ENTRY-SCRIPT: 'yolk==0.4.1','console_scripts','yolk'
__requires__ = 'yolk==0.4.1'
import sys
from pkg_resources import load_entry_point

sys.exit(
   load_entry_point('yolk==0.4.1', 'console_scripts', 'yolk')()
)

Hmm. As you can see, the hardcoded shebang line is there - it's a distutils thing. But this means if I have 3.x installed (and 2.7, and 3.1) and I install yolk into any of those, the yolk binary will get overwritten and have a hardcoded shebang line for the last-installed version.
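
To see which interpreter currently "owns" each script, something like the following rough sketch will do. The helper names read_shebang and list_shebangs are hypothetical, and it assumes the scripts live in ~/.local/bin as in the listing above; all it does is read the first line of each file.

```python
import os


def read_shebang(path):
    """Return the interpreter named on a script's first line, or None."""
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return None
    return first_line[2:].strip().decode("utf-8", "replace")


def list_shebangs(bin_dir):
    """Map each regular file in bin_dir to the interpreter its shebang names."""
    result = {}
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if os.path.isfile(path):
            result[name] = read_shebang(path)
    return result


if __name__ == "__main__":
    bin_dir = os.path.expanduser("~/.local/bin")  # assumed location
    if os.path.isdir(bin_dir):
        for name, interpreter in list_shebangs(bin_dir).items():
            print("%-24s %s" % (name, interpreter))
```

Reinstall a package under a different python, run it again, and the interpreter listed for the unversioned script should change.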

By default, some packages will also lay down scripts which include the version number, for example:

-rwxr-xr-x   1 jesse  jesse   357B Jul 19 21:44 easy_install
-rwxr-xr-x   1 jesse  jesse   365B Jul 19 21:44 easy_install-2.6
-rwxr-xr-x   1 jesse  jesse   386B Jul 19 21:40 easy_install-3.1

In this example, the unversioned script is last-writer-wins: whichever version was installed most recently owns it. I installed the python 3.1 version, and then the 2.6 version. If you look in easy_install, you'll see that it points to the 2.6 version. Sure - I have version-specific names as well, but good luck remembering they're there (I always forget), and they're not symlinks.

I think a better way of managing this (and I'm shooting this to python-ideas) is to move the bin directory under a matching python version directory, so that it mirrors .local/lib/pythonx.x. You would get a .local/bin/pythonx.x directory as well, and wouldn't need to worry about conflicts. Or we just ditch the scripts without the version number in them altogether. (link to python-ideas thread)

In any case, this is great for the simple case: you don't need to install into the global site-packages directory any longer. You just pass in --user to all of the install scripts, for example:

  • python setup.py install --user (run from FooPackage's source directory)
  • pip install --install-option="--user" FooPackage

Notice easy_install isn't here: that's because it doesn't allow the pass-through of the --user option to distutils, favoring the setuptools method of doing things. That's lamesauce, but setuptools/easy_install also pre-dates PEP 370, so we'll just skip past that.

Alright - so, we have a per-user site-packages directory, minus some binary issues. While poking around, I suspected there might be some other un-versioned top-level directories, so I went digging for a package on pypi which had a million dependencies - or at least more than one.

zim:~ jesse$ /Library/Frameworks/Python.framework/Versions/2.6/bin/pip install \
--install-option="--user" paver-templates

  Running setup.py install for Sphinx
  Running setup.py install for paver-templates
  Running setup.py install for Paver
  Running setup.py install for PasteDeploy
  Running setup.py install for docutils
  Running setup.py install for Pygments
  Running setup.py install for Jinja2
  Running setup.py install for Cheetah
  Running setup.py install for Paste
Successfully installed paver-templates

I abbreviated the output a bit - so 8 dependencies in total, which resulted in a large increase of "stuff" in the .local/lib/python2.6/site-packages directory - but also in a new .local/docs directory:

zim:~ jesse$ ls -lah .local/docs/
total 1096
drwxrwxr-x  17 jesse  jesse   578B Jul 19 14:12 .
drwxr-xr-x@  5 jesse  jesse   170B Jul 19 14:12 ..
-rw-rw-r--   1 jesse  jesse   125K Jul 19 14:12 api.html
-rw-rw-r--   1 jesse  jesse   7.2K Jul 19 14:12 changelog.html
-rw-rw-r--   1 jesse  jesse    99K Jul 19 14:12 extensions.html
-rw-rw-r--   1 jesse  jesse    13K Jul 19 14:12 faq.html
...snip...

More top-level un-versioned stuff, which will again conflict if I go and install this in say, python3.1. The same issue could arise with any data files stored in the top-level (although most of the packages plop them into site-packages with the code, which is the correct way to do it).

So where does this leave us? Well, first off, I would say this - this is a huge improvement over the old site-packages method. Huge. Massive. Why? Even with the versioning issues I've sort of harped on above, this is simply a better way to install and manage packages a user needs.

That being said - installing into the user's local site-packages should be the preferred deployment method in distutils: rather than needing to pass in --user, we should pass in the inverse, --global. I know this is flamebait - but really, in a world where more and more operating system critical things are being written in Python and using the installed framework (see Fedora as a prime example), it's really not smart to go mucking around in the global bin directories, or the global site-packages.

I'd also make the argument that even the .local structure outlined in PEP 370 doesn't remove/replace the need for something like virtualenv. Here's why.

Running my experiments for this, I managed to add 38 directories and files into my .local/lib/python2.6 directory. This includes packages, .pth files, egg-info directories, and actual package code directories. What if I just wanted to use it for a single application? How do I deal with some apps or packages which want versions? Now, instead of running "sudo rm -rf /Library/.../site-packages/xxx" I can easily run "rm -rf ~/.local/lib/python2.6/xxx" - but that's still equivalent to needing to treat .local/lib/xxx like a bonsai tree.

I'd rather treat it like my girlfriends used me as a teenager; spin it up and then drop it off in the bad part of town never to be heard from again. Meaning, build it, install it, delete it.

Not to mention, something like virtualenv (and its integration with pip - or is it pip's integration with virtualenv?) offers additional niceties above and beyond the use-it-and-delete-it use case. You can build an isolated environment, and then run pip over it to generate a bundle, or requirements file, which you can then share with other people (for example).

It also allows me to keep things compartmentalized in a near OCD-level. Now, I could do this with the features in PEP 370, sort of. It supports the PYTHONUSERBASE environment variable, which means you could make a tree like this:

.local/
    app1/
        bin/
        lib/
            python2.6/...

And then write a quick bash function to say "switch PYTHONUSERBASE to .local/app1" - if that's what floats your boat (and swaps scripts-without-versions to symlinks so you can count on it pointing to the right version). But why not use something which does this for you, like virtualenv? It also isolates the interpreter itself, not just the packages you want.
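
Sticking with Python rather than bash, here is a sketch of the same switch. It launches a child interpreter with PYTHONUSERBASE pointed at a per-application tree and asks it which user base it sees. The function name user_base_for and the ~/.local/app1 path are illustrative assumptions taken from the tree above, and site.getuserbase() needs 2.7 or later.

```python
import os
import subprocess
import sys


def user_base_for(app_dir, python=sys.executable):
    """Ask a child interpreter which user base it sees with PYTHONUSERBASE set."""
    env = dict(os.environ)
    env["PYTHONUSERBASE"] = app_dir  # per-application tree, e.g. ~/.local/app1
    output = subprocess.check_output(
        [python, "-c", "import site; print(site.getuserbase())"],
        env=env,
    )
    return output.decode().strip()


if __name__ == "__main__":
    # Hypothetical directory matching the layout sketched above.
    app1 = os.path.expanduser("~/.local/app1")
    print(user_base_for(app1))
```

Anything installed with --user while that variable is set should land under app1/lib and app1/bin instead of the shared .local tree.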

And it works with the features of PEP 370. Meaning, if you create a virtualenv, it will still load the .local directory when you load that virtualenv. However, while some might find this desirable, I don't, and not in the "clint-eastwood-in-gran-torino-get-off-my-lawn" way. Also add the fact that if .local is exposed in the virtualenv, you'll still lack access to the scripts outside the virtualenv (more on this in a moment). I end up disabling .local loading in the interpreter by exporting PYTHONNOUSERSITE (see the pep) within virtualenvwrapper whenever I call "workon" for a given environment.
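
A quick way to convince yourself the variable does what the PEP says is to compare site.ENABLE_USER_SITE in a child interpreter with and without it. This is only a sketch: user_site_enabled is a made-up helper name, and the "default" result depends on your setup (newer virtualenvs, for instance, may already have the user site turned off).

```python
import os
import subprocess
import sys


def user_site_enabled(disable=False, python=sys.executable):
    """Report whether a child interpreter has its user site-packages enabled."""
    env = dict(os.environ)
    env.pop("PYTHONNOUSERSITE", None)
    if disable:
        # Any non-empty value tells the interpreter to skip the user site.
        env["PYTHONNOUSERSITE"] = "1"
    output = subprocess.check_output(
        [python, "-c", "import site; print(site.ENABLE_USER_SITE)"],
        env=env,
    )
    return output.decode().strip() == "True"


if __name__ == "__main__":
    print("default:           %s" % user_site_enabled())
    print("PYTHONNOUSERSITE:  %s" % user_site_enabled(disable=True))
```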

Right now, if you run "virtualenv --no-site-packages flubber" you (purposefully) sandbox yourself away from the global site-packages directory. You do not, however, get the option to omit the .local directory (yes, I'm going to file a bug - I'm up to two or three to file so far). If I want a sandbox, I want a sandbox. It's like owning cats - you want them to go in the litter box, not the litter box + a five foot radius.

Also, using virtualenv compartmentalizes installed binaries. Meaning if I make "flubber" and install say, pylint into it, the pylint binaries stick to that virtualenv. And therein lies a different catch.

In my other post, I griped about hard coded paths in the shebang line (#!). This problem is still here; all I've done is outline some of the features of the pep and virtualenv. Let's talk scenarios. Let's say I install pylint into my .local directory. Its shebang will point to the version of the python 2.6 binary I've got installed. If I make a virtualenv and try to run pylint on code which depends on a library I've sandboxed, it won't work. Why? Because that pylint runs under the interpreter in its shebang, not the virtualenv's. You need to reinstall it into that virtualenv, so it can point to that interpreter.

If the shebang line instead used "/usr/bin/env python" - you could sidestep this, as any packages-with-binaries installed into the user directories, or the global dirs, could just load the interpreter of the virtualenv instead... except... wait for it... it wouldn't have its needed libraries in that virtualenv, which is why it has the hardcoded shebang line in the first place (whee!).

Back to square one.

Using virtualenv though, you can make a bootstrap script to install common utilities (such as pylint) into the environment during creation. Look at the after_install hook. So this works around the entire script-outside-the-sandbox problem (but you still get things from the .local directory). You can also use the .local version of pip (should you have it installed) to install a library into a virtualenv sandbox.

Here's where we are. Installing packages into the global directories (/usr, site-packages, etc) is considered unsanitary and may lead to bad things. So don't do it - unless you have to, and the times you have to should be rare.

Installing things into your .local directory makes a lot of sense, and is what you should do, especially for things like libraries you want to use. Scripts get dumped (unversioned) into .local/bin. Using a virtualenv on top of all this is still useful and a good way to manage things - you get (mostly) isolated environments, you can point it at any interpreter and generate an environment for just that version of python (which is what I do). You can also use it to make sandboxes within sandboxes. For example, I make a "master" python2.6 one, named "python2.6" - inside its directory, I can make a directory named "sandboxes", install virtualenv within it, and make sub-sandboxes within that.

So, PEP 370 is a great change, and pretty darned useful. It still has some of the drawbacks of the global directories (but makes your life as a user/consumer much easier), but it's made better (as in the global case) by adding virtualenv on top of it.

For me, I compile python into its own directory (/Users/jesse/slash) and then make a "master" virtualenv for each version, and end up using that 95% of the time for experimentation/coding/etc. I made a custom bootstrap environment, and a pip requirements file to manhandle the additional things I want in every environment I make.

None of these - PEP 370, virtualenv, etc - is without drawbacks, or things I'd like to improve. They're an improvement on the status quo, and can definitely be made better. Personally, I can't live without virtualenv and virtualenvwrapper. I don't think I'd use virtualenv as much without virtualenvwrapper.

For bonus reading, check out this email from Tarek describing the consumer use-cases; I think it's a good, succinct outline.