PEP 370 — Per user site-packages, and environment stew

July 19th, 2009 § 16 comments

So, following up on my hard-hitting rant about packaging a portable Python version (without hardcoded shebang lines) for OS X, and my later cutover to a kickstart-based virtualenv setup, I thought I’d dig into PEP 370 “a bit”, since someone pointed out to me that it might just cure some of the heartburn.

I put “a bit” in quotes for a reason: PEP 370 itself was probably one of the simplest feature discussions on python-dev. It came in on the 2.6-and-forward boat last year. It’s also only about 2–3 pages long, depending on your font size.

The idea is this: when you run Python 2.6 or 3.0 (from now on, I’m sticking with 2.6), you get a per-user ~/.local directory, populated the first time you install something with --user (for those “not in the know”, ~ is your home directory, e.g. /Users/jesse on OS X).

This direc­tory is laid out like this:

.local/
    bin/
    lib/
        pythonX.X (wherein X.X is the version number)
            site-packages
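
To see where this tree lives for a given interpreter, you can ask the standard-library site module. This is a minimal sketch for a current Python 3 interpreter; the helper name describe_user_site is my own, and the paths it reports depend on your platform and Python version.

```python
import site


def describe_user_site():
    """Return the per-user base and site-packages paths for this interpreter."""
    return {
        # Root of the per-user tree (~/.local on most Unix builds).
        "user_base": site.getuserbase(),
        # Versioned site-packages directory underneath that root.
        "user_site": site.getusersitepackages(),
        # False when user site loading is disabled for this interpreter.
        "enabled": bool(site.ENABLE_USER_SITE),
    }


if __name__ == "__main__":
    for key, value in describe_user_site().items():
        print(key, "=", value)
```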

Distutils was modified to support the --user argument. This means you can run “python setup.py install --user” and your .local directory will get populated with the delicious nougat payload of the app.

pip supports this just fine, for example:

zim:~ jesse$ /Library/Frameworks/Python.framework/Versions/2.6/bin/pip install \
--install-option="--user" yolk

Downloading/unpacking yolk
  Downloading yolk-0.4.1.tar.gz (80Kb): 80Kb downloaded
  Running setup.py egg_info for package yolk
Installing collected packages: setuptools, yolk
  Running setup.py install for yolk
    Installing yolk script to /Users/jesse/.local/bin
Successfully installed yolk

Hooray! Look! Files!

zim:~ jesse$ ls -lr .local/
total 0
drwxr-xr-x@ 6 jesse  jesse  204 Mar 31 18:35 lib
drwxrwxr-x  3 jesse  jesse  102 Jul 18 22:09 bin
zim:~ jesse$ ls -lr .local/lib/python2.6/site-packages/
total 0
drwxrwxr-x   9 jesse  jesse  306 Jul 18 22:09 yolk-0.4.1-py2.6.egg-info
drwxrwxr-x  17 jesse  jesse  578 Jul 18 22:09 yolk
zim:~ jesse$ ls -lr .local/bin/
total 8
-rwxr-xr-x  1 jesse  jesse  323 Jul 18 22:09 yolk
zim:~ jesse$

Yes, this means yolk is now installed into my local directory, not the global directory. I can also add .local/bin to my PATH and gain access to the yolk binary. This is a huge step forward. Oh, wait. There’s only one yolk binary:

zim:~ jesse$ cat .local/bin/yolk
#!/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.app/Contents/MacOS/Python
# EASY-INSTALL-ENTRY-SCRIPT: 'yolk==0.4.1','console_scripts','yolk'
__requires__ = 'yolk==0.4.1'
import sys
from pkg_resources import load_entry_point

sys.exit(
   load_entry_point('yolk==0.4.1', 'console_scripts', 'yolk')()
)

Hmm. As you can see, the hardcoded shebang line is there; it’s a distutils thing. But this means that if I have 3.x installed (and 2.7, and 3.1) and I install yolk into any of those, the yolk binary will get overwritten and carry a hardcoded shebang line for the last-installed version.
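
The overwrite is easy to reproduce without installing anything. Here is a small sketch: install_script is a hypothetical stand-in for what an installer does when it writes an unversioned entry script, the interpreter paths are made up, and read_shebang just reports which interpreter the surviving file names.

```python
import os
import tempfile


def read_shebang(script_path):
    """Return the interpreter named on a script's first line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return None
    return first_line[2:].decode("utf-8", "replace").strip()


def install_script(bin_dir, name, interpreter):
    """Hypothetical installer step: write an unversioned entry script."""
    path = os.path.join(bin_dir, name)
    with open(path, "w") as handle:
        handle.write("#!%s\nprint('hello')\n" % interpreter)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as bin_dir:
        # Two "installs" of the same script from different interpreters.
        install_script(bin_dir, "yolk", "/opt/python3.1/bin/python")
        path = install_script(bin_dir, "yolk", "/opt/python2.6/bin/python")
        # Only one file survives, and it names the second interpreter.
        print(os.listdir(bin_dir), read_shebang(path))
```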

By default, some packages will also lay down scripts which include the version number, for example:

-rwxr-xr-x   1 jesse  jesse   357B Jul 19 21:44 easy_install
-rwxr-xr-x   1 jesse  jesse   365B Jul 19 21:44 easy_install-2.6
-rwxr-xr-x   1 jesse  jesse   386B Jul 19 21:40 easy_install-3.1

In this example, the unversioned script is last-writer-wins: whichever install ran last owns it. I installed the Python 3.1 version first, and then the 2.6 version. If you look in easy_install, you’ll see that it points to the 2.6 version. Sure, I have version-specific names as well, but good luck remembering they’re there (I always forget), and they’re not symlinks.

I think a better way of managing this (and I’m shooting this to python-ideas) is to move the bin directory under a matching Python version directory, so that it mirrors .local/lib/pythonX.X. You would get a .local/bin/pythonX.X directory as well, and wouldn’t need to worry about conflicts. Or we just ditch the scripts without the version number in them altogether. (link to python-ideas thread)
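
To make the proposal concrete, here is a sketch of the path arithmetic. The bin layout is only my suggestion, not something PEP 370 or distutils implements, and both function names are invented for illustration.

```python
import os
import sys


def version_tag(version_info=sys.version_info):
    """Format a (major, minor, ...) tuple as 'pythonX.Y'."""
    return "python%d.%d" % (version_info[0], version_info[1])


def lib_dir(user_base, version_info=sys.version_info):
    """Versioned site-packages path, as PEP 370 lays it out on Unix."""
    return os.path.join(user_base, "lib", version_tag(version_info), "site-packages")


def versioned_bin_dir(user_base, version_info=sys.version_info):
    """Hypothetical per-version script directory mirroring lib/pythonX.Y."""
    return os.path.join(user_base, "bin", version_tag(version_info))


if __name__ == "__main__":
    base = os.path.expanduser("~/.local")
    for version in [(2, 6), (3, 1)]:
        print(lib_dir(base, version), versioned_bin_dir(base, version))
```

With this layout a 2.6 yolk and a 3.1 yolk would land in different directories, and you would pick one by putting the matching directory on your PATH.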

In any case, this is great for the simple case: you don’t need to install into the global site-packages directory any longer. You just pass --user to all of the install commands, for example:

  • python setup.py install --user (run from FooPackage’s unpacked source directory)
  • pip install --install-option="--user" FooPackage

Notice easy_install isn’t here: that’s because it doesn’t allow the pass-through of the --user option to distutils, favoring the setuptools way of doing things. That’s lamesauce, but setuptools/easy_install also pre-dates PEP 370, so we’ll just skip past that.

Alright, so: a per-user site-packages directory, minus some binary issues. While poking around I suspected there might be some other unversioned top-level directories, so I went digging for a package on PyPI which had a million dependencies, or at least more than one.

zim:~ jesse$ /Library/Frameworks/Python.framework/Versions/2.6/bin/pip install \
--install-option="--user" paver-templates

  Running setup.py install for Sphinx
  Running setup.py install for paver-templates
  Running setup.py install for Paver
  Running setup.py install for PasteDeploy
  Running setup.py install for docutils
  Running setup.py install for Pygments
  Running setup.py install for Jinja2
  Running setup.py install for Cheetah
  Running setup.py install for Paste
Successfully installed paver-templates

I abbreviated the output a bit. That’s 8 dependencies in total, which resulted in a large increase of “stuff” in the .local/lib/python2.6/site-packages directory, but also in a new .local/docs directory:

zim:~ jesse$ ls -lah .local/docs/
total 1096
drwxrwxr-x  17 jesse  jesse   578B Jul 19 14:12 .
drwxr-xr-x@  5 jesse  jesse   170B Jul 19 14:12 ..
-rw-rw-r--   1 jesse  jesse   125K Jul 19 14:12 api.html
-rw-rw-r--   1 jesse  jesse   7.2K Jul 19 14:12 changelog.html
-rw-rw-r--   1 jesse  jesse    99K Jul 19 14:12 extensions.html
-rw-rw-r--   1 jesse  jesse    13K Jul 19 14:12 faq.html
...snip...

More top-level unversioned stuff, which will again conflict if I go and install this in, say, python3.1. The same issue could arise with any data files stored at the top level (although most packages plop them into site-packages with the code, which is the correct way to do it).

So where does this leave us? Well, first off, I would say this: this is a huge improvement over the old site-packages method. Huge. Massive. Why? Even with the versioning issues I’ve sort of harped on above, this is simply a better way to install and manage the packages a user needs.

That being said, installing into the user’s local site-packages should be the preferred deployment method in distutils: rather than needing to pass in --user, we should pass in the inverse, --global. I know this is flamebait, but really, in a world where more and more operating-system-critical things are being written in Python and using the installed framework (see Fedora as a prime example), it’s really not smart to go mucking around in the global bin directories, or the global site-packages.

I’d also make the argument that even the .local structure outlined in PEP 370 doesn’t remove or replace the need for something like virtualenv. Here’s why.

Running my experiments for this, I managed to add 38 directories and files into my .local/lib/python2.6 directory. This includes packages, .pth files, egg-info directories, and actual package code directories. What if I just wanted to use it for a single application? How do I deal with some apps or packages which want specific versions? Now, instead of running “sudo rm -rf /Library/…/site-packages/xxx” I can easily run “rm -rf ~/.local/lib/python2.6/xxx”, but that still means treating .local/lib/xxx like a bonsai tree.

I’d rather treat it the way my girlfriends treated me as a teenager: spin it up and then drop it off in the bad part of town, never to be heard from again. Meaning: build it, install it, delete it.

Not to mention, something like virtualenv (and its integration with pip, or is it pip’s integration with virtualenv?) offers additional niceties above and beyond the use-it-and-delete-it use case. You can build an isolated environment, and then run pip over it to generate a bundle, or a requirements file, which you can then share with other people (for example).

It also allows me to keep things compartmentalized at a near-OCD level. Now, I could do this with the features in PEP 370, sort of. It supports the PYTHONUSERBASE environment variable, which means you could make a tree like this:

.local/
    app1/
        bin/
        lib/
            python2.6/...

And then write a quick bash function to say “switch PYTHONUSERBASE to .local/app1”, if that’s what floats your boat (and swap the scripts-without-versions for symlinks so you can count on them pointing to the right version). But why not use something which does this for you, like virtualenv? It also isolates the interpreter itself, not just the packages you want.
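
If you want to check that the switch actually takes, you can ask a child interpreter what it resolves. A minimal sketch, assuming a current Python 3 interpreter; user_base_for is a name I made up, and the app1 directory does not even need to exist for the lookup to work.

```python
import os
import subprocess
import sys


def user_base_for(app_dir):
    """Ask a child interpreter which user base it resolves under PYTHONUSERBASE."""
    env = dict(os.environ)
    env["PYTHONUSERBASE"] = app_dir  # per-app tree, e.g. ~/.local/app1
    output = subprocess.check_output(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env,
    )
    return output.decode().strip()


if __name__ == "__main__":
    print(user_base_for(os.path.expanduser("~/.local/app1")))
```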

And it works with the features of PEP 370. Meaning, if you create a virtualenv, it will still load the .local directory when you load that virtualenv. However, while some might find this desirable, I don’t, and not in the “clint-eastwood-in-gran-torino-get-off-my-lawn” way. Add the fact that if .local is exposed in the virtualenv, you’ll still lack access to the scripts outside the virtualenv (more on this in a moment). I end up disabling .local loading in the interpreter by exporting PYTHONNOUSERSITE (see the PEP) within virtualenvwrapper whenever I call “workon” for a given environment.
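
Here is a quick way to confirm the variable does what the PEP says. It is only a sketch for a current Python 3 interpreter, and user_site_enabled is a helper name of my own: it starts a child interpreter with some extra environment variables and reports whether that child has user site loading switched on.

```python
import os
import subprocess
import sys


def user_site_enabled(extra_env):
    """Return True if a child interpreter started with extra_env loads the user site."""
    env = dict(os.environ)
    env.update(extra_env)
    output = subprocess.check_output(
        [sys.executable, "-c", "import site; print(bool(site.ENABLE_USER_SITE))"],
        env=env,
    )
    return output.decode().strip() == "True"


if __name__ == "__main__":
    # Any non-empty value disables the per-user site-packages directory.
    print(user_site_enabled({"PYTHONNOUSERSITE": "1"}))
```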

Right now, if you run “virtualenv --no-site-packages flubber” you (purposefully) sandbox yourself away from the global site-packages directory. You do not, however, get the option to omit the .local directory (yes, I’m going to file a bug; I’m up to two or three to file so far). If I want a sandbox, I want a sandbox. It’s like owning cats: you want them to go in the litter box, not the litter box plus a five-foot radius.

Also, using virtualenv compartmentalizes installed binaries. Meaning if I make “flubber” and install, say, pylint into it, the pylint binaries stick to that virtualenv. And therein lies a different catch.

In my other post, I griped about hardcoded paths in the shebang line (#!). This problem is still here; all I’ve done is outline some of the features of the PEP and virtualenv. Let’s talk scenarios. Let’s say I install pylint into my .local directory. Its shebang will point to the Python 2.6 binary I’ve got installed. If I make a virtualenv and try to run pylint on code which depends on a library I’ve sandboxed, it won’t work. Why? Because pylint runs under the interpreter named in its shebang, which can’t see the sandboxed library; you need to reinstall pylint into that virtualenv so it points to that interpreter.

If the shebang line instead used “/usr/bin/env python”, you could sidestep this, as any packages-with-binaries installed into the user directories or the global dirs could just load the interpreter of the virtualenv instead… except… wait for it… the script wouldn’t have its needed libraries in that virtualenv, which is why it has the hardcoded shebang line in the first place (whee!).

Back to square one.

Using virtualenv, though, you can make a bootstrap script to install common utilities (such as pylint) into the environment during creation. Look at the after_install hook. This works around the entire script-outside-the-sandbox problem (but you still get things from the .local directory). You can also use the .local version of pip (should you have it installed) to install a library into a virtualenv sandbox.

Here’s where we are. Installing packages into the global directories (/usr, site-packages, etc.) is considered unsanitary and may lead to bad things. So don’t do it unless you have to, and the times you have to should be rare.

Installing things into your .local directory makes a lot of sense, and is what you should do, especially for things like libraries you want to use. Scripts get dumped (unversioned) into .local/bin. Using a virtualenv on top of all this is still useful and a good way to manage things: you get (mostly) isolated environments, and you can point it at any interpreter and generate an environment for just that version of Python (which is what I do). You can also use it to make sandboxes within sandboxes. For example, I make a “master” python2.6 one, named “python2.6”; inside its directory, I can make a directory named “sandboxes”, install virtualenv within it, and make sub-sandboxes within that.

So, PEP 370 is a great change, and pretty darned useful. It still has some of the drawbacks of the global directories (but makes your life as a user/consumer much easier), and it’s made better (as in the global case) by adding virtualenv on top of it.

For me, I compile Python into its own directory (/Users/jesse/slash) and then make a “master” virtualenv for each version, and end up using that 95% of the time for experimentation/coding/etc. I made a custom bootstrap environment, and a pip requirements file to manhandle the additional things I want in every environment I make.

None of this (PEP 370, virtualenv, etc.) is without drawbacks or things I’d like to improve, but it’s all an improvement on the status quo, and can definitely be made better. Personally, I can’t live without virtualenv and virtualenvwrapper. I don’t think I’d use virtualenv as much without virtualenvwrapper.

For bonus reading, check out this email from Tarek describing the consumer use cases; I think it’s a good, succinct outline.

  • http://solberg.is jokull

    I’m happy to see things edge in the right direction. Virtualenv + workon with its sandboxy, tab-completing benefits is something I’d have a hard time living without.

  • http://www.zellyn.com/ Zellyn Hunter

    Just curious: do you think you’re at the point where you could sit down and write out a description of how packaging in Python would work if it was “perfect”, or do you think the improvements are going to continue incrementally?

  • http://jessenoller.com jnoller

    Well, kinda-sorta. Watching the distutils and python-dev/ideas threads has made me realize there are a lot of nuances to dealing with problems like versioning, etc. However, I think there are three simple profiles we could outline: the Developer (i.e. me), the App Packager (writing things for users, with dependencies) and the OS Packager (building out the OS framework).

    Right now, I think making PEP 370 better in small steps goes a long way to scratching the itch of the Developer and the OS Packager. For the latter, it means less worrying about globally installed things and how to manage the permissions. For the former, it allows me as a developer to tightly control what things go where.

    For the App Developer, well, I suspect we’re rapidly going towards the OS X type .app world, where Python applications bundle most of an interpreter and the dependencies they need into a single bundle. Relying on system installs, or questionable user environments, is a bad place to be in. Sure, offer a “no bundled download” which power users would use, but make a package which contains everything you as an App Dev. need to make the other 90% of users successful.

    From a different view: I don’t know how much of this belongs in Python core. At the language summit I told Tarek that I don’t think things like virtualenv or easy_install belong in Python core, and I still don’t. However, I *do* think that distutils or other in-core APIs can make it easier (and “supported”) to write tools which do these things.

    So, for example: no RPM support in core, but APIs to make building an RPM easier. No virtualenv in core, but APIs to make virtualenv (and things like it) easier, and “officially supported”.

  • rhodium

    Hi Jesse,

    Good post, very timely. I am also responsible for packaging up scripting languages for my company (EDA space), and while I agree with you in most cases, I don’t in others. I will point out some of the challenges I see with your post. The point of view I come from is ensuring that all users have a consistent (site) environment. I don’t focus too much on the “user space” because I want them to do that for themselves.

    In general I think that if you can use the system-defined Python (/usr/bin/python) and its default installed packages, you should. As you and I both know, the temptation to use external Python modules is too great, and therefore the only real way of using them is to muck up the default installed Python, which in general I agree with you is a bad thing **. Also, I am completely sensitive to the needs of developers who like the latest version of Python, and so I am completely cool with using --prefix and packaging up a base Python (rpm) in a specific location for them to use.

    While I agree that “mucking up” the default install tree is bad, I am OK with using a single *.pth file inside the default site-packages location to hook into external “site” directories as needed. The benefits of doing this are as follows:
    - I use a single environment variable, defined in /etc/profile (/etc/csh.cshrc), which specifies the “site” package directory to look in.
    - It allows me to version-control this site tree (we use Perforce, but any repository could work).
    - It allows me to shift on the fly from a dev branch to a head release branch of installed modules, or those in test.
    - I haven’t touched PYTHONPATH, and have reserved that for the user.
    - Module versions and dependencies are handled via the version control system which houses the modules. We use Perforce, so I can get back to a point in time using back-in-time browsing if I need to.
    - virtualenv is great for developers, but when real users have to use it, they are too easily confused. Sorry ;)

    So how does this work? Assumption: a generic installation of Python 2.x in a custom (/usr/local/bin) space.
    - Add a site.pth file. The contents look like this (a .pth import line has to be a single physical line):
    import os, site; smsc = os.environ.get("SMSCTECHROOT", "/smsc/tech"); smsc = smsc if os.path.isdir(os.path.join(smsc, "tools/python/Linux/x86_64/lib/python2.5/site-packages")) else "/smsc/tech"; site.addsitedir(os.path.join(smsc, "tools/python/Linux/x86_64/lib/python2.5/site-packages"))

    Because of the limitations on *.pth files this looks much more convoluted than it needs to, but I digress.
    - So when a user fires up Python, the tree is resolved and he is looking into the repository (or depot) for Python modules. If a developer is testing something prior to committing (or submitting), he simply points SMSCTECHROOT at his local area. Remember SMSCTECHROOT is defined in /etc/profile, so it’s a guarantee.
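
The one-liner is easier to follow when unrolled into ordinary Python. This is only a readable sketch of the same logic, reusing the SMSCTECHROOT variable and paths from the line above; the function and constant names are invented here, and nothing about where such a module would be loaded from is implied.

```python
import os
import site

# Path of the shared site-packages tree, relative to the tech root.
REL_SITE = "tools/python/Linux/x86_64/lib/python2.5/site-packages"
DEFAULT_ROOT = "/smsc/tech"


def resolve_site_dir(environ=os.environ, default_root=DEFAULT_ROOT, rel=REL_SITE):
    """Pick the shared site dir, falling back when SMSCTECHROOT is unusable."""
    root = environ.get("SMSCTECHROOT", default_root)
    if not os.path.isdir(os.path.join(root, rel)):
        # The override does not contain the expected tree: use the default.
        root = default_root
    return os.path.join(root, rel)


def add_shared_site(environ=os.environ):
    """Register the resolved directory (and any .pth files in it) on sys.path."""
    path = resolve_site_dir(environ)
    site.addsitedir(path)
    return path


if __name__ == "__main__":
    print(resolve_site_dir())
```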

    The problems with this approach:
    - I agree that touching a global directory (“/usr”) is a bad idea, but I just don’t see any way around it.
    - I don’t like touching /etc/profile or /etc/csh.cshrc.

    Overall though I think your “user”-specific emphasis is good. I also think that PEP 370 is very well handled. I don’t necessarily agree with you that virtualenv is the best thing since sliced bread, for the reason I pointed out. I would say if I was an application developer who wanted to package his own application, I would use the Mac .app approach and roll my own mini-Python inside the .app tree.

    Thanks, you made me think on a Monday. Keep up the good work!!

  • http://kteague.myopenid.com/ Kevin Teague

    The hard-coded shebang this time is not because of distutils, but because the pip-installed yolk distribution is generating a setuptools-style script for the yolk console script entry point. If you were to install yolk with Buildout and the zc.recipe.egg recipe (alternative recipes might generate different scripts), you would instead get:

    #!/Users/kteague/buildouts/shared/python-2.6.1/bin/python

    import sys
    sys.path[0:0] = [
        '/Users/kteague/buildouts/shared/eggs/yolk-0.4.1-py2.6.egg',
        '/Users/kteague/buildouts/shared/eggs/setuptools-0.6c9-py2.6.egg',
    ]

    import yolk.cli

    if __name__ == '__main__':
        yolk.cli.main()

    Still a hard-coded shebang, but you can control this in zc.recipe.egg with the ‘python’ option. Below is the minimal buildout.cfg for installing yolk:

    [buildout]
    parts = yolk

    [yolk]
    recipe = zc.recipe.egg
    eggs = yolk

    And then installing it with a custom shebang:

    [buildout]
    parts = yolk

    [pythonenv]
    executable = /usr/bin/env python

    [yolk]
    recipe = zc.recipe.egg
    python = pythonenv
    eggs = yolk

    The Buildout style of declaring package locations at install time (instead of at run time, like pip or easy_install) has the advantage that setuptools is not required to run yolk. OK, yolk is a bad example, since it actually depends upon setuptools at run time, and so requires setuptools… but there are lots of other packages out there which declare a dependency on setuptools without actually requiring it. The wrapper script that is generated by easy_install or pip *does* require setuptools, though.

  • Aaron

    Isn’t “.local” a rather generic name for something that’s going to be mysteriously appearing in users’ home directories?

  • http://jessenoller.com jnoller

    Thanks. My big problem with buildout has been one of simplicity, and understanding which recipe does which thing and which one makes sense. Trolling PyPI for the “buildout” term results in *cough* quite a few of them.

    That being said, until Tarek can kick ass and fix up distutils to be “on par”-ish with setuptools, and push out his fork of it, we play in a setuptools-based world, which is unfortunate in a lot of respects.

  • http://jessenoller.com jnoller

    You can change the name of the directory with the PYTHONUSERBASE environment variable, but I definitely agree with you.

  • http://kteague.myopenid.com/ Kevin Teague

    There are 143 at the moment, to be precise:

    http://pypi.python.org/pypi?:action=browse&show…

    But as a general-purpose project setup and installation tool, that’s kind of the point: different recipes to install different stuff. There are lots of recipes that are overly specific, where a single more generic recipe is often developed later and can be used instead, and other recipes that are near-duplicates, where a “canonical” one needs to be chosen; templating of config files is a good example here.

    But usually when people talk about Buildout in the context of managing Python libraries, they actually mean the zc.recipe.egg recipe, which is the recipe most commonly used to install Python packages and scripts.

    As for reaching understanding with Buildout, well, I think there is still lots of room for improvement in making a more user-friendly “getting started with Buildout” document. Learning Buildout still does tend to require more headbanging than should be needed.

  • http://kteague.myopenid.com/ Kevin Teague

    “we play in a setuptools-based world, which is unfortunate in a lot of respects”

    This is only unfortunate insofar as setuptools hasn’t seen a lot of maintenance of late, and setuptools does way too much stuff. Typically it’s behaviour such as how scripts are generated by easy_install which irks people, or how packages are downloaded which bothers others, or how package metadata differs between setuptools and distutils. But there is nothing in the Python packaging ecosystem which says you need to consume or use the ugly bits of setuptools; none of the ugly bits have been immortalized in a PEP! And the ugly bits of Distutils that have been PEP-immortalized will hopefully be kicked to the curb (PEP 314’s Requires and Provides fields) without too much fuss.

    Hopefully the ‘Distribute’ fork can further clarify what people want to keep and what to separate out from setuptools. Heck, if someone came up today with yet another setuptools fork, aside from an outcry of “code duplication”, I think this could be a good thing! It would further clarify for folks what parts of packaging are ‘standards’ and ‘interchange formats’ and what parts are just instances of behaviour of a given tool, which can easily be changed (by switching tools) and only affect users of that given tool.

    I also think any truly satisfying solution to library management for a developer with reasonably complex requirements won’t involve any site-packages. Obviously site-packages isn’t going away any time soon; it caters really well for the “scripters” use case, where they just want to consume 3rd-party libs in a more one-off scripting nature. But PEP 370 and VirtualEnv work within the assumption and the constraints of “we already have a Python installation with a shared ‘bin’ location and shared ‘library’ location” and then place installed files inside those locations. VirtualEnv is a total hack, but a beautiful one! It’s good how virtualenv is backwards compatible with all of the existing stuff, but by its nature it doesn’t try to rethink or rework how the problem is solved. Any shared location, be it a root-only site-packages, a .local site-packages, or a virtualenv-cloned site-packages, is an all-or-nothing location. With any all-or-nothing shared location you will always have the potential for conflicts: installing or updating dependencies for one app may break another already-installed app. When attempting to sort out version/dependency issues, you can’t choose just some libs from a shared location while easily ignoring other libs of the wrong version in that location.

    If you start without any notion of using a site-packages, then instead you can do something such as the Buildout approach:

    ~/projects
        /app1
            /buildout.cfg
            /bin
                /yolk
        /app2
            /buildout.cfg
            /bin
                /yolk
    ~/pythonlibs
        /yolk-0.4.1-py2.6.egg
        /yolk-0.4.1-py2.4.egg

    This approach has the benefits of:

    * No need for duplicate installations of the same version of the same library. Each version of each library only has to be installed once.

    * Scripts are installed in a project-specific location, not in a shared space.

    This does mean that if yolk-0.4.1 is used by two different projects, then Buildout will generate two ./[someproj]/bin/yolk script entries. But that’s a case where duplication is a good thing! While you may happen to be working on both projects today, and both projects are using the same libraries and Python version, tomorrow could be a different story. You might want to update one project to a cutting-edge dev version of a library but not want to do that right now for the other project. Or maybe one project is being migrated from Python 2.5 to 2.6, but you want to hold the other project back at 2.5. You might want to have yolk-0.4.1 and yolk-0.5dev installed side by side for one project, expressed as ./bin/yolk-stable and ./bin/yolk-dev. Heck, you might even want two scripts that call into yolk-0.4.1, where one version of the script needs to contain an extra line or two of hard-coded Python to handle a specific use case (./bin/yolk and ./bin/yolk-extra-easy). Scripts are always installed to support a specific application or project, so always putting the scripts into a project-specific location makes things a lot simpler, and I think it is the way to go.

    Using a bootstrap in conjunction with VirtualEnv to bring a suite of dev tools into any new project is a good idea (e.g. pylint, nose, zest.releaser). But why state, “These are the dev tools I use and want to carry around with me”, when instead you can state, “This project uses these dev tools for support” and “That project uses those dev tools for support”? Each project states its own requirements and preferences for what tools (and possibly versions) it needs. In this way two developers who normally prefer different dev tools can easily collaborate on a project together. They don’t need to argue over what the “standard dev tool” suite should be; instead they can say, “we should switch from Tool X to Tool Y for Project A because it gives us Benefit C”. Each project expresses its preferences of what scripts are powered by which version of which libs and which Python interpreter (whew!). Then the only thing remaining is for the installation tool to take care of the ugly work of expressing those preferences by outputting the gory details of the hard-coded bits of a script into a project’s /bin/ directory.

    Many language camps have toyed with the approach of writing out hard-coded scripts that declare library locations up front, and typically they back away from it because this approach is deemed to be “clunky”. But it’s only clunky if you have to deal with the hard-coding manually! As soon as you check hard-coded scripts into version control, the other developers would rightfully berate you (and so instead you put “#!/usr/bin/env python” as a script header, but then you are only pushing the need to hard-code what Python and what libs will be used into a hard-coded shell file…). If instead you simply express the requirements of each script (which interpreter, which libraries, where to call into for the main function), then it becomes possible to let a tool automate the expression of all of the clunky bits, and from the developer’s perspective “clunky” becomes “effortless”. Furthermore, if your definition of “clunky” is guided by actual performance benchmarks in a given deployment, then you can use different recipes to express script and package layout (flat install, egg install, zipped-only install) until you reach a file layout that is optimized for that deployment.

    Another reason for taking the approach of having an install tool lay out the gory details of setting up scripts and libs for a project is that you can accommodate more than one language. Sure, Python is far and away my favorite language, but I can appreciate Python the language just fine without appreciating how the files to support a given implementation of Python are laid out. Imagine if you had “VirtualRubyEnv” and “VirtualPerlEnv” and you were working on some mongrelly project that wanted to combine Python, Ruby and Perl in one place (and in an academic/bioinformatics setting, mongrelly projects seem to be more common than pure one-language-only projects). Which “VirtualLang” would be the master one? What if one language’s notion of where it puts stuff is incompatible with how another language lays out its files?

    So we already have, in one incarnation via Buildout and zc.recipe.egg, ostensibly reached some form of Python packaging nirvana, where libraries are consumed from a multi-version, only-installed-once-each repository and subtly different scripts aren’t attempting to overwrite each other. You still have the valid criticisms that Buildout is more difficult to approach and learn than it needs to be, and that these practices need to be more standardized so that any install method is interchangeable with any other one. And people need to simply be aware where a practice that is causing them grief *is not* a set-in-stone standard (or any standard at all), but merely the behaviour of a given tool.

    The Buildout problem can be solved with either better docs, or simply by using a different install tool that takes the same approach to installation (e.g. you could do all this with Paver). It would be a sign of success if we could get to a place where a Ruby-centric developer who wanted to use a little Python in their project could easily manage Python script generation and library installation from Ruby code.

    As for standardizing things so that more and different tools can play and interoperate well (e.g. so that Buildout could cherry-pick some, but not all, OS distro installed libraries), this is what the bulk of Tarek’s PEP writing has been pushing python packaging towards: adding install_requires and entry_points to the official metadata, writing out proper installation metadata so that you can query a location and it can tell you what package and version are installed there, and the mentioned but as-yet-un-PEP’ed way to structure and consume a multi-version location.
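
    For a sense of what "query a location" can look like, here is a small sketch using `importlib.metadata` from the standard library of much later Python versions (3.8 and up, so it postdates this discussion). The helper name `installed_in` and the throwaway directory with a made-up `yolk-0.4.1.dist-info` entry are assumptions for illustration only.

```python
import os
import tempfile
from importlib.metadata import distributions


def installed_in(location):
    """Return sorted (name, version) pairs for distributions found in one directory."""
    return sorted(
        (d.metadata["Name"], d.version) for d in distributions(path=[location])
    )


# Build a throwaway location holding installation metadata for one made-up install.
location = tempfile.mkdtemp()
info_dir = os.path.join(location, "yolk-0.4.1.dist-info")
os.mkdir(info_dir)
with open(os.path.join(info_dir, "METADATA"), "w") as f:
    f.write("Metadata-Version: 2.1\nName: yolk\nVersion: 0.4.1\n")

print(installed_in(location))
```

    The point is only that the location itself answers the question; no particular install tool has to be consulted.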

  • http://jessenoller.com jnoller

    Interesting thoughts. I’ve been pondering more and more “hooks” we could target for Tarek’s refactoring of distutils to make things like buildout/virtualenv “more of a first class citizen”.

    I *do* like the recipe approach, although I admit that I’m not familiar enough with generating them to “really be keen on them”. Also, some part of me wishes all of the recipes were centralized in some way to make them mentally mesh a bit more.

    And I have headbanged on buildout a bit, and bounced off. It’s relatively alien to my workflow, and personally I found the pip/virtualenv workflow dead simple to use (and rapid to learn). But that’s just me.

  • http://jessenoller.com jnoller

    Thanks. My big problem with buildout has been one of simplicity, and of understanding which recipe does which thing and which one makes sense. Trawling PyPI for the “buildout” term results in *cough* quite a few of them.

    That being said, until Tarek can kick ass and fix up distutils to be “on par”-ish with setuptools, and push out his fork of it, we play in a setuptools-based world, which is unfortunate in a lot of respects.

  • http://jessenoller.com jnoller

    You can change the location of the directory with the PYTHONUSERBASE environment variable, but I definitely agree with you.
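
    A quick way to see the override in action, sketched in Python: start a fresh interpreter with PYTHONUSERBASE set and ask the `site` module where the user base is. The helper name `user_base_with_override` and the path `/tmp/alt-user-base` are made up for this example, and `site.getuserbase()` assumes Python 2.7 or later (2.6 only exposes `site.USER_BASE`).

```python
import os
import subprocess
import sys


def user_base_with_override(path):
    """Ask a fresh interpreter for its user base with PYTHONUSERBASE set."""
    # Copy the current environment and add the override for the child only.
    env = dict(os.environ, PYTHONUSERBASE=path)
    out = subprocess.check_output(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env,
    )
    return out.decode().strip()


print(user_base_with_override("/tmp/alt-user-base"))
```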

  • http://kteague.myopenid.com/ Kevin Teague

    There are 143 at the moment, to be precise:

    http://pypi.python.org/pypi?:action=browse&show…

    But for a general-purpose project setup and installation tool, that’s kind of the point: different recipes to install different stuff. There are, though, lots of recipes that are overly specific, where a single more generic recipe is often developed later and can be used instead, and other recipes that are near-duplicates, where a “canonical” one needs to be chosen; templating of config files is a good example here.

    But usually when people talk about Buildout in the context of managing Python libraries, they actually mean the zc.recipe.egg recipe, which is the recipe most commonly used to install python packages and scripts.

    As for reaching understanding with Buildout, well, I think there is still lots of room for improvement in making a more user-friendly “getting started with Buildout” document. Learning Buildout still does tend to require more headbanging than should be needed.

  • http://kteague.myopenid.com/ Kevin Teague

    “we play in a setuptools-based world, which is unfortunate in a lot of respects”

    This is only unfortunate in so much as setuptools hasn’t seen a lot of maintenance of late, and that setuptools does way too much stuff. Typically it’s behaviour such as how scripts are generated by easy_install that irks people, or how packages are downloaded that bothers others, or how package metadata differs between setuptools and distutils. But there is nothing in the python packaging ecosystem which says you need to consume or use the ugly bits of setuptools: none of the ugly bits have been immortalized in a PEP! And the ugly bits of Distutils that have been PEP-immortalized (PEP 314’s Requires and Provides fields) will hopefully be kicked to the curb without too much fuss.

    Hopefully the ‘Distribute’ fork can further clarify what people want to keep and what to separate out from setuptools. Heck, if someone came up today with yet another setuptools fork, aside from an outcry of “code duplication”, I think this could be a good thing! It would further clarify for folks what parts of packaging are ‘standards’ and ‘interchange formats’ and what parts are just instances of the behaviour of a given tool, which can easily be changed (by switching tools) and only affect users of that given tool.

    I also think any truly satisfying solution to library management for a developer with reasonably complex requirements won’t involve any site-packages. Obviously site-packages isn’t going away any time soon; it caters really well to the “scripters” use-case, where they just want to consume 3rd party libs in a more one-off scripting fashion. But PEP 370 and VirtualEnv work within the assumption and the constraints of “we already have a Python installation with a shared ‘bin’ location and shared ‘library’ location” and then place installed files inside those locations. VirtualEnv is a total hack, but a beautiful one! It’s good that virtualenv is backwards compatible with all of the existing stuff, but by its nature it doesn’t try to re-think or re-work how the problem is solved. Any shared location, be it a root-only site-packages, a .local site-packages, or a virtualenv-cloned site-packages, is an all-or-nothing location. With any all-or-nothing shared location you will always have the potential for conflicts: installing or updating dependencies for one app may break another already installed app. When attempting to sort out version/dependency issues, you can’t choose just some libs from a shared location while easily ignoring other libs of the wrong version in that location.

    If you start without any notion of using a site-packages, then instead you can do something such as the Buildout approach:

    ~/projects
        /app1
            /buildout.cfg
            /bin
                /yolk
        /app2
            /buildout.cfg
            /bin
                /yolk
    ~/pythonlibs
        /yolk-0.4.1-py2.6.egg
        /yolk-0.4.1-py2.4.egg

    This approach has the benefits of:

    * Don’t need to have duplicate installations of the same version of the same library. Each version of each library only has to be installed once.

    * Scripts are installed in a project-specific location, not in a shared space.

    This does mean that if yolk-0.4.1 is used by two different projects, then Buildout will generate two ./[someproj]/bin/yolk script entries. But that’s a case where duplication is a good thing! While you may happen to be working on both projects today, and both projects are using the same libraries and python version, tomorrow could be a different story. You might want to update one project to a cutting-edge dev version of a library but not want to do that right now for the other project. Or maybe one project is being migrated from Python 2.5 to 2.6, but you want to hold the other project back at 2.5. You might want to have yolk-0.4.1 and yolk-0.5dev installed side-by-side for one project, where ./bin/yolk-stable and ./bin/yolk-dev are expressed. Heck, you might even want two scripts that call into yolk-0.4.1, but one version of the script needs to contain an extra line or two of hard-coded python to handle a specific use case (./bin/yolk and ./bin/yolk-extra-easy). Scripts are always installed to support a specific application or project, so always putting the scripts into a project-specific location makes things a lot simpler, and I think it is the way to go.
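
    To make the per-project pinning concrete, here is a small Python sketch in which each project states its own interpreter version and library versions, and the egg paths are resolved against the one shared multi-version location. The `projects` dictionary, the `egg_path` helper, and the `~/pythonlibs` location are hypothetical names that simply mirror the layout described earlier; real tools resolve requirements with far more care.

```python
import posixpath

# Hypothetical shared, multi-version library location.
PYTHONLIBS = "~/pythonlibs"


def egg_path(name, version, pyver):
    """Build the path of one specific egg inside the shared location."""
    return posixpath.join(PYTHONLIBS, "%s-%s-py%s.egg" % (name, version, pyver))


# Each project states its own interpreter version and library pins.
projects = {
    "app1": {"python": "2.6", "eggs": {"yolk": "0.4.1"}},
    "app2": {"python": "2.4", "eggs": {"yolk": "0.4.1"}},
}

for project, spec in sorted(projects.items()):
    paths = [
        egg_path(name, version, spec["python"])
        for name, version in sorted(spec["eggs"].items())
    ]
    print(project, paths)
```

    Changing the pin for app1 touches nothing that app2 depends on, which is the whole argument for keeping scripts and pins project-specific.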

    Using a bootstrap in conjunction with VirtualEnv to bring a suite of dev tools into any new project is a good idea (e.g. pylint, nose, zest.releaser). But why state, “These are the dev tools I use and want to carry around with me”, when instead you can state, “This project uses these dev tools for support” and “That project uses those dev tools for support”? Each project states its own requirements and preferences for what tools (and possibly versions) it needs. In this way two developers who normally prefer different dev tools can easily collaborate on a project together. They don’t need to argue over what the “standard dev tool” suite should be; instead they can say, “we should switch from Tool X to Tool Y for Project A because it gives us Benefit C”. Each project expresses its preferences of what scripts are powered by which version of which libs and which python interpreter (whew!). Then the only thing remaining is for the installation tool to take care of the ugly work of expressing those preferences by outputting the gory details of the hard-coded bits of a script into a project’s /bin/ directory.



You are currently reading PEP 370 — Per user site-packages, and environment stew at jessenoller.com.
