
Cleaning out the inbox: Updated Import pseudo code, filesystems and workingenv

June 25th, 2007 Posted in Programming, Python

In between bugs (bees), prenatal exams and other "bodies of agitated fun", I figured I'd continue cleaning out the inbox.

A little while ago, Brett Cannon wrote up a pseudo code version of Python's import system for his importlib project. On 2007-06-12 he updated it to include how modules are searched for, which, for me, is fantastic given the various ways I sometimes have to *cough* abuse the import system[1].

Of special note (for me) is this:

# If the module is already cached in sys.modules then move along.
if name in sys.modules:
    continue
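To make that lookup order concrete, here is a rough sketch (not Brett's code, and not how importlib is actually structured internally) of a top-level lookup: check sys.modules first, then ask the standard path-based finder about each sys.path entry in order. The helper name locate(), its prefer_last flag and the dup_demo module are all made up for illustration; prefer_last models a "last match wins" rule that Python does not have.

```python
import importlib
import os
import sys
import tempfile
from importlib.machinery import PathFinder


def locate(name, prefer_last=False):
    """Hypothetical helper: where would top-level module `name` come from?"""
    # Mirrors the pseudo code: an already-imported module short-circuits the search.
    if name in sys.modules:
        return "cached in sys.modules"
    hits = []
    for entry in sys.path:
        # Ask the normal path-based finder about this single sys.path entry.
        spec = PathFinder.find_spec(name, [entry])
        if spec is not None and spec.origin is not None:
            hits.append(spec.origin)
    if not hits:
        return None
    # Real Python uses the first hit; prefer_last models the wished-for alternative.
    return hits[-1] if prefer_last else hits[0]


# Usage: two temporary directories that both contain a hypothetical dup_demo.py.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    for directory in (first, second):
        with open(os.path.join(directory, "dup_demo.py"), "w") as f:
            f.write("")
    sys.path[:0] = [first, second]
    importlib.invalidate_caches()
    try:
        first_hit = locate("dup_demo")
        last_hit = locate("dup_demo", prefer_last=True)
    finally:
        del sys.path[:2]  # put sys.path back the way it was
    expected_first = os.path.join(first, "dup_demo.py")
    expected_last = os.path.join(second, "dup_demo.py")

print("first match:", first_hit)
print("last match: ", last_hit)
print("sys:", locate("sys"))
```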

I sometimes wish that A) I had robot eyes, and B) Python's import system would use the last module of a given name found on sys.path rather than the first. Mind you, this is a small wish and easily forgotten. Reading the rest of it, I also wish (as others do) that Python did not have to write compiled .pyc files into the same directory as the .py files. This is bad for security in all sorts of ways (as a coworker recently reminded me). I really wish I could do something like:

export PYTHON_PYC_CACHE=/tmp/._pythonCache
... dance all night
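For what it's worth, something close to this later landed in CPython: Python 3.8 added the PYTHONPYCACHEPREFIX environment variable (also available as -X pycache_prefix=PATH and visible as sys.pycache_prefix), which reads and writes .pyc files in a parallel directory tree instead of next to the sources. A minimal sketch, assuming Python 3.8+ and POSIX paths, that uses importlib.util.cache_from_source() to show where the bytecode for a hypothetical source file would go with and without a prefix:

```python
import importlib.util
import sys

# Hypothetical source file: only its path matters, it does not need to exist.
source = "/Users/jesse/projects/python/some_package/mod.py"

original_prefix = sys.pycache_prefix
try:
    # No prefix: bytecode goes into a __pycache__ directory next to the source.
    sys.pycache_prefix = None
    default_location = importlib.util.cache_from_source(source)

    # With a prefix (what PYTHONPYCACHEPREFIX sets at startup), the source's
    # directory tree is mirrored under the prefix instead.
    sys.pycache_prefix = "/tmp/._pythonCache"
    redirected_location = importlib.util.cache_from_source(source)
finally:
    sys.pycache_prefix = original_prefix  # restore whatever was configured

print(default_location)
print(redirected_location)
```

Because the source's full directory path is mirrored under the prefix, two different files that happen to share a relative path (say, two some_package/mod.py checkouts) do not collide in the cache.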

Or something along those lines. I see a future coming closer where layered filesystems[2] (user/runtime space strictly separated from binary/OS space) and some level of virtualization become the status quo rather than the exception.

(Note: Know what would be crazy? Using a no-bullshit object-store style filesystem with layering, and having the Python bytecode put in the right place. That's mostly because I think it would be cool; the object-store part is optional, and I just like combining things.)

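Now, something people might forget: the directory containing the script being run is always position 0 on sys.path. In an interactive session (or with python -c) there is no script, so that entry is '', meaning the current working directory: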

woot:~/subversion jesse$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', ...wall of text
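Since a plain script behaves differently from the interactive session above, here is a small sketch (assuming Python 3.7+ for subprocess.run's capture_output and text arguments) that writes a throwaway script into a temporary directory, runs it with a fresh interpreter from a different working directory, and reports sys.path[0]. It also drops a hypothetical colorsys.py next to the script to show that a module in that directory wins over the stdlib module of the same name.

```python
import os
import subprocess
import sys
import tempfile

# The throwaway script: report sys.path[0] and whether colorsys came from the local copy.
CHILD_SOURCE = (
    "import sys, colorsys\n"
    "print(sys.path[0])\n"
    "print(getattr(colorsys, 'SHADOWED', False))\n"
)

with tempfile.TemporaryDirectory() as scriptdir:
    # Hypothetical local module sharing its name with the stdlib colorsys module.
    with open(os.path.join(scriptdir, "colorsys.py"), "w") as f:
        f.write("SHADOWED = True\n")
    script = os.path.join(scriptdir, "main.py")
    with open(script, "w") as f:
        f.write(CHILD_SOURCE)

    # Run from a different working directory: sys.path[0] follows the script, not the cwd.
    result = subprocess.run(
        [sys.executable, script],
        cwd=tempfile.gettempdir(),
        capture_output=True,
        text=True,
        check=True,
    )
    path0, shadowed = result.stdout.splitlines()
    script_dir_real = os.path.realpath(scriptdir)

print("sys.path[0] in the child:", path0)
print("local colorsys.py used:", shadowed)
```

On POSIX systems CPython resolves symlinks when computing sys.path[0] for a script, which is why the comparison against the temporary directory is done via os.path.realpath().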

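This means that if you have a module in your script's directory with the same name as a standard library module, yours is found first, cached in sys.modules, and used in place of the stdlib version. (os itself is a poor candidate for this: the site module imports it during interpreter startup, so the stdlib os is normally already cached in sys.modules before your script's directory is ever searched. Something like a local random.py or email.py will bite you, though.) So dutifully name your modules something else, or invest in a namespace. For example, here is the sitecustomize.py I use when working on a side project: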

 
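# sitecustomize.py: imported automatically by the site module at startup
import os
import site

# If MYPACKAGE is set, treat that directory as a site directory: it is
# appended to sys.path and any .pth files inside it are processed.
if 'MYPACKAGE' in os.environ:
    site.addsitedir(os.environ['MYPACKAGE'])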
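If you haven't used site.addsitedir() before: it appends the directory itself to sys.path and then processes any .pth files in it, appending each listed directory that exists. A quick sketch using a temporary directory and a hypothetical demo.pth file (sys.path is restored afterwards):

```python
import os
import site
import sys
import tempfile

with tempfile.TemporaryDirectory() as sitedir:
    # Hypothetical layout: <sitedir>/demo.pth lists a sibling directory "extra".
    os.mkdir(os.path.join(sitedir, "extra"))
    with open(os.path.join(sitedir, "demo.pth"), "w") as f:
        f.write("extra\n")  # relative lines are resolved against sitedir

    before = list(sys.path)
    try:
        site.addsitedir(sitedir)
        added = [p for p in sys.path if p not in before]
    finally:
        sys.path[:] = before  # undo: this is only a demonstration

print(added)
```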

Generally speaking, MYPACKAGE resolves to /Users/jesse/projects/python/some_package/, which has a magical __init__.py sitting in it. Or, if I am feeling particularly adventurous, I use Ian Bicking's wonderful workingenv.py script. To quote:

This tool creates an environment that is isolated from the rest of the Python installation, eliminating site-packages and any other source of modules, so that only the modules (and versions) you install into the environment will be available. This allows for isolated and controlled environments, as well as reproducibility. This is similar to virtual-python, but without the symlinks and with some additional features.
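workingenv has long since been superseded (first by virtualenv, and since Python 3.3 by the standard library's venv module), but the idea is the same. A minimal sketch, assuming a modern Python 3 with venv available and skipping pip to keep it quick, that creates an isolated environment in a temporary directory and asks its interpreter where it lives:

```python
import os
import subprocess
import sys
import tempfile
import venv

with tempfile.TemporaryDirectory() as envdir:
    # Isolated environment: no system site-packages, no pip (keeps it fast).
    builder = venv.EnvBuilder(
        system_site_packages=False,
        with_pip=False,
        symlinks=(os.name != "nt"),
    )
    builder.create(envdir)

    bindir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    env_python = os.path.join(envdir, bindir, exe)

    # Ask the environment's own interpreter for its prefix and base prefix.
    output = subprocess.run(
        [env_python, "-c", "import sys; print(sys.prefix); print(sys.base_prefix)"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    env_prefix, base_prefix = output.splitlines()
    env_real = os.path.realpath(envdir)

print("environment prefix:", env_prefix)
print("base installation: ", base_prefix)
```

Inside the environment, sys.prefix points at the environment directory while sys.base_prefix still points at the installation it was created from; with system_site_packages=False, the base installation's site-packages is left off sys.path.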

With that said, I'm going back to operation: "Wait for Baby".

  1. See: PyMOTW: os (Part 2), and popen2 isn’t thread safe.
  2. See: Mike Fletcher's "I want an auto-journalling overlay file-system", "Bash in Bitfrost" and "Unioning File Systems for Fun and Profit".

7 Responses to “Cleaning out the inbox: Updated Import pseudo code, filesystems and workingenv”

  1. Brett Says:

    So do you mean for B) that you want to search through all of sys.path, note where all of the matching files are, and then select the last one? Otherwise I don’t quite follow what you are after.

    As for the .pyc redirection, that might be doable as an importer/loader (and the rough semantics are already specified in PEP 304). The trick is making sure that package import is handled properly. A package might be found at /some/path/, and then you import from /some/path/subpkg/. But now you need to map that to /tmp/._pythonCache/subpkg/ somehow.

    Guessing off the top of my head you could just have the importer know it is working with a package, denote the root location where the package is rooted off of sys.path, and then handle the .pyc location as needed.

    But now that I think about it, I am not sure how you would be able to signify that a path is a package cleanly. When you import a package its __path__ entries end up in sys.path_importer_cache just like any other entry on sys.path. You could find the best suffix on sys.path for the __path__ entry, but what if the path is really different and in no way on sys.path (e.g., tacking on a platform-specific directory on to __path__ and thus having a funky directory location like /some/path/x86/subpkg/)? So handling packages properly becomes tricky. I guess you could cheat and have the bytecode written out in a naive fashion into the directory (i.e., note that the absolute name is pkg.subpkg, somehow know that it is an __init__ and not a submodule, and write it to /tmp/._pythonCache/pkg/subpkg/__init__.pyc).

    I would like to know how a sqlite3-backed bytecode cache would perform.

    Anyway, I have rambled enough on this topic. =) I have way too many importers to write now.


  2. Andrew Dalke Says:

    PEP 304 (http://www.python.org/dev/peps/pep-0304/) “Controlling Generation of Bytecode Files” was meant to control where the .pyc files were stored. I think the conclusion and reason for it being withdrawn was that people didn’t need it, there were other ways to solve the problem, and it was too coarse grained. Eg, see http://mail.python.org/pipermail/python-dev/2005-June/054419.html .

    Ah, comment in version control “Authors withdrew some PEPs by mail on python-dev (Apr 26, 2006).” That’s Martin’s comment in http://mail.python.org/pipermail/python-dev/2006-April/064395.html which says:

    It’s not that the feature is undesirable or the specific
    approach at solving the problem - just nobody is interested to work
    on it. So future contributors shouldn’t get the impression that this
    was discussed and rejected, but that it was discussed and abandoned.


  3. Phillip J. Eby Says:

    """your current working directory is always position 0 on sys.path"""

    No, it isn’t. That only happens when you run a -c command or start the interpreter interactively. When you run a script, *the directory of the script* is what’s in position 0 — which may or may not be the same as the current directory.


  4. jesse Says:

    Good point, Phillip. I will correct my post: I meant to say that the directory of the script is position 0.


  5. jessenoller.com - Pep 304 followup Says:

    [...] going to be following up on the comments from Brett and others about python bytecode location for this post as soon as I find sanity again - “Operation Pending Baby 1” has caused a mild form of insanity to set [...]


  6. jesse Says:

    I am going to look into resurrecting this patch and pep. May take me a little time, but I really think something like this would be very useful.


  7. jesse Says:

    So do you mean for B) that you want to search through all of sys.path, note where all of the matching files are, and then select the last one? Otherwise I don’t quite follow what you are after.

    Yes, that’s what I meant: use the last instance of a name on sys.path rather than the first. Admittedly, this has a lot of problems associated with it, and it is easily worked around with sys.path.insert(0, 'path').

    As for the rest: I need to dig into PEP 304; ultimately it could be a mental dead end.

    As for this…

    I would like to know how a sqlite3-backed bytecode cache would perform.

    So would I :)

