Chroot and Python discussion and random pyc thoughts

June 28th, 2007 § 7 comments

Since I’m finally back from another exciting edition of almost-labor at the hospital and catching up, I thought I would point out a discussion on python-dev about chroot jails and Python. Interesting information, and tangentially related to some of my thoughts on the .pyc location stuff.

The conversation is going on here and you can view some other information.

Of course, there’s a shout out to Brett’s security work too.

Here is the wiki page referenced in the thread: How can I run an untrusted Python script safely (i.e. Sandbox)

A lot of the utilities are interesting, but I’m still interested in the byte-code location of things.

Some thoughts on my pyc thing:
– One thing to note is that if the user running the Python interpreter does not have write (+w) access to the directory the imported .py is located in, the .pyc/.pyo file is not written. A .pyc file is an optimization for module loading only.

Since this is the case, is worrying about layered filesystems/storing .pyc files in “other” directories really that hot of an issue for me? Maybe. I’d still like to see if I can get the wherewithal to drive PEP 304 forward, because I’d still like to be able to control where to put things. But if you use compileall and ship the .pyo/.pyc files *or* you just make sure the daemon that’s invoking the interpreter does not have +w on its script/binary directory (which it shouldn’t), you could be OK.
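
A minimal sketch of those options, assuming a modern Python 3 interpreter (where compiled files land in a `__pycache__` directory rather than next to the source, which postdates this post). The helper names `bytecode_would_be_written`, `precompile_tree`, and `compile_into` are made up for illustration; `os.access`, `compileall.compile_dir`, and `py_compile.compile` are standard library calls.

```python
import compileall
import os
import py_compile


def bytecode_would_be_written(source_path):
    """Return True if the current user may create files next to source_path."""
    directory = os.path.dirname(os.path.abspath(source_path))
    # Note: this is typically True for root regardless of the permission bits.
    return os.access(directory, os.W_OK)


def precompile_tree(root):
    """Compile every .py file under root ahead of time, so it can be shipped."""
    return bool(compileall.compile_dir(root, quiet=1))


def compile_into(source_path, build_dir):
    """Write the bytecode for source_path into build_dir (hypothetical layout)."""
    os.makedirs(build_dir, exist_ok=True)
    name = os.path.splitext(os.path.basename(source_path))[0] + ".pyc"
    target = os.path.join(build_dir, name)
    # dfile is the file name recorded in the bytecode, so tracebacks still
    # point at the original source location.
    py_compile.compile(
        source_path,
        cfile=target,
        dfile=os.path.abspath(source_path),
        doraise=True,
    )
    return target
```

The first helper only reports whether the interpreter could write bytecode at import time; the second is the "compile and ship" route; the third is one way to choose the output location by hand, which is roughly what PEP 304 would have made configurable.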

  • http://www.dougma.com/ Doug Napoleone

    ‘Could’ is the operative word here. This is a huge issue for grid environments where multiple versions of Python access the same ‘release’ of Python code.

    The network traffic just to find out that there are no +w permissions on a directory can be huge for micro-tasks on a grid. If there is +w access (as many times there needs to be for sandbox development), then you have all those writes and cross writes, lock misses, race conditions, etc. This is why SOE (Sony Online Entertainment) has their own custom Python which does not do .pyc or .pyo at all. We do our own custom hacks, but being able to specify the ‘build’ directory at runtime is a feature us grid folks would love.

    As a workaround we have our own special Python compile code which ensures the full path to the .py is compiled into the .pyc, then we move the .pyc/.pyo off to a special build directory, and run directly from the .pyc/.pyo files. This means that the exception stacks are correct, but no .pyc/.pyo building occurs for ‘released’ versions. This can also be done for development, but it means that an extra ‘build’ step is required. There are other extensive hacks done with custom import hooks to reduce the PYTHONPATH searching, which is also network intensive.

    Theoretical example: 1000 machines, 8 Python processes each, just one network drive on the Python path (yeah, right), and say only 25 modules. Searching for .so, .pyd, .pyc, .pyw, and .py = 1 million network lookups in under a second. Each network operation can actually be up to 12 network operations/transactions using over 256 bytes each. Total network traffic for just FINDING the .py files (not loading them) in this modest example would be 256 Meg/sec.

    Just looking for .py files is 1/4th of your theoretical max gigabit backbone. Yes, there are ways around this (some described above), but they are not simple or elegant.

    So yes, this is a ‘hot’ issue for some people :-)

  • Sylvain

    Nice article.

  • http://www.jessenoller.com jesse

    I had not even thought about the grid ramifications of this. I’m in a distributed system, but the cluster is composed of individual nodes with no shared back end (and even if it *is* shared, it’s fiber to a SAN).

    Wanna help with driving PEP 304 forward?

    Also, you win the “best comment anywhere ever” award.

  • http://www.dougma.com/ Doug Napoleone

    Yes, I do want to help, but I have some serious backlog I need to resolve first. I don’t want to commit to something until I am sure I can keep that commitment. I will be sending you an e-mail with more details soon.

    Thanks for the award, but I don’t think I deserve it :-)

  • Sylvain

    Douglas,

    This is really interesting. Is there a chance you could discuss it more on your blog or somewhere public? I’m looking at potentially large grids like that in the future and would definitely be interested in your experience (modulo any NDA).

    This is even more interesting considering the large amount of projects relying on eggs and setuptools that pollute sys.path with each single egg directory in the path.

    Thanks for sharing anyhow.

  • http://www.jessenoller.com jesse

    This is even more interesting considering the large amount of projects relying on eggs and setuptools that pollute sys.path with each single egg directory in the path.

    Ugh. Don’t remind me about that. Avoiding putting/installing things in the main system library is one of my goals. I’ve started using a sitecustomize.py file that points to /Users/jesse/python/modules and installing everything I can there.

  • http://www.dougma.com/ Doug Napoleone

    Sylvain,

    It’s on my list. I am hoping to submit a talk proposal on it for PyCon 2008, but that is so far in the future as to be never.
    I will add it to my list of things to blog about. I hope to get Pygments integration before then. There are some NDA concerns, but not too many, as long as I stay away from copyrighted code and actual grid configurations. Something geared towards Amazon’s S3 should work well.

    NOTE: that should read 256 bits, not 256 bytes, above. The math does not work otherwise.
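
The arithmetic in the theoretical example from the comments can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it uses the corrected figure of 256 bits per lookup, assumes every lookup happens within the same second, and ignores the "up to 12 transactions per operation" multiplier. The function name `lookup_traffic` is made up for illustration.

```python
# File types the comment lists as being probed for each module.
EXTENSIONS = (".so", ".pyd", ".pyc", ".pyw", ".py")


def lookup_traffic(machines, procs_per_machine, modules, extensions, bits_per_lookup):
    """Return (number of lookups, total bits) for one import pass on the grid."""
    lookups = machines * procs_per_machine * modules * len(extensions)
    return lookups, lookups * bits_per_lookup


lookups, total_bits = lookup_traffic(1000, 8, 25, EXTENSIONS, 256)
share_of_gigabit = total_bits / 1e9  # fraction of a 1 Gbit/s backbone

print(lookups, total_bits, share_of_gigabit)
```

With these inputs the lookup count is one million and the total is 256 Mbit, about a quarter of a gigabit link, which matches the comment. Plugging in 256 bytes (2048 bits) instead gives roughly 2 Gbit, more than the link can carry, which is why the correction is needed.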

You are currently reading Chroot and Python discussion and random pyc thoughts at jessenoller.com.