Multiprocessing in hindsight.

January 28th, 2009 § 17 comments

Steve Holden shot an email out to python-dev last night. In it, he said something that tweaked me at first, but after thinking about it, I see his point, and some part of me agrees with it. To quote:

I think that both 3.0 and 2.6 were rushed releases. 2.6 showed it in the
inclu­sion (later rec­og­niz­able as some­what ill-advised so late in the
day) of mul­ti­pro­cess­ing; 3.0 shows it in the very fact that this
dis­cus­sion has become nec­es­sary. So we face an impor­tant turn­ing point:
is 3.1 going to be seri­ous pro­duc­tion qual­ity or not?


I sent a reply out to python-list just a few min­utes ago (actu­ally, two of them) — the first is this, which I’ll quote verbatim:

I might write a longer blog post about this later, but I can see
Steve’s point of view. The fact is, pyprocessing/multiprocessing was a
late addi­tion to Python 2.6. Per­son­ally, I was game to put it into
either 2.7 or 2.6, but I felt inclu­sion into 2.6 wasn’t com­pletely out
of ques­tion — and oth­ers agreed with me.

See these mail threads:

http://mail.python.org/pipermail/python-dev/2008-May/079417.html

http://mail.python.org/pipermail/python-dev/2008-June/080011.html

And so on.

All of that being said; the ini­tial con­ver­sion and merg­ing of the code
into core exposed a lot of bugs I and oth­ers didn’t real­ize were there
in the first place. I take full respon­si­bil­ity for that — how­ever some
of those bugs were in python-core itself (dead­lock after fork
anyone?).

So, the road to inclu­sion was a bit rougher than I ini­tially thought -
I relied heav­ily on the skills of peo­ple who had more expe­ri­ence in
the core than I did, and it was dis­rup­tive to the release sched­ule of
python 2.6 due to both the bugs and instability.

I how­ever; dis­agree that this was ulti­mately a bad deci­sion, or that
it was some how indica­tive of a poorly man­aged or rushed 2.6 release.
All releases have bugs, and towards the end of the 2.6 cycle,
mul­ti­pro­cess­ing *was not* the release blocker.

After 2.6 went out, I had a small wave of bugs filed against
mul­ti­pro­cess­ing that I’ve been work­ing through bit by bit (I still
need to work on BSD/Solaris issues) and some of the bugs have exposed
issues I sim­ply wish weren’t there but I think this is true of any
pack­age, espe­cially one as com­plex as mul­ti­pro­cess­ing is.

I know of plenty of peo­ple using the pack­age now, and I know of
sev­eral groups switch­ing to 2.6 as quickly as pos­si­ble due to its new
fea­tures, bug fixes/etc. Mul­ti­pro­cess­ing as a pack­age is not bug free
– I’m the first to admit that — how­ever it is use­ful, and being used
and frankly, I main­tain that it is just one step in a larger project
to bring addi­tional con­cur­rency and dis­trib­uted “stuff” into
python-core over time.

So yes, I see Steve’s point — mul­ti­pro­cess­ing *was* dis­rup­tive, and it
inclu­sion late in the game siphoned off resources that could have been
used else­where. Again, I’ll take the respon­si­bil­ity for soil­ing the
pool this way. I do how­ever think, that python 2.6 is over­all a
*fan­tas­tic* release both fea­ture wise, qual­ity wise and is quite
use­ful for peo­ple who want to “get things done” ™.

Now I’m going to go back to fix­ing bugs.

–jesse

And the second one (a reply to a follow-up from Steve) is here.

I thought I might expound on this just a little bit. The inclusion of multiprocessing was my first real exposure to python-core as anything other than a read-only descriptor. In my excitement to get multiprocessing in, having used it quite a bit and discussed its benefits with other people, I pushed pretty strongly to put it into 2.6.

The fact is that taking a module from 3rd-partyness and migrating it into the python-core ecosystem is a big task, especially for something with multiple platform nuances, a lot of C code, and API cleanup and reworking to do.

If we had done a straight port from pyprocessing -> multiprocessing, it would have been simpler, but we took the chance to alter the APIs to be more pythonic and PEP 8 compliant, to refactor the code to take advantage of python 2.6/3.0-isms, and to rewrite a good chunk of the docs.

Doing this much work late in a release cycle is not cool, especially when I (the guy on the hook) am working on it sub-part time and relying heavily on people in python-core for help.

The amount of help from people like Benjamin, Antoine, Nick, Amaury, Mark D., and Adam was simply awesome, and really pulled my ass out of the fire. Nothing quite like a noob to disrupt things, eh?

In hindsight (oh how I wish I had you sooner), targeting 2.7 would probably have been better, but others and I didn’t want to delay inclusion by a year or more, given that concurrency is such a hot-button issue.

Did I under­es­ti­mate the amount of work? Yes. Were there/are there bugs? Yes. Do I con­tinue to rely on guid­ance and help from peo­ple with big­ger brains than I have? Yes.

Additionally, I should point out: I did not write pyprocessing. Richard Oudkerk did, and he did an excellent job in doing so. But something people may not realize is that shortly after he and I went down this path and started doing initial code drops into core, Richard ran into connectivity problems and, well, fell off the internets. I haven’t heard from him in some time.

So, in all of this, I lost the one per­son who knew the code base best of all — but we trucked through, and as I said before, with the help of oth­ers, we got it in, we got it sta­ble, and we got 2.6 out.

I con­tinue to fix bugs, and I con­tinue to rely heav­ily on community-provided patches (this is my part-part time gig after all) — espe­cially when it comes to oper­at­ing sys­tems which *cough* are “inter­est­ing”. I have a wish­list of enhance­ments to make, and a pile of things I still need to learn.

Over time, I would like to factor out some of the more “magical” things the module does, and completely rewrite the test suite. Did you know that multiprocessing alone has 124 different tests? I’d like to double that by the end of the year. I’d also like to revamp the documentation, which isn’t as newbie-friendly as it should be.

From a test engineer’s standpoint, I did the one thing I know in my heart should never be done: I pushed a saint bernard of a feature through the cat door of a release. You should never adopt a large feature late in a date-driven release. Ever. Doing so drastically increases risk and instability, and siphons off resources you need elsewhere. I rant and rail about this at jobs all the time, yet I ignored it in this case. Shame on me.

In closing: can you use it? Yes. I know of plenty of people who do. Are there bugs? Yes. If you find them, file them. I am proud of where it is today, but I am also not naive enough to think it’s done.

  • http://holdenweb.blogspot.com/ Steve

    Just to be com­pletely open about this, I would add for the record that over­all I think the avail­abil­ity of mul­ti­pro­cess­ing is a great thing for the Python stan­dard library. I even wrote a “Python Mag­a­zine” arti­cle say­ing it would be great to have it in 2.6, so I am also cul­pa­ble here. I think Jesse received poor advice on the amount of work that would be involved. And I know he is work­ing hard to rem­edy the (not huge) defi­cien­cies as we speak.

  • jnoller

    Thanks, and don’t worry about it too much. It’s better to learn from our mistakes than to dwell on them. I just wanted to fill out the entire process and what happened a bit more, especially the part about losing the original maintainer halfway to three quarters of the way through.

  • rerb

    Remember the third key to happiness: don’t take anything personally.

    Not about you.

  • jnoller

    I’m not, but I do have some amount of personal responsibility. We learn from our mistakes, and if my trip-ups and experience can help others, then that’s cool.

  • http://jessenoller.com jnoller

    See, I knew there was at least one other per­son using it *wink*

  • http://www.personal.psu.edu/iua1/ Ist­van Albert

    For my work the multiprocessing module is the most useful feature of python 2.6. There are many other neat improvements there of course.

    Thanks for push­ing it through and to all peo­ple involved, it’s a great mod­ule to have.

  • Nick Efford

    Thanks for posting such a thoughtful and honest reflection.

    I think you’ve done the Python com­mu­nity a great ser­vice by work­ing so hard on this very desir­able fea­ture. Hav­ing it there is a win even if it doesn’t reach the desired qual­ity level until 2.6.1 or 2.6.2.

  • http://www.drbrett.ca/ Brett C.

    Yes it prob­a­bly would have been eas­ier had we waited, but there is always some snafu in a release. Con­sid­er­ing how much slack CPython gets over the GIL I appre­ci­ate the mod­ule as the GIL nay-sayers seem to have shut up for once.

    And 124 tests? Importlib has 226! Come on, you can do bet­ter than that! =) But obvi­ously the real ques­tion is what kind of code cov­er­age you get. Which reminds me, I still think we need to get reg­u­lar test cov­er­age results for trunk and py3k so every­one can know where we come up short in our tests.

  • jnoller

    Nah, the GIL naysay­ers are still there — trust me. I try to avoid the bike shed burn­ing and repaint­ing as much as possible.

    As for the tests — way to RAIN ON MY PARADE :) We do need cov­er­age results, I still don’t have accu­rate num­bers on the MP code. Not to men­tion, the cur­rent test suite might as well be called “god damn magic” some­times. Needs less magic.

  • jnoller

    Per­son­ally? I think it’s ready now for many appli­ca­tions, but yes — there’s a hand­ful of bug fixes I need to get in for the next 2.6 rel and 3.xxx

  • Larry Hast­ings

    mul­ti­pro­cess­ing : Python 2.6 :: STL : 1994 ANSI C++ Draft Standard

  • http://www.seunosewa.com/ Seun Osewa

    I also want to say the same thing. If a mis­take was made, it was that it wasn’t accepted *ear­lier*. Not includ­ing it at all would have been a worse outcome.

  • Brian 2

    Python badly needed a stan­dard­ized way to use mul­ti­ple proces­sors. The offi­cial response to com­plaints about the GIL was “threads are bad, use processes”, which is fine, but there was no con­ve­nient way to actu­ally do that. Get­ting mul­ti­pro­cess­ing into 2.6 and 3.0 even with known issues is a much bet­ter result than hav­ing no mul­ti­core solu­tion at all until 2.7/3.1. I’ll def­i­nitely be tak­ing advan­tage of it, and thank you very much for your efforts.

  • http://www.drbrett.ca/ Brett

    So, I screwed up and importlib only has 127 tests; my test auto-discovery code was executing both the source files and the bytecode files. Oops. =)

    And I don’t know if your “god damn magic” com­ment is lim­ited to MP or Python’s entire test suite, but I def­i­nitely want to remove the magic from Python’s own test suite. I think that will be my next big project after importlib, DVCS, and the dev docs.

  • jnoller

    I was limiting my “damn magic” comment to MP’s test suite. Way too much magic.

  • http://eli.thegreenplace.net Eli

    As other com­menters have said, mul­ti­pro­cess­ing is one of the great­est new fea­tures in 2.6 — it’s the main rea­son I’m itch­ing so hard to switch to 2.6 (though unfor­tu­nately I can’t, until some libs I use do).

    And hav­ing bugs is not a sin. Espe­cially for 2.6, which is a ver­sion most production-heavy users haven’t switched to yet, so it’s a good test­ing ground.
