A followup on Concurrency within Python

March 17th, 2008 § 12 comments

Com­ing out of PyCon 2008 my head is again filled with thoughts on con­cur­rency based on the talks and work I was involved with at the Con.

First up: I want to point out something that bothered me a little. Many people seemed resigned to Jython/IronPython as the “concurrency answer”, and were pushing towards them. There was a lot of discussion and buzz around the idea that these two implementations are the future of concurrency for Python as a language, because neither has a GIL.

In discussing this with many people, I may have come off as defensive, if only because I constantly refuted that as “the answer”. Don’t get me wrong: I am very eager to get my hands on the latest version of Jython and try out the java.util.concurrent package through it. I think access to that library will be great for those working in a hybrid Java/Python shop.

But I don’t think “Python’s” future should be tightly cou­pled with the imple­men­ta­tion of a run­time on top of another run­time. I think work has to be per­formed on the CPython inter­preter to make it a viable con­tender and solu­tion within the “con­cur­rency” space. Python’s strength will be found in mul­ti­ple strong imple­men­ta­tions of the interpreter.

Let me be clear: I believe Python pro­gram­mers want to write Python — not Java, nor .Net. What attracts us to Python is a clean syn­tax. If I want con­cur­rency in Python, I don’t want to have to call into java.util.concurrent. Obvi­ously, the Jython guys are not going to force this — thread­ing will sim­ply use Java thread­ing, but I think a “pythonic” abstrac­tion of the java.util.concurrent pack­age as a whole would also be desired — so you could really use the power of the libraries, but stay wrapped in the warm pas­try of Python.

Concurrency and distributed design form a large enough space, and the use cases and problems people want and need to address are varied enough, that there is room for not just “one” solution (i.e., getting rid of the GIL) but for a series of language and standard library improvements to Python as a whole.

My light­ning talk touched on this: I think there are exist­ing projects that are well-suited for inclu­sion into the stan­dard library, and I believe Adam Olsen’s work on the safe thread­ing project (update here) will also help pave the way for a much more appeal­ing future for CPython — even if it is in Py3k.

Python has a history of picking the best ideas from other languages and improving on them. For example, Python probably does not want to simply re-create the java.util.concurrent package, because that package views threads as the solution. Python would be well suited to support “real threads” in as simple and abstracted a way as possible (a leaky abstraction at that).

Erlang has the Actor/asynchronous message-passing model (a functional language implementation) (for more on Erlang concurrency, see this). While people may be envious of what they see as “perfect side-effect-free concurrency”, I doubt they envy the syntax or complex nature of it. Still, there are existing fork+exec+message passing libraries available for Python right now that sidestep the GIL but let you keep basic threading primitives and semantics while parallelizing across cores and clusters.
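To make “keep basic threading primitives and semantics” concrete, here is a minimal sketch using the standard library's `multiprocessing` module, which grew out of pyProcessing. It assumes a POSIX platform where the fork start method is available; `square_worker` and `run_squares` are hypothetical names made up for this illustration, not part of any library.

```python
import multiprocessing as mp
import queue
import threading


def square_worker(inbox, outbox):
    """Read numbers until the None sentinel arrives; reply with their squares."""
    for n in iter(inbox.get, None):
        outbox.put(n * n)


def run_squares(numbers, use_processes):
    """Run the same worker in a thread or in a forked process."""
    if use_processes:
        # Assumption: fork is available (POSIX). The child is a separate
        # interpreter process, so it does not contend for the parent's GIL.
        ctx = mp.get_context("fork")
        inbox, outbox = ctx.Queue(), ctx.Queue()
        worker = ctx.Process(target=square_worker, args=(inbox, outbox))
    else:
        inbox, outbox = queue.Queue(), queue.Queue()
        worker = threading.Thread(target=square_worker, args=(inbox, outbox))

    # From here on the calling code is identical for both kinds of worker.
    worker.start()
    for n in numbers:
        inbox.put(n)
    inbox.put(None)  # sentinel: tell the worker to finish
    results = [outbox.get(timeout=10) for _ in numbers]
    worker.join()
    return results
```

Calling `run_squares([1, 2, 3], use_processes=False)` and `run_squares([1, 2, 3], use_processes=True)` should give the same answer; only the construction of the worker and its queues differs, which is the point of reusing threading-style semantics.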

In discussing all of this with people at PyCon, I really began to realize that many people in the community look with envy at other languages’ concurrency implementations, including Java’s threads (even though many people still insist threads are impossible to get right).

I also realized that a lot of people are trying to solve a lot of different problems. When people hear distributed, concurrency and parallelism, they bring up everything from large-scale data crunching (à la MapReduce and its kin) and distributed filesystems (à la GFS and Hadoop) to simple examples of jobs spread across multiple cores for math crunching, and jobs passed to entire clusters of hundreds of machines.

Everyone is looking for an answer to their problem, but Python cannot implement something that will address your specific problem. It can only provide the platform and the tools that allow you to solve it yourself.

I suggested the following to multiple people at PyCon. This is what I view as the “cleanest” approach to providing the abstractions and tools people will need to move forward with CPython in this space.

  1. Adopt/move forward with a version of CPython based on the work/concepts offered in Adam Olsen’s Safe/Without-GIL Threading work. Adopt the monitor and deadlock work for all of CPython, but leave the GIL removal as a compile-time option (python3.0 and python3.0-mt binaries).
  2. Add the pyProcessing, pprocess, or Parallel-Python module to the standard library for those who want to use vanilla CPython and for whom the fork method works. Whichever module is added should use existing Python semantics, but be powerful enough for advanced/distributed usage. Note: I have started work on a PEP to get the pyProcessing module into the stdlib.
  3. Add an Actor (or Actor-like) module to the functools module, in keeping with the concepts of the current implementations. Python’s implementation would obviously be leaky. (Note: I am not a functional programmer by any stretch; I like my OOP. Suggestions welcome.) This could easily take advantage of the Safe Threading work.
  4. Add a lightweight (not XML) messaging system to couple with the Actor implementation (no, I have no suggestions here).
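As a rough sketch of what items 3 and 4 might look like together, here is a thread-backed actor whose mailbox carries plain tuples as its lightweight message format. Everything here (`Actor`, `Counter`, `run_counter_demo`, the `("add", n)` / `("get", reply_queue)` message shapes) is hypothetical and invented for illustration; it is not a proposed API and not how any existing library does it.

```python
import queue
import threading


class Actor:
    """Hypothetical minimal actor: one thread, one mailbox."""

    _STOP = object()  # private sentinel used to shut the actor down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self):
        """Ask the actor to finish its queued messages, then wait for it."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        # Messages are handled one at a time, in arrival order.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Counter(Actor):
    """Keeps a running total that only the actor's own thread updates."""

    def __init__(self):
        self.total = 0  # set before the actor thread starts
        super().__init__()

    def receive(self, message):
        kind, payload = message
        if kind == "add":
            self.total += payload
        elif kind == "get":
            payload.put(self.total)  # payload is a reply queue


def run_counter_demo():
    counter = Counter()
    for n in range(5):
        counter.send(("add", n))
    reply = queue.Queue()
    counter.send(("get", reply))
    total = reply.get(timeout=5)  # 0 + 1 + 2 + 3 + 4
    counter.stop()
    return total
```

The leak mentioned in item 3 is visible here: nothing stops another thread from reading `counter.total` directly, and message payloads are passed by reference rather than copied.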

Obviously, a varied approach like this doesn’t conflict with the “one way to do it right” methodology of Python: in a problem space as varied as this, “one way to do it right” depends completely on the problem being solved. It can also help frame the discussion of what problem people are really trying to solve.

I’ve obvi­ously been try­ing to help with the safe thread­ing work — and I am writ­ing a PEP for the inclu­sion of pyPro­cess­ing into the stdlib, but even this work is not the end — it’s sim­ply a start to help give peo­ple some bat­ter­ies to plug into their con­cur­rent toy car.

We should think about the building blocks that help people solve whatever problem arises. We can’t add a solution that fits any one problem; we can only provide the tools with which to build a great big concurrent future.

Now I have to go — my daugh­ter has acquired the lock.

Also, thanks to Adam for help­ing review this!

  • masklinn

    > I doubt they envy the syn­tax or com­plex nature of it

    What syn­tax wouldn’t there be to envy, and which part of Erlang’s con­cur­rency model is complex?

    First point: Erlang’s con­cur­rency requires 2 prim­i­tives: a “send mes­sage to process” and a “retrieve mes­sage from mail­box”. The first one is straight­for­ward and triv­ial, the sec­ond one is a bit more com­plex (it needs the abil­ity to have time­outs, and while pat­tern match­ing makes filtering/fetching mes­sages extremely straight­for­ward it may be more com­plex to han­dle that with­out PM). The send/receive syn­tax is one of the best parts of erlang’s syn­tax: straight­for­ward, short and to the point. What isn’t there to envy? Yes we could add ‘spawn‘, but that’s not spe­cific to erlang-style concurrency.

    Sec­ond point: the com­plex­ity. In all of my exper­i­ments, Erlang-style con­cur­rency was infi­nitely sim­pler than e.g. java-style con­cur­rency, even with java.util.concurrent (1.5’s, we haven’t switched to 1.6 yet) for var­i­ous reasons:

    * A sin­gle con­cur­rency struc­ture (which can be built upon to e.g. super­vi­sor tree, but at its core Erlang only has “send a mes­sage, receive a message”)

    * Uncon­strained resources, there is no need to bother with thread-pooling and other tripe/implementation details in Erlang. If you need a process, just spawn it and let the run­time take care of its allo­ca­tion and scheduling

    * Synchronous/sequential erlang and asynchronous/concurrent erlang are clearly sep­a­rated with func­tion calls on one side and mes­sages on the other one, mes­sage sends don’t look like func­tion calls and reciprocally.

    * Finally, erlang has the tools to han­dle mas­sive con­cur­rency, in order to intro­spect and debug a sys­tem that can have thou­sands of run­ning processes.

    So I’d like to know more about what you con­sider to be the syn­tac­ti­cal or com­plex­ity prob­lems of shared-nothing message-passing con­cur­rency *com­pared to shared-memory “java-style”* concurrency.

  • qwerty

    Although pyprocessing isn’t the end-all solution, its syntax is really easy to pick up. I really hope you manage to get it into the standard library.

  • MichaelSparks

    Hi there,

    You may find Kamaelia inter­est­ing — I’ve been adding clean mul­ti­core sup­port recently. You can find an overview of its design phi­los­o­phy on our sum­mer of code page here: http://kamaelia.sourceforge.net/SummerOfCode2008 (link­ing there because there’s an embed­ded pre­sen­ta­tion & lots of links).

    You can find dis­cus­sion of how mul­ti­core sup­port was added here:
    http://yeoldeclue.com/cgi-bin/blog/blog.cgi?rm=
    Rather than pypro­cess­ing it uses pprocess since that seems suf­fi­cient. I’ll take a look at pypro­cess­ing though.

    Single process pipeline:
        Pipeline(
            Textbox(position=(20, 340),
                    text_height=36,
                    screen_width=900,
                    screen_height=400,
                    background_color=(130,0,70),
                    text_color=(255,255,255)),
            TextDisplayer(position=(20, 90),
                          text_height=36,
                          screen_width=400,
                          screen_height=540,
                          background_color=(130,0,70),
                          text_color=(255,255,255))
        )

    Multiprocess pipeline:
        ProcessPipeline(
            Textbox(position=(20, 340),
                    text_height=36,
                    screen_width=900,
                    screen_height=400,
                    background_color=(130,0,70),
                    text_color=(255,255,255)),
            TextDisplayer(position=(20, 90),
                          text_height=36,
                          screen_width=400,
                          screen_height=540,
                          background_color=(130,0,70),
                          text_color=(255,255,255))
        )

    It just works. It seems to have a similar inspiration to Erlang, but I wasn’t aware of Erlang’s execution model when I created Kamaelia. (It was more inspired by asynchronous hardware systems, unix pipelines, CSP and biological systems).

    cf http://kamaelia.sourceforge.net/Introduction

    I went into that (due to work modelling biological systems) with the assumption that the primary communication would be akin to the nervous system (send and receive of pulses of information between processors), which is why the core of Kamaelia is called Axon, but that there would also need to be an equivalent of a high-latency, low-use, but useful hormonal system. That boils down to a global key-value store (which will be migrating soon to an STM-based store).

    The interesting thing from my perspective is that this also mirrors a system from RAF Malvern from 30 years ago called MASCOT, which I only heard about recently (just before Christmas). It uses channels instead of named inboxes/outboxes, and pools, which are equivalent to our CAT (“hormonal” systems). MASCOT is utterly fascinating because they started from the same premise. The only reference I can find online is this, but if you scroll down, you’ll find “The Official Handbook of Mascot : Version 3.1 : June 1987”. Perhaps scarily, their first version appears to have been in 1975.

    http://async.org.uk/Hugo.Simpson/

    OK, I’ll get back to suggesting ideas for students for GSOC which are naturally highly concurrent, fun, friendly to Python beginners and naturally multicore.

    Inci­den­tally, there is also another dif­fer­ence between python & con­cur­rency in a func­tional lan­guage — the fact that objects in python are muta­ble. I blogged about some impli­ca­tions of this here:
    http://yeoldeclue.com/cgi-bin/blog/blog.cgi?rm=
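A small illustration of the mutability point, written for this page rather than taken from the linked post. It assumes an in-process mailbox built on `queue.Queue`; the helper names (`send_by_reference`, `send_by_copy`, `demo`) are hypothetical. A message sent by reference can still be changed by the sender after it has been "sent", whereas a copied message cannot.

```python
import copy
import queue

mailbox = queue.Queue()


def send_by_reference(message):
    """Enqueue the object itself: sender and receiver share it."""
    mailbox.put(message)


def send_by_copy(message):
    """Enqueue a deep copy: the receiver gets its own private object."""
    mailbox.put(copy.deepcopy(message))


def demo():
    shared = {"readings": [1, 2, 3]}
    send_by_reference(shared)
    shared["readings"].append(99)      # sender mutates after sending
    leaked = mailbox.get()["readings"]  # receiver sees the late change

    isolated = {"readings": [1, 2, 3]}
    send_by_copy(isolated)
    isolated["readings"].append(99)    # same mutation, but on the original only
    safe = mailbox.get()["readings"]   # receiver's copy is unaffected

    return leaked, safe
```

Here `demo()` returns `([1, 2, 3, 99], [1, 2, 3])`. In a language with immutable values the first case cannot arise, which is part of why message passing there feels "side effect free"; in Python it takes a copy (or a process boundary, where messages are pickled) to get the same isolation.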

    Kamaelia has proof of con­cept imple­men­ta­tions in Java, C++ and ruby as well.

    Have fun :-)

    Michael.

  • http://ram.umd.edu Joseph Lisee

    Thank you for speak­ing sense about con­cur­rency in Python. The project hosted at the web­site listed with this post would ben­e­fit in a large way from a CPython with­out a GIL. I am glad some­one else shares the view that python should give you tools to build your appli­ca­tion, your way, not dic­tate its design.

    The code for the robotics system above uses more C++ code than needed because it needs true concurrency. A Python without a GIL would let us coders benefit from the reduction in code size you get when you go from C++ to Python.

    –Joseph Lisee

  • jnoller

    It’s not the imple­men­ta­tion of Erlang that I’m cri­tiquing — I’m very much pro-shared-little/nothing, ergo my lik­ing of Adam Olsen’s mon­i­tor work in the safethread­ing project. I have a bone to pick with erlang’s syn­tax itself.

    I agree with you on the power of Erlang’s model; however, I think the barrier to entry of Erlang is why it continues to be largely a niche language. If Python were to adopt some of the theory behind what Erlang has done, I would be happy.

  • jnoller

    So do I — and I agree, it’s not the final solu­tion, it’s only a library to help peo­ple get “a” (not The) job done.

  • jnoller

    The biggest thing to remember about the GIL is that it is not a bogeyman, but rather an implementation detail of the interpreter. Its design keeps the C implementation simple, especially for extension writers.

    Remov­ing the GIL, if done poorly, will grossly increase the com­plex­ity of the inter­preter, slow C devel­op­ment down and make exten­sion writ­ing a nightmare.

    It’s impor­tant to real­ize the ben­e­fits it does provide.

  • jnoller

    Kamaelia *does* look inter­est­ing. I am going to have to check it out.
