A followup on Concurrency within Python

by jesse


Coming out of PyCon 2008, my head is again filled with thoughts on concurrency, based on the talks and work I was involved with at the Con. First up: I want to point out something which bothered me a little. Namely, many people seemed resigned to, and were pushing towards, Jython/IronPython as the "concurrency answer". There was a lot of discussion/buzz about the idea that these two implementations are the future of concurrency for Python as a language, due to the obvious lack of a GIL.

In discussing this with many people, I may have come off as defensive, if for no other reason than that I constantly refuted that as "the answer". Don't get me wrong: I am eager, no, very eager to get my hands on the latest version of Jython and try out the java.util.concurrent package from it. I think access to that library will be great for those working in a hybrid Java/Python shop.

But I don't think "Python's" future should be tightly coupled with the implementation of a runtime on top of another runtime. I think work has to be performed on the CPython interpreter to make it a viable contender and solution within the "concurrency" space. Python's strength will be found in multiple strong implementations of the interpreter.

Let me be clear: I believe Python programmers want to write Python - not Java, nor .NET. What attracts us to Python is a clean syntax. If I want concurrency in Python, I don't want to have to call into java.util.concurrent. Obviously, the Jython guys are not going to force this - threading will simply use Java threading - but I think a "pythonic" abstraction of the java.util.concurrent package as a whole would also be desirable, so you could really use the power of the libraries but stay wrapped in the warm pastry of Python.

Concurrency and distributed design are a large enough space that the use-cases and problems people want and need to address open the playing field for not just "one" solution (i.e., getting rid of the GIL) but a series of language and standard library improvements to Python as a whole.

My lightning talk touched on this: I think there are existing projects that are well-suited for inclusion into the standard library, and I believe Adam Olsen's work on the safe threading project (update here) will also help pave the way for a much more appealing future for CPython - even if it is in Py3k.

Python has a history of picking the best ideas from other languages and improving on them. For example: Python probably does not want to simply re-create the java.util.concurrent package - the concurrent package views threads as the solution. Python would be well suited to support "real threads" in as simple and abstracted a way as possible (a leaky abstraction at that).

Erlang has the Actor/Asynchronous Message passing model (a functional language implementation) (for more on Erlang concurrency, see this). While people may be envious of what they see as "perfect side effect free concurrency", I doubt they envy the syntax or complex nature of it. Still, there are existing fork+exec+message passing libraries available for Python right now that sidestep the GIL, yet allow you to keep basic threading primitives and semantics and parallelize across cores and clusters.
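To make the "processes plus message passing, but with threading-style semantics" idea concrete, here is a minimal sketch. It assumes the multiprocessing module, which is how pyProcessing later landed in the standard library, and it assumes a Unix-like platform where the "fork" start method is available. The names count_primes, worker and count_in_processes are made up for illustration.

```python
import multiprocessing as mp


def count_primes(limit):
    """Count primes below limit by trial division (CPU-bound on purpose)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def worker(limit, results):
    # Runs in a child process; the queue is the only channel back to the parent.
    results.put((limit, count_primes(limit)))


def count_in_processes(limits):
    # Assumption: a Unix-like platform where the "fork" start method exists.
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    # Process mirrors threading.Thread: same target/args, start() and join().
    procs = [ctx.Process(target=worker, args=(limit, results)) for limit in limits]
    for p in procs:
        p.start()
    # Collect one message per child before joining, so no child blocks on a full pipe.
    found = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return found


if __name__ == "__main__":
    print(count_in_processes([10, 100, 1000]))
```

Each child is its own interpreter with its own GIL, so the work can spread across cores, while the calling code still reads like ordinary threading code.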

In discussing all of this with people at PyCon, I really began to realize that many people in the community look with envy at other languages' concurrency implementations - including Java's threads (even though many people still insist threads are impossible to get right).

I also realized that a lot of people are trying to solve a lot of different problems. When people hear distributed, concurrency and parallelism, they immediately bring up everything from large data crunching (à la map reduce and its kin) and distributed filesystems (à la GFS and Hadoop) to simple examples of jobs spread across multiple cores for math crunching and jobs passed to entire clusters of hundreds of machines.

Everyone is looking for an answer to their problem - and Python cannot implement something that will address your particular problem. It can only provide the platform, with the tools to allow you to solve it yourself.

I suggested the following to multiple people at PyCon - this is what I view as the "cleanest" approach to providing the abstractions and tools people will need to move forward with CPython in this space.

  1. Adopt/move forward with a version of CPython based on the work/concepts offered in Adam Olsen's Safe/Without GIL Threading work. Adopt the monitor and deadlock work for all of CPython, but leave the GIL removal as a compile-time option (python3.0 and python3.0-mt binaries).
  2. Add the pyProcessing, pprocess, or Parallel-Python module to the standard library for those who want to use vanilla CPython, and for whom the fork method works. Whichever module is added should use existing Python semantics, but be powerful enough for advanced/distributed usage. Note, I have started work on a PEP to get the pyProcessing module into the stdlib.
  3. Add an Actor - or Actor-like - module to the functools module, in keeping with the concepts of the current implementations. Python's implementation would obviously be leaky. (Note, I am not a functional programmer by any stretch, I like my OOP, suggestions welcome.) This could easily take advantage of the Safe Threading work.
  4. Add a lightweight (not XML) messaging system to couple with the Actor implementation. (No, I have no suggestions here.)
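As a rough illustration of what items 3 and 4 might feel like together, here is a small sketch of an Actor-like class built on nothing but threading and queue. It is not a proposal for the actual module: Actor, Counter and demo are hypothetical names, and plain tuples stand in for the "lightweight messaging system". Each actor owns its state, has one mailbox, and handles one message at a time.

```python
import queue
import threading


class Actor:
    """Hypothetical minimal actor: private state, one mailbox, one thread."""

    _STOP = object()  # sentinel message that ends the actor's loop

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Asynchronous: enqueue the message and return immediately.
        self._mailbox.put(message)

    def stop(self):
        # Messages sent before stop() are still processed, in order.
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Counter(Actor):
    def __init__(self):
        self.total = 0  # only ever touched from the actor's own thread
        super().__init__()

    def receive(self, message):
        kind, payload = message
        if kind == "add":
            self.total += payload
        elif kind == "get":
            payload.put(self.total)  # payload is a reply queue


def demo():
    counter = Counter()
    for n in range(5):
        counter.send(("add", n))
    reply = queue.Queue()
    counter.send(("get", reply))
    total = reply.get()  # 0 + 1 + 2 + 3 + 4
    counter.stop()
    return total


if __name__ == "__main__":
    print(demo())
```

Because callers only ever talk to the actor through send(), no locks appear in user code. The abstraction is leaky, though: nothing stops someone from reaching in and touching counter.total directly.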

Obviously, a varied approach like this doesn't conflict with the "one way to do it right" methodology of Python: in a problem space as varied as this, "one way to do it right" is completely dependent on the problem being solved. It can also help frame the discussion of what problem people are really trying to solve.

I've obviously been trying to help with the safe threading work - and I am writing a PEP for the inclusion of pyProcessing into the stdlib, but even this work is not the end - it's simply a start to help give people some batteries to plug into their concurrent toy car.

We should think about the building blocks to help people solve whatever problem arises - we can't add a solution that fits any one problem, we can only provide the tools with which to build a great big concurrent future.

Now I have to go - my daughter has acquired the lock.

Also, thanks to Adam for helping review this!