What do you want to see in a concurrency talk…

So, for starters, I’ll be doing a 1 hour talk at the PyWorks conference in Atlanta on November 13.

Following that, I am working on a PyCon tutorial, and one other non-tutorial talk for PyCon. My current theme for this round of talks is “Python 2.6, threading and multiprocessing concurrency”.

That’s a mouthful.

The PyWorks talk is entitled “Getting Started with Concurrency with MultiProcessing and Threads” – given it’s a one hour talk, I’m probably going to need to only go into threads on a cursory level (to give everyone a common understanding) and then delve into multiprocessing features and touch on a basic application as an anchor.

I plan on covering the differences, pluses, minuses and “getting started” and walking through some amount of the API. Given it’s only an hour, I won’t be able to deep-dive into building a fully fledged application. I can only hope to give people enough information to get started.
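To give a feel for how close the two APIs are, here is a minimal sketch (not material from the talk itself). It runs the same function once in a thread and once in a separate process. The names `whoami` and `run_in` are made up for this illustration; `threading.Thread`, `multiprocessing.Process` and `multiprocessing.Queue` are the standard library pieces.

```python
import multiprocessing
import os
import threading


def whoami(results):
    """Worker body: report the pid of the process we are running in."""
    results.put(os.getpid())


def run_in(worker_cls):
    """Run whoami() in a Thread or a Process and return the pid it saw.

    worker_cls is either threading.Thread or multiprocessing.Process;
    both accept target= and args= and expose start() and join().
    """
    results = multiprocessing.Queue()
    worker = worker_cls(target=whoami, args=(results,))
    worker.start()
    pid = results.get(timeout=30)  # read the result before joining
    worker.join()
    return pid


if __name__ == "__main__":
    print("parent pid: ", os.getpid())
    # A thread lives inside the parent process, so it sees the same pid.
    print("thread pid: ", run_in(threading.Thread))
    # A Process is a separate OS process, so it sees a different pid.
    print("process pid:", run_in(multiprocessing.Process))
```

The only line that changes between the two runs is which class gets passed in, which is roughly the "differences, pluses, minuses" conversation in miniature: same shape of code, very different things happening underneath.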

Moving on to PyCon, I wanted to expand on this space for a full-blown tutorial – I wanted to cover threads, multiprocessing, pros, cons, organization, APIs, best practices, the GIL and walk through building actual application(s) – although I need to pick a good “showcase” application that people can grok.

Also, in the tutorial, I was thinking about going from local concurrency to network-based concurrency (starting with the built in multiprocessing API) so that people can understand the differences and take those into account.

The final one – the short talk for PyCon – is going to be “Python 2.6 threading changes and an intro to multiprocessing”.

In the final talk, given it’s short – it’s going to be an overview of the changes/new module and a walk through (I hope) of a basic application.
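As a taste of the threading side, here is a small sketch using a few of the PEP 8 style spellings that arrived in 2.6 (`current_thread()`, `is_alive()`, and the `name` and `daemon` properties). It is an illustration rather than a full list of the changes, and `describe_current_thread` and `run_named_worker` are hypothetical helper names made up for this example.

```python
import threading


def describe_current_thread():
    """Return (name, daemon flag, alive flag) for the calling thread."""
    t = threading.current_thread()       # PEP 8 spelling of currentThread()
    return t.name, t.daemon, t.is_alive()


def run_named_worker(name):
    """Start a daemon thread with the given name and return what it reports."""
    seen = []
    worker = threading.Thread(
        target=lambda: seen.append(describe_current_thread())
    )
    worker.name = name      # property, instead of setName(name)
    worker.daemon = True    # property, instead of setDaemon(True); set before start()
    worker.start()
    worker.join()
    return seen[0]


if __name__ == "__main__":
    print(run_named_worker("worker-1"))
```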

My question to you, oh interwebs, is what might you like to see/be able to get from talks in these veins? I can’t go into all of the whys and hows and whens, but I can arm people with enough information to get up and running.

I desperately want to get the “most bang for the buck” in these talks. Obviously, due to the nature of slides, I won’t be able to show hundreds of lines of code to illustrate everything, except in the tutorial.

For an example application – I need to make something which will “scale up” – from thread-based approaches all the way to distributed-over-the-network approaches.
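One possible first rung of such an example, borrowing the grep idea suggested in the comments below: a minimal sketch, assuming the work can be split into independent chunks. `grep_chunk` and `parallel_grep` are hypothetical names for this illustration. `multiprocessing.dummy.Pool` is the thread-backed pool that ships alongside `multiprocessing.Pool` with the same interface, so the same function can run on threads or on processes. The over-the-network step is not covered here.

```python
import multiprocessing
import multiprocessing.dummy
import re


def grep_chunk(args):
    """Map step: return the lines of one chunk that match the pattern."""
    pattern, lines = args
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def parallel_grep(pattern, lines, pool, chunk_size=2):
    """Split lines into chunks, grep each chunk in the pool, merge in order."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    jobs = [(pattern, chunk) for chunk in chunks]
    matches = []
    for found in pool.map(grep_chunk, jobs):  # map() keeps the input order
        matches.extend(found)                 # reduce step: concatenate
    return matches


if __name__ == "__main__":
    log = ["ok boot", "error disk", "ok net", "error fan", "ok done"]

    # Thread-backed pool: same API, everything stays in one process.
    thread_pool = multiprocessing.dummy.Pool(2)
    print(parallel_grep("error", log, thread_pool))
    thread_pool.close()
    thread_pool.join()

    # Process-backed pool: the chunks are pickled and sent to worker processes.
    process_pool = multiprocessing.Pool(2)
    print(parallel_grep("error", log, process_pool))
    process_pool.close()
    process_pool.join()
```

Passing the pool in as an argument is a deliberate choice for this sketch: the grep logic never needs to know which kind of worker is behind it.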

  • My biggest hope for the multiprocessing library is that it will allow me to continue working in Python rather than switch to Erlang. Erlang has a great showcase in CouchDB, the FOSS project most like BigTable. Why don't you start a CouchDB clone built in Python using the new libraries? Or if a partially completed project won't do, please show a map-reduce example (e.g. distributed grep)
    -Thanks!
  • Jesse - please take a look at my Kamaelia talks on slideshare ...

    http://www.slideshare.net/kamaelian/slideshows

    ... they're (literally) littered with numerous example systems which are highly concurrent, written in python and tested. Kamaelia is designed specifically with novices in mind, and the following 2 talks were given at pycon UK last weekend:

    http://www.slideshare.net/kamaelian/practical-c...

    http://www.slideshare.net/kamaelian/sharing-dat...

    The latter one needs a bit of work based on feedback in the session, but the talks seemed highly accessible. (Indeed, kamaelia has been designed with accessibility in mind as a key design feature - it's also the driving long term reason behind Kamaelia's existence :-)

    Kamaelia is primarily a message passing based system (generators, threads, processes as units of concurrency), but also includes a software transactional memory implementation for the times when you really do need to share data. (Kamaelia's real underlying design is "no unconstrained data" and "if you have a piece of data and haven't given it back/to anyone else, your changes should be safe".)

    We're actually in the process of a revamp of the website.

    If you want a selection of applications to talk about - there are examples using network servers, interactive pygame based applications, digital TV/PVR systems, offline batch transcoders, to name a few... I've noticed a tendency for people to relate to very concrete ideas - which is why this time round I chose a simple and direct game as the initial example.

    The key thing which is perhaps different is that the focus in Kamaelia is on making communicating systems useful through a component metaphor, rather than on "let's use concurrency for concurrency's sake". i.e. use a pattern for software construction that happens to (deliberately :-) drop out as naturally highly concurrent.

    Perhaps my most fun hack (10 hours or so) was the Speak And Write app - which uses handwriting recognition and speech synthesis to teach a child to read and write.
    http://edit.kamaelia.org/SpeakAndWrite

    Regarding the map-reduce idea - I'd personally love to see such a beast written using Kamaelia and would also be more than happy to see it merged onto /trunk ...

    Perhaps the only downside of Kamaelia vs your talk is the choice of multiprocessing module in python 2.6 doesn't match the one we've been using in Kamaelia. (at some point we'll flip when we've got a chance - it'll make little difference to user code beyond being 1 less thing to install :)

    Anyway, you did ask "what do you want to see ..." :-)

    What I would hope is that you teach people the one rule that causes all problems in concurrent systems though - unconstrained shared data. If you make sure you don't have that, life becomes easy, fun and concurrent. If you do have any though, life often becomes very hard, and painful to debug.

    Anyway, best of luck for your talk - I hope it goes well :-)
  • I've looked at Kamaelia before - it definitely looks like another "swiss army knife" framework - à la Twisted, etc. I'm going to mention all of the frameworks (Kamaelia included) but I want to focus on the more primitive aspects of concurrency (multicore, messages) and how to build something using standard Python.

    Of course, I need a good network layer, so maybe I'll poke at Kamaelia's :)
  • OK, cool. Let's see.

    Tell you what - how about I just package up the network components (no protocols, no pygame, no OpenGL, etc), along with a minimal set of other useful components, by themselves for you? i.e. the absolute minimum useful (but no smaller) for building TCP & UDP (inc multicast) based servers & clients?

    i.e. something that will just install quickly and cleanly by itself, with no external dependencies?

    That should be pretty tiny and still useful.

    Most of Kamaelia's components have no dependency on each other, so this would be fairly easy and quick to do. (It's perhaps better to think of Kamaelia as a toy box or tool chest or Radio Shack from which you can just take out the things you want, rather than as a swiss army knife IMO.) All the components depend on Axon, but then that's not too surprising - Axon is the component framework. But even that is small enough to be comprehensible.

    Heck, I'll speculatively do it - it'll be quick to do and if you want to use it, feel free. :-)
  • Hmm. Sorry for all the typos - that'll teach me for posting responses to things when it's late! Anyhow, good luck :)
  • If you're gonna talk about the GIL, can you explain what stackless python is and what it means for concurrent programming?
  • I can mention it briefly, but won't go into details given stackless still has a GIL
  • Ben
    I'd be interested in something like Kamaelia... Maybe a map reduce kind of application built with kamaelia that can be distributed over the network?
  • I won't be there but can hopefully catch a video or see the slide deck... sounds very interesting and this is an area I am very into. I'd especially be interested in hearing about changes to the threading API. I didn't realize anything changed in 2.6/3.0.

    I am very intrigued with the new multiprocessing and will be using it immediately on a project as soon as the final release of 2.6 goes out :)
  • No problem, I'll probably post everything here when it all comes out.

    As for the threading changes in 2.6/3.0 - it's really just pep8-ifying some of the names, and moving some of the other internals to properties. See: http://bugs.python.org/issue3352

    I'd be interested to hear how you're using the package, some of what you and I do at our day jobs overlap, so to speak :)
  • Doug Napoleone
    One idea for the example program:

    Concurrent test execution.

    You have a bank of tests you need to run before committing, and a 4 core machine.
    Scale that up to something which can be run across multiple machines for a QA cycle.

    Just because there are packages out there which already do this (nose+buildbot, etc), does not mean that it is not a good example problem.

    It sounds like it meets your needs and pulls from your strengths.
  • Oddly enough, I was designing something like this today. Also, Jason P, of nose fame, has been working on a multiprocessing based nose extension to do this:

    http://code.google.com/p/python-nose/source/bro...
    http://code.google.com/p/python-nose/issues/det...

    I could further build on that - it's not a bad idea. I've also considered doing a web-load-test application, or a data-crunching app à la the Wide Finder project (http://www.tbray.org/ongoing/When/200x/2007/09/...)