PyCon 2009 Talks accepted.

Well then!

Last night I got two emails: both of the talks I proposed for PyCon 2009 were accepted. Here are the titles and abstracts:

  • Introduction to Multiprocessing in Python
    • This talk will cover the new multiprocessing package included with Python 2.6 (and 3.0) focusing on design, benefits, practical usage, application construction, gotchas and how to use it to build multi-core and distributed applications.
  • Concurrency and Distributed Computing with Python Today
    • This talk will cover the recent changes in Python 2.6, including a brief introduction to the threading module and to the newly included multiprocessing package, but will primarily focus on the concurrent and distributed ecosystem for Python today.

The first talk is relatively straightforward: I am going to do an introduction to the mp package and all its bells and whistles. The second one is more of a “where are we today” look at concurrency and distributed systems. I’ll probably have to trim my initial outline for time, but I hope to cover things like Kamaelia, Dramatis, and others as well.
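For readers who haven’t tried the mp package yet, here is a minimal sketch of its core pieces: a `Process` per worker, `Queue`s to pass work in and results out, and a sentinel that tells each worker to stop. The function names and the `backend` parameter are my own illustration, not part of multiprocessing; `backend` only exists so the same code can run on `multiprocessing.dummy`, which offers the same API backed by threads. It is written for a current Python 3 rather than 2.6.

```python
import multiprocessing


def square_worker(inbox, outbox):
    """Read numbers from inbox until a None sentinel arrives; emit (n, n*n)."""
    for n in iter(inbox.get, None):
        outbox.put((n, n * n))


def run_squares(numbers, workers=2, backend=multiprocessing):
    """Fan numbers out to worker processes over one queue and collect results.

    backend defaults to the real multiprocessing module; passing
    multiprocessing.dummy runs the same code on threads (same API).
    """
    numbers = list(numbers)
    inbox, outbox = backend.Queue(), backend.Queue()
    procs = [backend.Process(target=square_worker, args=(inbox, outbox))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for n in numbers:
        inbox.put(n)
    for _ in procs:
        inbox.put(None)  # one sentinel per worker tells it to exit
    # Collect results before join(): a child that still has data buffered
    # for a queue may not exit until that data has been consumed.
    results = dict(outbox.get() for _ in numbers)
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    # The guard matters: with the "spawn" start method each child
    # re-imports this module and must not start processes of its own.
    print(run_squares(range(10), workers=4))
```

Forgetting that `if __name__ == "__main__":` guard is one of the classic gotchas: where children are started with "spawn", unguarded process creation at import time breaks.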

I’ve got most of the first talk done, and I may end up asking the local Python users group in Boston if they’re interested in me doing dry runs of both before PyCon. Here’s hoping I don’t let anyone down.

Python 3.0: “What’s the Point”

I saw this post yesterday, and it came up again today. The author’s last comment is this:

Bob — I’ve come to the exact same conclusion. Python 3.0 sounds like it’s great for the Python implementors themselves, and for the language lawyers. It should have been labeled a “Technology Preview”: something to take a look at as an indication of future directions, and for people who maintain libraries to work on compatibility with.

It’s weird how most of the people commenting don’t seem to get that there are factors other than academic ones involved. There is a marketplace for languages, and those that do less well at attracting developers tend to dwindle. Look at LISP — very much ahead of its time, but in focusing so much on the purity of the language, and refusing to make compromises that would make its syntax more approachable or readable, they doomed it to be marginalized.

Python 3 is actually probably going to tip me more toward Ruby for future development. Ruby has its issues, but at least it’s not forked, so I won’t have to throw out any books, and once the minor 1.9 compatibility roadbump is past I won’t have to worry about which library is compatible with which flavor of the language.

It would be good if others with more time responded to this. As it is, I’ve read so many negative things about py3 and multiprocessing in the last few days that I’ve burned out my ability to be reasonable.

It’s on Hacker News too.

Here’s a positive post :)

Oh man, I <3 James Bennett: see his response/post here

Python 3.0, some multiprocessing info, administrative notes.

So, first off - unless you’ve lived under a rock for the last 24 hours, you should know Python 3000 final is hot off the bit presses. This marks a huge milestone for the language, and major props are deserved to all of the python-core people who have spent so much time working on it.

Python 3000 marks an interesting point in the evolution of the language. We all know it is meant to clean up some of the warts of the Python 2.x series, and it was designed and implemented knowing full well that it breaks backwards compatibility, but I would argue it doesn’t fall neatly on either side of the black-and-white revolution vs. evolution divide. In my mind, Python 3 falls right in the middle. Yes, it is revolutionary in that it breaks compatibility with 2.x, but the breakages are not so significant that they fall outside the evolutionary camp.

For example, changes to fundamentals, such as changing how whitespace works, dropping tuples, or switching to pure functional programming, would count as revolutionary in my mind. Python 3 is more of an evolutionary jump: the changes are meant to clean things up, removing the prehensile tail and single eyebrow, so to speak.

It’s still a jump, though, and that’s why the core team has been repeating the mantra “use it for porting/prototyping but not day-to-day production use”. I can’t stress this enough: Python 3 is slower than 2.x, and library maintainers are going to lag significantly behind. Sure, download it, experiment with it, and keep it in mind, but don’t deploy with it right now.
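If you do download 3.0 and experiment, a few of the best-known cleanups are easy to see. The snippet below is my own selection, shown from the Python 3 side with the 2.x behaviour in comments; it is not an exhaustive list (the official “What’s New in Python 3.0” document is the place for that).

```python
# Run under Python 3; comments note what Python 2.x did instead.

# print is a function, so it takes arguments like sep= (2.x: print "a", "b").
print("a", "b", sep=", ")                # prints: a, b

# / is always true division; // is explicit floor division.
true_div = 7 / 2                         # 3.5 (2.x gave 3 for two ints)
floor_div = 7 // 2                       # 3

# str is Unicode text; encoded data is a separate bytes type.
text = "caf\u00e9"
data = text.encode("utf-8")
lengths = (len(text), len(data))         # (4, 5): the accented e is 2 bytes


def can_order(a, b):
    """Return True if a < b is allowed for these two values."""
    try:
        a < b  # evaluated only to see whether it raises
    except TypeError:
        return False
    return True


# Ordering unrelated types raises instead of giving an arbitrary answer.
mixed_ok = can_order(1, "a")             # False (2.x: 1 < "a" was True)

# dict.keys() returns a live view rather than a list copy.
d = {"a": 1}
keys = d.keys()
d["b"] = 2
seen = sorted(keys)                      # ['a', 'b']: the view sees the new key
```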

Actual widespread adoption is going to happen after 3.1/3.2 at the very least, and 2.6 and the releases after it will continue to thrive. Adoption of 3 will be a long time coming, and will ultimately be driven by the OS maintainers. I can easily see python2 and python3 co-existing on operating systems over the next few years, and then one day, python2 just goes away.

All in all, it’s a great series of changes, and a lot of people put a lot of time and work into it. Python core development is pragmatic, reasoned and always thoughtful about the changes that go into the language and libraries, and the amount of time it takes to work that way is amazing.

Moving on.

So, I spent a few days in November at PyWorks in Atlanta, where I got to give my talk on threading/multiprocessing (slides here). I think it went OK, and provided my multiprocessing talk is accepted for PyCon, I know what I will do differently.

I got generally positive feedback on the talk, so that was a bonus.

Also, I recently got a chance to fix a handful of multiprocessing doc issues: I expanded some examples and fixed other doc errors. Doc changes go live quickly, so you can see them shortly after I update them.

I am always looking for:

  • Bug reports on multiprocessing
  • Usability issues
  • Examples that would help make things easier to understand
  • Doc suggestions

Everything is fair game. I openly admit that not everything is as clear as it should be, and the package has some rough edges I want to whittle down, but finding random statements on popular sites like:

“except that the multiprocessing library is half-baked. it’s been the bane of my existence recently.”

and not having bug reports, emails or any other information to help me make it better drives me up a wall.

Since 2.6 was released, I haven’t had enough time to give it the TLC it needs; real work and being a dad trump my open source time. Even my other projects are on ice until things get less busy.

So please, send me your suggestions, send me your criticisms, hell, send me hate mail if you really feel the need to vent, but don’t do nothing. Right now, I’m the primary maintainer, but Christian Heimes and others have also been doing a lot of heavy lifting. It’s really a team effort (especially as Richard, the original author, has been MIA for some time).

Lastly, I fixed my hosting issues - apparently my host disabled my account without telling me that it was being disabled. Hooray.

Hooray, Actors!

A coworker just dropped an ancient artifact off for me to read:

[Photo of the artifact]

Actors, concurrency and Kamaelia

Recently, I made an offhand comment here about:

I’ve actually started thinking about/sketching an actor model built on top of MP, using concepts from actors/monitors and things in the ecosystem today

The ensuing comments and discussion were pretty good - but last night Michael Sparks (of Kamaelia) posted a darned nice comment:

Are you aware that a complete mini-axon using generators is tiny and the rest is optimisations and extra stuff that you find useful in real world systems? By tiny, I mean this small:
* http://www.kamaelia.org/MiniAxonFull

A mini-axon using processes would be equally lightweight (shorter probably) and pretty awesome.

Also, it’s easy to confuse the two halves of Kamaelia. If you think of Kamaelia as just an actor-type implementation, then it’s actually more an actor-like implementation, with STM & an internal SOA system of just over 2000 lines (which is how big Axon actually is, excluding comments & docs), with 80,000 lines of examples…

However, personally I view it as a mechanism for building components which happen to be best used in a concurrent fashion. ie rather than viewing it as “a mechanism for using concurrency”, I view it as “OK, assume we have concurrency, how can we use this to assist in building and maintaining systems”. Axon also gives you the tools for taking these concurrent systems, and interfacing between concurrent systems and standard code. (http://www.kamaelia.org/AxonHandle)

As a result, I view Axon as a library which provides you with the tools wrapping up idioms useful for building collections of components which be a framework.

Anyway, potato/potato, tomato/tomato - if you like, you like, if you don’t, you don’t.

I’d love to replace our existing process based stuff btw with a multiprocessing based version though. If I was going to go down this route, I’d follow our mini axon tutorial to do so, largely because it’s essentially the starting point I took with the multiprocess stuff recently and it worked out pretty well.

Beyond this basic stuff though, I’ve noted that people generally start talking about co-ordination languages and building up pattern repositories. The interesting intersection between these two which you get if you call things components rather than actors is it becomes natural to create components called a chassis. These chassis often instantiate directly in concrete usable form concepts that you’d normally refer to as a pattern - Pipeline, Graphline, Carousel, Backplane, Seq, TPipe, etc.

On a random note, you may want to check out MASCOT “Modular Approach to Software Construction, Operation and Test”. I heard about it late last year, and it appears to have the same sort of architecture as Kamaelia. Interestingly (to me) it makes the same key decision - when you send a message outside your component, you don’t know who is going to receive it. This then enables (and requires) a higher level system for connecting components together. The upshot is highly reusable components. This doesn’t entirely surprise me - my ethos came from recognising that asynchronous hardware systems & network systems look strikingly similar… (cf http://www.slideshare.net/kamaelian/sociable-so…)

Anyway, reference for MASCOT: http://async.org.uk/Hugo.Simpson/ - skip down to the end of the page for this PDF: http://async.org.uk/Hugo.Simpson/MASCOT-3.1-Man… I was really pleased to be pointed at MASCOT, largely because it showed a large number of other domains where the same basic model has been used for well over 30 years… Just with non-existent exposure, and slightly different metaphors. Though we, like it, also have mechanisms for automatically visualising systems, with a 1:1 correspondence. Beyond that this also gives us a model that matches Edward Lee’s “The Problem with threads” - we’d released running code long before that paper was published :-)

Anyway, I’m glad that you’re looking at what we’ve done. If you use it, that’d be great, and I’ll happily merge anything you’d like to have a life. (the only comment I’d make there is metaphors and accessibility count - this is surely the point of python? :-) If you don’t take what we use etc but it helps you solidify your thoughts to “No, I don’t want that, I want this”, then likewise, I’m equally glad. If you do that, I’d love to know what you do try, since I like to merge best practice concurrency ideas into Axon :-)

I’d *REALLY* suggest looking at MASCOT though. Really made my Christmas last year when I was pointed at it.

We’re currently having lots of fun using concurrency, primarily by allowing it to make our lives easier, and forgettable about :) It’d be nice to see something similar on top of multiprocessing (which we’ll do if you don’t, but it’d be great if you did - but I’d understand if your view was that you prefer a pure actor (sender knows receiver) model.

Originally posted by Michael Sparks as a comment on jessenoller.com (via Disqus).

I wanted to pull that comment out, showcase what Michael has to say, and in some way respond. First, yes, I am looking at Kamaelia (from here on out, I’m going to call it “Kam”, since I keep transposing the “ae”). I actually ran through the mini-axon tutorial, and as time allows, I’m trying to tease apart the internals to understand it better.
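Michael’s point that a mini-axon is tiny is easy to check for yourself. Below is my own rough sketch of the idea, assuming nothing about Kamaelia’s actual code: generator “microprocesses” with named inboxes and outboxes, a round-robin scheduler, and a postman step that moves messages along links declared from outside, so a sender never knows who receives its output. All names (`Component`, `Scheduler`, `pipeline`, `Producer`, `Doubler`, `Collector`) are hypothetical, and using `None` as an end-of-stream marker is a simplification of mine; for the real thing, follow the MiniAxon tutorial linked in Michael’s comment.

```python
from collections import deque


class Component:
    """A microprocess with named boxes; it never names a receiver."""
    def __init__(self):
        self.inboxes = {"inbox": deque()}
        self.outboxes = {"outbox": deque()}

    def send(self, value, box="outbox"):
        self.outboxes[box].append(value)

    def data_ready(self, box="inbox"):
        return bool(self.inboxes[box])

    def recv(self, box="inbox"):
        return self.inboxes[box].popleft()


class Scheduler:
    """Round-robin: step each live generator once, then deliver messages."""
    def __init__(self):
        self.running = []
        self.links = []  # (source, outbox name, sink, inbox name)

    def activate(self, component):
        self.running.append(component.main())

    def link(self, source, sink, outbox="outbox", inbox="inbox"):
        self.links.append((source, outbox, sink, inbox))

    def deliver(self):
        # The "postman": move messages along links declared from outside.
        for source, outbox, sink, inbox in self.links:
            pending = source.outboxes[outbox]
            while pending:
                sink.inboxes[inbox].append(pending.popleft())

    def run(self):
        while self.running:
            alive = []
            for gen in self.running:
                try:
                    next(gen)
                    alive.append(gen)
                except StopIteration:
                    pass  # this component has finished
            self.running = alive
            self.deliver()


def pipeline(scheduler, *components):
    """A tiny 'chassis': wire each outbox to the next inbox, then activate."""
    for source, sink in zip(components, components[1:]):
        scheduler.link(source, sink)
    for component in components:
        scheduler.activate(component)


class Producer(Component):
    def __init__(self, items):
        super().__init__()
        self.items = list(items)

    def main(self):
        for item in self.items:
            self.send(item)
            yield
        self.send(None)  # None marks end of stream in this sketch


class Doubler(Component):
    def main(self):
        while True:
            if self.data_ready():
                item = self.recv()
                self.send(None if item is None else item * 2)
                if item is None:
                    return
            yield


class Collector(Component):
    def __init__(self):
        super().__init__()
        self.received = []

    def main(self):
        while True:
            if self.data_ready():
                item = self.recv()
                if item is None:
                    return
                self.received.append(item)
            yield
```

With `sched, sink = Scheduler(), Collector()`, calling `pipeline(sched, Producer([1, 2, 3]), Doubler(), sink)` and then `sched.run()` leaves `[2, 4, 6]` in `sink.received`. Only `pipeline` knows the wiring; the components just read and write their own boxes.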

Fundamentally, I agree with you (Michael) about the aspects of making concurrency easier (and safer). Right now, I think Kam is a pretty darned good start - for a framework *grin*.

When I made that comment, I didn’t think it would get the response it got. There has been a dog-pile of discussions about concurrency best practices and the like; in fact, a discussion about concurrency is still going on on python-list right now.

Personally, I only work on the concurrency stuff part-part time, and that includes my minor work on Python core. By day I’m a test engineer: while I build highly concurrent (and distributed) tests and use multiprocessing and threading daily, it’s not my full-time pursuit. I am passionate about it, and I am passionate about improving Python as a language and library, and if I can do it as a day job and open source it, or get company time to do it, by golly I will.

When I said “I want to build an actor model” - I was not necessarily talking about doing an implementation for python-core. I’m a big believer in learning-through-implementation - so when I said “build x”, not only did I mean “build something for the world” - I also meant “build something for my own benefit” so I can deep-dive into the concepts, problems etc.

This is why I am averse to jumping in and simply “using” a framework. It’s not because I don’t think it does something exceedingly well; trust me, I am an opportunistic developer, and if I can find a library that does what I need *right now*, I’ll use it.

That being said, I am exploring Kamaelia, and yes, hopefully I can steal some time to actually do an implementation of the process-based stuff with multiprocessing. I want to explore everything in the ecosystem today. My discussions with Adam Olsen around this stuff (and around python-safethread), and with others, have made me really want to explore solutions that help everyone, taking the best ideas and concepts and rolling them into something worthy of Python core.
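As a starting point for “a mini-axon using processes”, here is a hedged sketch of how the same idea might map onto multiprocessing: each component is a plain function running in its own `Process`, reading from an inbox `Queue` and writing to an outbox `Queue`, and only the `run_pipeline` helper knows how they are connected. All names are hypothetical, this is not Kamaelia’s process-based code, and the `backend` parameter exists only so the code can also run on the thread-backed `multiprocessing.dummy`.

```python
import multiprocessing

STOP = None  # hypothetical end-of-stream marker passed down the pipeline


def doubler(inbox, outbox):
    """A component: reads its inbox, writes its outbox, knows no peers."""
    for item in iter(inbox.get, STOP):
        outbox.put(item * 2)
    outbox.put(STOP)  # pass shutdown downstream


def incrementer(inbox, outbox):
    for item in iter(inbox.get, STOP):
        outbox.put(item + 1)
    outbox.put(STOP)


def run_pipeline(stages, items, backend=multiprocessing):
    """Chain component functions with queues, one process per stage.

    The wiring lives here, outside the components, in the spirit of a
    Pipeline chassis.
    """
    queues = [backend.Queue() for _ in range(len(stages) + 1)]
    procs = [backend.Process(target=stage, args=(queues[i], queues[i + 1]))
             for i, stage in enumerate(stages)]
    for p in procs:
        p.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = list(iter(queues[-1].get, STOP))  # drain before joining
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([doubler, incrementer], [1, 2, 3]))  # prints [3, 5, 7]
```

Each stage is a single process reading a FIFO queue, so item order is preserved end to end. Using `None` as the stop marker is a simplification: it means `None` can’t travel through the pipeline as ordinary data.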

As I have said before, I believe there is room within Python as a language, and CPython as an implementation of that language, for a virtual buffet of helpful libraries for concurrency and distributed systems. Right now, we have threading, async* (asyncore/asynchat) and multiprocessing. There is plenty of room to grow. Maybe one day I can steal time to grab more of the concepts from java.util.concurrent and propose them via a PEP. Heck, maybe we can work as a group to propose an actor/monitor implementation for Python core.
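One example of the kind of java.util.concurrent concept I mean is the executor/future pattern: you hand work to a pool and immediately get back a handle to its eventual result. Python 3.2 later shipped this pattern as the `concurrent.futures` module (PEP 3148 names java.util.concurrent as its main influence), so the sketch below uses that module; `slow_square` and `square_all` are hypothetical names for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(n):
    """Stand-in for blocking work such as a network call (hypothetical)."""
    time.sleep(0.01)
    return n * n


def square_all(numbers, max_workers=4):
    """Submit each job to an executor and collect results as futures finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # submit() returns a Future immediately; the work runs on the pool.
        futures = {executor.submit(slow_square, n): n for n in numbers}
        for future in as_completed(futures):
            results[futures[future]] = future.result()  # re-raises worker errors
    return results
```

Swapping in `ProcessPoolExecutor` gives the same interface backed by processes; the submitted function then has to be picklable, i.e. defined at module level.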

So, by way of responding to Michael in a more concrete way: I’m personally looking at anything I can to learn more about implementations and strategies. If something were to come out of it, and I felt strongly enough to propose inclusion in core, I would write and post a PEP, not run in blindly.

I’ve got more than enough to work on in addition to the day job, being a dad, and yard work. Oh, and the day job, which doesn’t involve nearly as much distributed-and-concurrent systems work in Python as I’d like :).

Heck - thinking about it we’d need a good messaging implementation too. I’ll put that on the pile too.

Quick Rant (slightly off topic):

Also, can we stop talking about the damned GIL? Yes, you need locks. No, you probably don’t care about the GIL. Stop yammering about how “broken” CPython is because of it: CPython is an implementation, not the final one and not the only one. If the GIL really gets you excited, either drop to a C module, use multiprocessing, or use something else. The GIL is here to stay for some time, so either propose a PEP (and a patch) that doesn’t break CPython, or hush. Enough bike-shedding: discussion is great, especially when something comes out of it, but constantly berating and lamenting things is just bike-shedding. The shed is purple, now move on. Purple!
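For the “use multiprocessing” option, a minimal sketch: independent CPU-bound jobs go to a `multiprocessing.Pool`, so each runs in a separate worker process with its own interpreter and its own GIL. `count_primes` is a made-up stand-in for real work, and `backend` is only there so the same code can be exercised with the thread-backed `multiprocessing.dummy`.

```python
import multiprocessing


def count_primes(limit):
    """Deliberately CPU-bound: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, processes=None, backend=multiprocessing):
    """Spread independent CPU-bound jobs over a pool of worker processes.

    Each worker is a separate interpreter with its own GIL, so the jobs can
    run on separate cores. processes=None means one worker per CPU.
    """
    with backend.Pool(processes=processes) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    print(count_primes_parallel([10_000, 20_000, 30_000, 40_000]))
```

On a multi-core machine the process version can run those four jobs at the same time, whereas the same pure-Python jobs on threads in a standard CPython build would mostly take turns holding the GIL.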

Required Reading (in addition to Michael’s links):

Guido answers your questions…

Via GvR’s blog: he has posted answers to the top 20 questions on the ask-a-google-engineer page.

I’m seriously laughing here:

  • Q: “Why google choose python as the main programming?”
  • A: Incorrect this is.

Oh, and yay multiprocessing (see: this question), but man, is there a lot of work to do. I’ve actually started thinking about/sketching an actor model built on top of MP, using concepts from actors/monitors and things in the ecosystem today.

October Issue of Python Magazine is live.

Another information-rich issue of Python Magazine is live! This time, it’s the October (stunning, I know) issue. The cover story is “Versioning your database with sqlalchemy-migrate”, and of course there’s a short article from me on “SSH Programming with Paramiko”. You can see the rundown for the article here.

A lot of good people put a lot of work into the magazine - Doug Hellmann doesn’t let any of us slack off, and keeps us on target. If you don’t already have a subscription - you should get one.

Additionally, if you like writing, or even if you’re new to writing, and you have an idea for an article, please submit it!

Also remember, the people behind Python Magazine are having a conference in November: PyWorks is happening in Atlanta, GA on November 12-14th. I’ll be doing a talk there on threading/multiprocessing.

Updated pyjavaproperties

I just pushed a minor update for PyJavaProperties. It adds a simple list that indexes the keys in the order they’re parsed, and appends new keys to that internal index. Next, I want to keep the original comments and whitespace.
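The idea behind the update is simple enough to sketch. This is my own minimal illustration, not PyJavaProperties’ actual code or API: the class name, the methods and the simplified parser are all assumptions. Values live in a dict, and a separate list records each key the first time it is seen, whether it came from parsing or was set later, so the properties can be written back out in their original order.

```python
class OrderedProperties:
    """Hypothetical sketch, not the pyjavaproperties API: a dict of
    properties plus a list that indexes keys in the order they were parsed."""

    def __init__(self):
        self._props = {}
        self._key_index = []  # keys in parse/insertion order

    def load(self, lines):
        """Parse simple key=value or key:value lines (no escapes or
        continuation lines, unlike the full Java .properties format)."""
        for raw in lines:
            line = raw.strip()
            if not line or line[0] in "#!":
                continue  # skip blank lines and comments
            cuts = [i for i in (line.find("="), line.find(":")) if i != -1]
            if cuts:
                cut = min(cuts)  # split on whichever separator comes first
                self[line[:cut].strip()] = line[cut + 1:].strip()
            else:
                self[line] = ""

    def __setitem__(self, key, value):
        if key not in self._props:
            self._key_index.append(key)  # new keys are appended to the index
        self._props[key] = value

    def __getitem__(self, key):
        return self._props[key]

    def keys(self):
        return list(self._key_index)

    def store(self):
        """Serialise back out, preserving the original key order."""
        return "\n".join(f"{k}={self._props[k]}" for k in self._key_index)
```

On Python 3.7 and later a plain dict already preserves insertion order, so a separate list like this mostly matters on the 2.x interpreters of the time.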

Nose-testconfig version .5 uploaded.

Fixes a minor issue with python config file parsing.

Next up, hierarchical YAML files!

Backport of 2.6 Multiprocessing to 2.4/2.5

Thanks to the work of Christian Heimes and Skip Montanaro with a supporting role by me, the 2.6 version of Multiprocessing (pyprocessing) has been forked/adjusted for compatibility with Python 2.4/2.5.

You can see the PyPI package here, and the Google Code project here.

We chose to do the backport for a variety of reasons: having the new API for the package makes it easier to jump into 2.6 if you’re a heavy pyprocessing user like me, and a lot of bugs were fixed during the migration into python-core.

Check it out, and feel free to file bugs. We’re planning on backporting bug fixes from python-core to the project, and on applying relevant bug fixes from the Google Code project to python-core.