A gentle overview of Kamaelia or "it's axon, stupid"

by jesse


Note: This is the first post in what I hope will be a series leading up to my concurrency/distributed systems talk at PyCon. I'm steadily working through, experimenting with, and learning the various frameworks and libraries in the Python ecosystem. I reserve the right to revise these entries (and probably will) based on feedback from people, mainly the author(s) of said tool(s). I will also add bits and pieces as I learn and explore more. Code and examples will be checked into my PyCon 2009 Bitbucket site here.

For a while now, I've been meaning to dig into Kamaelia, but I was largely put off by what I tend to call the "Twisted effect". What this means is that when I go looking for libraries and small components, I go looking for a library, not a "solution". I also worry about the "once you go in, you must follow this paradigm" effect. I'm not going to say that these feelings are still well-founded, or completely rational; after all, I am digging into it, yes? It's the thought that once you adopt the "one true way of doing things", you're trapped in that solution/framework "forever". Ironically, I love Django for its "conceptual integrity" and full-stack approach. No, I don't understand me either.

Also, as time has progressed, I have found that part of me yearns for a clean, simple-to-use "framework" (note the small "f") that would help build out a large system without introducing a lot of complexity. Ideally, that framework would let me swap components in and out, much as web frameworks like Django and TurboGears do. In this case, instead of using stock localized IPC, I might want to swap in a simple messaging protocol (Pyro, XMPP, etc).

It's also a matter of marketing and approachability (things have improved on both websites, mind you). Looking at Kamaelia's website, though, I don't find it approachable: it's not immediately clear what the core idea is, or what the difference is between Axon (the core) and Kamaelia (the project). If I had one critique, I would say that Axon should become its own "project"/library in and of itself, and almost have its own website. It would be like ripping Twisted's reactor out and making it a completely separate library.

Kamaelia, like Twisted, is based on a "simple" core - in this case, it's the Axon library which has some very simple goals and paradigms it seeks to fulfill. To quote the Axon page:

Axon is a component concurrency framework. With it you can create software "components" that can run concurrently with each other. Components have "inboxes" and "outboxes" through which they communicate with other components.

A component may send a message to one of its outboxes. If a linkage has been created from that outbox to another component's inbox, then that message will arrive in the inbox of the other component. In this way, components can send and receive data - allowing you to create systems by linking many components together.

Each component is a microprocess - rather like a thread of execution. A scheduler takes care of making sure all microprocesses (and therefore all components) get regularly executed. It also looks after putting microprocesses to sleep (when they ask to be) and waking them up (for example, when something arrives in one of their inboxes).

This by itself is the shining gem of the Kamaelia ecosystem; everything else is applications or additional utilities built on this simple core. This is where the website confusion comes in: where do "solutions built with Axon (e.g. Kamaelia)" end and Axon begin? The core design of Axon is very simple, though: build components which communicate via message passing.

Message passing is a relatively simple concept. Component A generates some work, and then sends it to Component (Not A). Messages are handled by the receiver and results can be passed (via a message) to someone else.

Very, very simple. You can add little factoids on top (messages are sent and received asynchronously, they can be sent locally or across a wide-area network, etc.), but largely those are component implementation details.
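To make the idea concrete without any framework, here is a minimal plain-Python sketch (no Axon involved; Mailbox and the component_* functions are hypothetical names). Each receiver owns an inbox backed by a collections.deque: A generates work and sends it to B, B handles each message and passes its result on to C as another message.

```python
from collections import deque

class Mailbox:
    """A trivial FIFO mailbox (hypothetical; not Axon's implementation)."""
    def __init__(self):
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def get(self):
        return self._messages.popleft()   # oldest message first

    def __len__(self):
        return len(self._messages)

def component_a(outbox):
    # A generates some work and sends it onward
    for n in range(3):
        outbox.put(n)

def component_b(inbox, outbox):
    # B handles each message and passes the result, as a message, to C
    while len(inbox):
        outbox.put(inbox.get() * 10)

def component_c(inbox):
    # C simply collects whatever arrives
    results = []
    while len(inbox):
        results.append(inbox.get())
    return results

b_inbox, c_inbox = Mailbox(), Mailbox()
component_a(b_inbox)
component_b(b_inbox, c_inbox)
print(component_c(c_inbox))   # [0, 10, 20]
```

Here the three components simply run one after another; making them take turns concurrently is the scheduler's job.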

Which gets us back to Axon.

Since I'm interested in the core, and not video transcoders, I hit up the MiniAxon tutorial here and worked through it; even then, I don't think it did Axon complete justice. I then jumped into the "How to write new components" article by Michael.

The second tutorial, in my humble opinion, should be the first article users are directed to. While it has some polishing issues, I found that it really explains what the fruit is going on, and what Axon is.

Reading through both of these, you begin to realize that Axon is built on the core concept of Python generators and yielding control to a scheduler. For example:

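import socket
import time
# Example values (assumed); substitute your own address and ports
ANY = "0.0.0.0"
SENDERPORT = 1501
MCAST_ADDR = "224.168.2.9"
MCAST_PORT = 1600

def sender():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.bind((ANY, SENDERPORT))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 255)
    while True:
        time.sleep(0.5)
        sock.sendto(b"Hello World", (MCAST_ADDR, MCAST_PORT))
        yield 1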

If you're familiar with Python generators, you know that this function sends a message through the socket and then hands control back to whatever is driving it (whoever calls next() on it). It will do so forever, until that controller stops calling it.
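A quick way to see this suspend/resume behaviour is to drive a generator by hand. This sketch swaps the socket for a plain list so it runs anywhere; counter is a hypothetical stand-in for sender().

```python
def counter(log):
    """Like sender(), but 'sends' by appending to a list instead of a socket."""
    n = 0
    while True:
        n += 1
        log.append("message %d" % n)   # do one unit of work...
        yield 1                         # ...then hand control back to the caller

log = []
gen = counter(log)    # creating the generator runs none of its body yet
print(log)            # []
next(gen)             # runs up to the first yield
next(gen)             # resumes after that yield and stops at the next one
print(log)            # ['message 1', 'message 2']
```

Between next() calls the generator just sits suspended. It also shows why the time.sleep() in sender() matters: while a generator is running, nothing else gets a turn until it yields.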

This paradigm is key to Axon: components send and receive messages via mailboxes (by far one of the best descriptions/abstractions I've seen for this). The components do the work they are sent (or generate), put the results in the right outbox, and then yield control, courtesy of Python's generators (PEP 255, later extended into "enhanced generators" by PEP 342).

Yes, coroutines/greenlets/tasklets - stop bothering me.
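Here is a minimal plain-Python sketch of that yield-to-the-scheduler paradigm, loosely in the spirit of the MiniAxon tutorial rather than Axon's real code. Each component is a generator, mailboxes are plain lists, and a round-robin scheduler steps each generator once per pass. Scheduler, producer and consumer are hypothetical names.

```python
class Scheduler:
    """Hypothetical round-robin scheduler: steps each generator once per pass."""
    def __init__(self):
        self.tasks = []

    def activate(self, task):
        self.tasks.append(task)

    def run(self, max_passes):
        for _ in range(max_passes):
            if not self.tasks:
                break
            for task in list(self.tasks):      # copy, since we may remove items
                try:
                    next(task)                 # run until the task yields control
                except StopIteration:
                    self.tasks.remove(task)    # finished tasks are dropped

def producer(outbox, count):
    for i in range(count):
        outbox.append("lyric %d" % i)   # put work in the outbox...
        yield 1                          # ...then yield to the scheduler

def consumer(inbox, seen):
    while True:
        if inbox:                        # roughly dataReady("inbox")
            seen.append(inbox.pop(0))    # roughly recv("inbox")
        yield 1

pipe, seen = [], []    # the producer's outbox doubles as the consumer's inbox
sched = Scheduler()
sched.activate(producer(pipe, 3))
sched.activate(consumer(pipe, seen))
sched.run(max_passes=10)
print(seen)            # ['lyric 0', 'lyric 1', 'lyric 2']
```

The consumer loops forever, like the components in this post, so the sketch caps the run with max_passes. Axon's real scheduler also does things this one doesn't, such as putting paused components to sleep until something arrives in an inbox.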

In the tutorial I linked above, Michael takes the very simple network script and ports it to Axon. Here's my simple experiment, which drops the networking code and cuts straight to the mailbox system. In this case, all I want to do is send and receive the lyrics to the Meow Mix commercial, forever:

import Axon.Component
import Axon.Ipc

LYRICS = "I want chicken I want liver Meow Mix Meow Mix Please Deliver."

class Producer(Axon.Component.component):
    def main(self):
        while 1:
            # Push the lyrics out of our outbox, then yield to the scheduler
            self.send(LYRICS, "outbox")
            yield 1

class Sender(Axon.Component.component):
    def __init__(self):
        # Not strictly needed: it only calls the superclass constructor
        self.__super.__init__()

    def main(self):
        while 1:
            # Forward anything waiting in our inbox to our outbox
            if self.dataReady("inbox"):
                message = self.recv()
                self.send(message, "outbox")
            yield 1

class Receiver(Axon.Component.component):
    def __init__(self):
        self.__super.__init__()

    def main(self):
        while 1:
            # recv() pops from the inbox, so only call it when data is ready
            if self.dataReady("inbox"):
                message = self.recv()
                print(message)
            yield 1

def tests():
    from Axon.Scheduler import scheduler

    class testComponent(Axon.Component.component):
        def main(self):
            producer = Producer()
            sender = Sender()
            receiver = Receiver()

            # Wire producer -> sender -> receiver
            self.link((producer, "outbox"), (sender, "inbox"))
            self.link((sender, "outbox"), (receiver, "inbox"))
            # Adopt them as children and ask the scheduler to start them
            self.addChildren(producer, sender, receiver)
            yield Axon.Ipc.newComponent(*(self.children))
            while 1:
                self.pause()
                yield 1

    harness = testComponent()
    harness.activate()
    scheduler.run.runThreads(slowmo=0.1)

if __name__ == "__main__":
    tests()

This draws on both tutorials and on the Axon.Component documentation, which can be hard to find and isn't easy to navigate.

If we break one of the components down and look at the "magic" provided by the component superclass, it gets clearer. In this case, I've "added" back in the methods from the component superclass, minus the docstrings:

class Sender(Axon.Component.component):
    # First, subclass the Axon Component class. This provides us with the
    # basic inbox/outbox class attributes, which look like this:
    Inboxes = { "inbox"   : "Send the FOO objects to here",
                "control" : "NOT USED",
              }
    Outboxes = { "outbox" : "Emits BAA objects from here",
                 "signal" : "NOT USED",
               }

    def __init__(self):
        self.__super.__init__()

    def recv(self, boxname="inbox"):
        # Returns the first piece of data in the requested inbox.
        return self.inboxes[boxname].pop(0)

    def send(self, message, boxname="outbox"):
        # Appends message to the requested outbox.
        self.outboxes[boxname].append(message)

    def dataReady(self, boxname="inbox"):
        # Returns a true value if data is available in the requested inbox.
        return self.inboxes[boxname].local_len()

    def main(self):
        while 1:
            if self.dataReady("inbox"):
                message = self.recv()
                self.send(message, "outbox")
            yield 1

I think this makes it abundantly clear what's happening with the method calls on this class. Then there's the additional magic in the tests function, in which we defined a new component whose only job is to contain the other components and link up the pipelines (connections between mailboxes) between them.
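Stripped of Axon entirely, the mailbox methods in the annotated Sender class boil down to very little. Below is a self-contained sketch in which MiniComponent is a hypothetical stand-in (not the real Axon.Component.component): recv, send and dataReady are rebuilt on plain lists, and a Sender-style main loop uses them.

```python
class MiniComponent:
    """Hypothetical stand-in for Axon.Component.component (not the real class)."""
    Inboxes = {"inbox": "", "control": ""}
    Outboxes = {"outbox": "", "signal": ""}

    def __init__(self):
        # One plain list per declared box name
        self.inboxes = {name: [] for name in self.Inboxes}
        self.outboxes = {name: [] for name in self.Outboxes}

    def recv(self, boxname="inbox"):
        return self.inboxes[boxname].pop(0)     # oldest message first

    def send(self, message, boxname="outbox"):
        self.outboxes[boxname].append(message)

    def dataReady(self, boxname="inbox"):
        return len(self.inboxes[boxname])       # truthy when data is waiting

class Sender(MiniComponent):
    def main(self):
        while True:
            if self.dataReady("inbox"):
                self.send(self.recv("inbox"), "outbox")
            yield 1

s = Sender()
s.inboxes["inbox"].extend(["meow", "mix"])   # simulate two deliveries
steps = s.main()
for _ in range(3):
    next(steps)
print(s.outboxes["outbox"])   # ['meow', 'mix']
```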

Now, the Axon.Ipc.* docs aren't the most helpful - in our case, we called:

    self.link((producer, "outbox"), (sender, "inbox"))
    self.link((sender, "outbox"), (receiver, "inbox"))
    self.addChildren(producer, sender, receiver)
    yield Axon.Ipc.newComponent(*(self.children)) 

from within the main method of a component. To see what that does, we need to look at the link method on the superclass:

   def link(self, source,sink,*optionalargs, **kwoptionalargs):
      """\
      Creates a linkage from one inbox/outbox to another.

      -- source  - a tuple (component, boxname) of where the link should start  
                   from
      -- sink    - a tuple (component, boxname) of where the link should go to

      Other optional arguments:

      - passthrough=0  - (the default) link goes from an outbox to an inbox
      - passthrough=1  - the link goes from an inbox to another inbox
      - passthrough=2  - the link goes from an outbox to another outbox

      See Axon.Postoffice.postoffice.link() for more information.
      """
      return self.postoffice.link(source, sink, *optionalargs, \
                                  **kwoptionalargs)

This leads us to the Postoffice class, which actually constructs and tracks the links between the components. Here there be dragons.
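Rather than wade into the real Postoffice, here is a much simpler hypothetical sketch of what a linkage has to achieve: once (source, "outbox") is linked to (sink, "inbox"), anything sent to the outbox should show up in the sink's inbox. This version does it by making both ends share the same list object; that is an assumption made for illustration, not a description of how Axon's Postoffice works. Node and PostOffice are hypothetical names.

```python
class Node:
    """Hypothetical component with one inbox and one outbox (plain lists)."""
    def __init__(self, name):
        self.name = name
        self.boxes = {"inbox": [], "outbox": []}

class PostOffice:
    """Hypothetical, much-simplified link registry; not Axon's Postoffice."""
    def __init__(self):
        self.links = []

    def link(self, source, sink):
        (src, src_box), (dst, dst_box) = source, sink
        # Point the source box at the sink box's list, so anything sent to
        # the source lands directly in the sink with no delivery step.
        src.boxes[src_box] = dst.boxes[dst_box]
        self.links.append((source, sink))   # track the linkage

po = PostOffice()
producer, sender, receiver = Node("producer"), Node("sender"), Node("receiver")
po.link((producer, "outbox"), (sender, "inbox"))
po.link((sender, "outbox"), (receiver, "inbox"))

producer.boxes["outbox"].append("Meow Mix")   # producer sends...
msg = sender.boxes["inbox"].pop(0)            # ...sender receives...
sender.boxes["outbox"].append(msg)            # ...and forwards
print(receiver.boxes["inbox"])                # ['Meow Mix']
```

Sharing one list keeps the sketch tiny, but it only supports one-to-one links: an outbox can't fan out to several inboxes, so a fuller implementation would need a real delivery step.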

So, addChildren just registers all of the passed-in component instances as children of our new component (components, all the way down), and yielding Axon.Ipc.newComponent(...) asks the scheduler to start those children. If you added a 'print harness' to tests(), you'd see:

Component __main__.testComponent_5 [ inboxes : {'control': [], 'inbox': []}
outboxes : {'outbox': <>, 'signal': <>}

This means we're getting a component back; we then call .activate() on it. activate is actually a method of the Axon.Microprocess.microprocess class, and in our case it simply registers our test component (which contains all of the children) with the default scheduler.

At which point we call scheduler.run.runThreads() and... jazz hands.

I dove into some of the internals here, and I ended up having to supplement the documentation by poring over the code, but I personally think it helps to remove some of the magic and show what is actually occurring. Most of the time, it seems you simply won't care; instead you'd just make and register your happy component and be on your way.

To somewhat summarize what we're looking at: components that subclass Axon.Component.component use generators to yield control to the scheduler, and pass messages/work to one another via the very clever mailbox/postoffice metaphor.

Now, the interesting thing, once you start digging through things, is that your component isn't really a thread. If you wanted to make sure each component ran in its own thread, you might instead subclass Axon.ThreadedComponent.threadedcomponent.

Subclassing this class looks mighty close to what we did before, but it uses threads and queues for the message passing. Rather than yielding, your main method just runs, and the recv/send methods are backed by queues. Ahh, delicious non-judgmental queues.
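A rough standard-library equivalent looks like this. ThreadedSender is hypothetical and not Axon's threadedcomponent: its loop is an ordinary blocking method running in its own threading.Thread, and the inbox and outbox are queue.Queue objects. Using None as a shutdown sentinel is an assumption of this sketch.

```python
import queue
import threading

class ThreadedSender(threading.Thread):
    """Hypothetical threaded component: no yield, just a blocking loop."""
    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.outbox = queue.Queue()

    def run(self):
        while True:
            message = self.inbox.get()      # blocks until data arrives
            if message is None:             # sentinel: time to stop
                self.outbox.put(None)       # tell downstream we're done
                return
            self.outbox.put(message)        # forward, like Sender.main()

worker = ThreadedSender()
worker.start()
for line in ["I want chicken", "I want liver"]:
    worker.inbox.put(line)
worker.inbox.put(None)
worker.join(timeout=5)

received = []
while True:
    item = worker.outbox.get()
    if item is None:
        break
    received.append(item)
print(received)   # ['I want chicken', 'I want liver']
```

Because Queue.get() blocks, the thread sleeps until something arrives, which is the threaded counterpart of checking dataReady() and yielding.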

In any case, Kamaelia, via Axon, is a very nice abstraction on top of a very simple concept: message passing for concurrency. Being able to quickly build up a series of components which pass work back and forth via some sort of communications system, without having to worry about the underlying nuances/organization, is quite nice.

One of the things Michael Sparks and I have talked about is adding some level of multiprocessing support to Kamaelia. This would actually be insanely easy if I used Axon.ThreadedComponent as the template and swapped in multiprocessing.Queue and multiprocessing.Process as the back end.
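A minimal sketch of that idea using only the standard library, not Axon: a "component" body runs in a multiprocessing.Process, with a multiprocessing.Queue as its inbox and another as its outbox. The names shout and run_pipeline and the None shutdown sentinel are hypothetical. The sketch assumes a POSIX system, because it asks for the "fork" start method so the child process simply inherits the functions defined here.

```python
import multiprocessing as mp

# POSIX-only assumption: "fork" lets the child inherit this module's functions
ctx = mp.get_context("fork")

STOP = None  # hypothetical shutdown sentinel

def shout(inbox, outbox):
    """Component body: a blocking main loop running in its own process."""
    while True:
        message = inbox.get()        # blocks until a message arrives
        if message is STOP:
            outbox.put(STOP)         # propagate shutdown downstream
            break
        outbox.put(message.upper())

def run_pipeline(messages):
    inbox, outbox = ctx.Queue(), ctx.Queue()
    worker = ctx.Process(target=shout, args=(inbox, outbox))
    worker.start()
    for m in messages:
        inbox.put(m)
    inbox.put(STOP)
    results = []
    while True:                      # drain the outbox before joining
        item = outbox.get()
        if item is STOP:
            break
        results.append(item)
    worker.join()
    return results

if __name__ == "__main__":
    print(run_pipeline(["meow", "mix"]))   # ['MEOW', 'MIX']
```

Draining the outbox before join() matters: a child process that still has data queued for the parent may not exit until that data has been consumed.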

Kamaelia itself is really a series of example components/applications built on a core (Axon), but you do not need Kamaelia to use Axon effectively. In fact, just on a whim, I decided to whip up a quick-and-dirty HTTP load tool using Axon. You can see it here.

It's really a hack job: all I did was build off the Meow Mix demo, swap in the threaded component and hack it around a bit. One thing I'd like to know is how to pass in a dynamic number of clients so that I could create the outboxes dynamically in the producer; I couldn't find anything clear in the docs about how to do this. Also, it doesn't shut down.
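Independent of Axon, the dynamic-outbox idea itself is easy to sketch in plain Python: derive the outbox names from a client count at construction time and deal the work out round-robin. FanOutProducer, num_clients and the "outboxN" naming are hypothetical, and wiring these boxes to real clients is left out.

```python
import itertools

class FanOutProducer:
    """Hypothetical producer whose outboxes are created from a client count."""
    def __init__(self, num_clients):
        # One outbox per client: "outbox0", "outbox1", ...
        self.outboxes = {"outbox%d" % i: [] for i in range(num_clients)}
        # Cycle through the outbox names forever (dicts keep insertion order)
        self._targets = itertools.cycle(list(self.outboxes))

    def send_work(self, items):
        for item in items:
            self.outboxes[next(self._targets)].append(item)   # round-robin

producer = FanOutProducer(num_clients=3)
producer.send_work(["GET /a", "GET /b", "GET /c", "GET /d"])
print(producer.outboxes)
# {'outbox0': ['GET /a', 'GET /d'], 'outbox1': ['GET /b'], 'outbox2': ['GET /c']}
```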

I'm going to keep hacking around with Axon; it's pretty neat. Interesting things I'd like to poke around in:

  • Replace the underlying IPC mechanism with multiprocessing.Pipe/Pyro/posix_ipc
  • Hack on the multiprocessing (mp) version of the component back end
  • Get a multi-system script running and communicating workloads across the LAN