Stackless: You got your coroutines in my subroutines.

by jesse


Note: This is another post in what I hope will be a series leading up to my concurrency/distributed systems talk at PyCon. I'm steadily working through experimenting with and learning the various frameworks/libraries in the python ecosystem. I reserve the right (and probably will) to revise these entries based on feedback from people (mainly the author(s) of said tool(s)). I will also add additional bits and pieces as I learn and explore more.

Stackless python - here's another big one on the pile - is much more than a library or a framework which runs on CPython: Stackless is actually a modified version of the CPython interpreter itself, not just a C-extension. Stackless is in use by various people and companies - most notably by CCP Games, makers of Eve Online (see this pycon presentation). In fact, CCP Games is a large part of why Stackless is still around today.

I say that intentionally - quoting the readme in the Stackless/ directory of the distribution (here):

In 2003, the fabulous PyPy project was started, which is still performing very well. I have implemented Stackless for PyPy (with lots of support from Armin), and it is so incredibly much nicer. No more fiddling with existing calling conventions, no compromises, everything that needs to be stackless also is.

Unfortunately, PyPy is still not fast and complete enough to take over. This means, my users are still whining for an update all the time CPython gets an update. And maintaining this code gets more and more a nightmare for me, since I have the nice PyPy version, and I hate hacking this clumsy C code again and again.

The original author, Christian Tismer, largely moved on to PyPy, which is still in its infancy (although I read through bits of the code base frequently - it's pretty), and further development has largely stalled, minus the improvements Richard M. Tew (CCP Games) and others have made. There's still life in it.

Fundamentally, Stackless modifies the interpreter internals to change the way the C call stack is manipulated/used, and to add the other nice bits Stackless offers. Stackless simply doesn't use the C call stack for its microthreads - all told, each microthread only has a few kilobytes of overhead, which is awesome.

It adds something called microthreads and does other patching to python-core. Normal OS/POSIX threads require a fair amount of resources to create and run - in the case of Python, each thread has to get its own stack, and this costs memory. With Stackless' microthread support you get "threads", but threads which cost significantly less, and which potentially execute faster due to context switching improvements (no need to go from user->kernel->user and so on).

Point of Order: Before I continue, I want to clear up a common misconception I've heard - Stackless does not, in any way, remove the Global Interpreter Lock. No sir. It's still there. Lurking. Waiting to steal your candy. Also, it still has a stack, so it's not truly "stackless".

So, microthreads are smaller and require less OS hand-holding for context switching, and ultimately can be (and are) scheduled by the interpreter rather than the operating system.

Install note for OS/X users: you need to pass the "--enable-stacklessfewerregisters" flag to configure, otherwise make pukes on you.

Stackless is a basic implementation of these - for a simple resource usage example, I wrote a script which spawns 2,000 threads (sorry windows) and 2,000 tasklets. I watched the memory usage of both:

import time
import threading
def func():
    time.sleep(120)

threads = [threading.Thread(target=func) for i in range(2000)]
for i in threads:
    i.start()
for i in threads:
    i.join()

For the threaded script, the resident size was 42M and the virtual size was 1037M. Versus the stackless version:

import time
import stackless

def func():
    for i in range(120):
        time.sleep(1)             # note: sleep() still blocks the whole process
        stackless.schedule()      # yield so the other tasklets get a turn

for i in range(2000):
    stackless.tasklet(func)()

stackless.run()

Stackless had a resident size of 3416K and a virtual size of 22M - virtually microscopic next to the heavier thread version. Obviously, these are not line-for-line comparisons - the Stackless version, like other cooperative multitasking systems, requires that each tasklet be a good citizen and not block execution forever, instead rescheduling itself or otherwise yielding to allow others to run. If a tasklet blocks on a socket, everyone blocks on that tasklet.
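
To make that concrete, here's a tiny, deliberately broken sketch - the selfish tasklet never calls stackless.schedule(), so the polite one never gets a turn:

import stackless

def selfish():
    while True:
        pass                      # never yields - hogs the scheduler forever

def polite():
    print "you'll never see this"

stackless.tasklet(selfish)()
stackless.tasklet(polite)()
stackless.run()                   # spins inside selfish() and never returns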

Someone asked me to track the linear growth of the threaded numbers vs. the tasklet numbers. Since I'm a sucker, I thought I'd take him up on it (OS/X 10.5, 4GB of ram, Core 2 Duo):

Threads:

Num Threads    Resident Size    Virtual Size
2              3412K            23M
200            7336K            123M
2,000          42M              1037M

Tasklets:

Num Tasklets    Resident Size    Virtual Size
2               3128K            21M
200             3164K            21M
2,000           3408K            22M
20,000          5920K            24M
100,000         17M              34M

One note - the Stackless numbers should be low, but not this low (from my understanding, and review from others) - anyone have any ideas?

There are the numbers - lots of threads is going to consume lots of ram. With Stackless, a given tasklet averages only a few kilobytes in size, so the memory footprint stays small as you raise the count. Additionally, note the two counts at the bottom of the tasklets table: you can't spawn that many threads (depending on your OS and configuration), and even if you could, the memory footprint would be costly.

Now, in the age of cheap-ass-ram, where you can trick out a desktop or server with 16GB sticks, people might argue "so what" - but on machines where memory is constrained, such as smaller notebooks, embedded devices, or game consoles - this is a critical thing to take into consideration.

If you look at the Stackless code, there is another big thing to realize: Stackless, like other frameworks or systems which use a scheduler built into the interpreter, gives you both the benefit and the task of scheduling when your tasklets/components/etc execute. This gives you more control, but more responsibility. Stackless offers both cooperative and preemptive scheduling, though the preemptive scheduling doesn't feel right to me. More on scheduling here.
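
For example - and this is a rough sketch, so check the Stackless docs for the exact run() semantics - my understanding is that you can pass run() an instruction-count timeout, and the scheduler hands you back whichever tasklet was hogging the interpreter:

import stackless

def hog():
    while True:
        pass                          # a tasklet that never plays nice

stackless.tasklet(hog)()
victim = stackless.run(10000)         # preempt after ~10,000 VM instructions
if victim is not None:
    print "interrupted:", victim
    victim.kill()                     # or victim.insert() to give it another turn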

So, we've determined that stackless tasklets are smaller, right? Pretty simple.

If you've read the other things I've written on Kamaelia/Twisted/etc, you'll recognize the concepts within Stackless pretty quickly - a tasklet is a component, a thread of work, and tasklets intercommunicate via channels. Here's a little example of two tasklets communicating:

import stackless

def chicken(channel):
    channel.send('cluck')

def egg(channel):
    print channel.receive()

channel = stackless.channel()
stackless.tasklet(chicken)(channel)
stackless.tasklet(egg)(channel)
stackless.run()

Pretty easy, and a tiny amount of code. The concept of tasklets/microthreads isn't a new one - in fact, it's how Erlang gets its groove on. Erlang doesn't use native OS threads; instead, it uses microthreads scheduled by the Erlang runtime. However, they are not directly comparable. Stackless isn't running across cores - Erlang does; Stackless, due to the GIL, has to obey the same rules as the rest of python-core. For more on "erlang v. stackless", see this.

Oh, and you can share normal objects via the channel too:

import stackless

def yes(channel):
    x = channel.receive()
    x.append('yes')
    channel.send(x)

def no(channel):
    x = channel.receive()
    x.append('no')
    channel.send(x)

channel = stackless.channel()
stackless.tasklet(yes)(channel)
stackless.tasklet(no)(channel)
channel.send([])
stackless.run()
print channel.receive()

Moving on, Stackless offers something else - the ability to pickle tasklets. This means you can pickle up a tasklet, send it over the wire to another machine, unpickle it, and continue running it - channels get pickled too. Locally, this means you can save a tasklet to disk and resume its state easily.
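
Here's a rough sketch of the local case (treat it as illustrative - real usage needs care about what's on the tasklet's stack, since only pure-Python frames pickle cleanly):

import pickle
import stackless

def counter(limit):
    for i in range(limit):
        print "tick", i
        stackless.schedule()          # yield after each step

t = stackless.tasklet(counter)(10)
stackless.schedule()                  # let it tick a couple of times
stackless.schedule()
saved = pickle.dumps(t)               # snapshot the tasklet mid-run
t.kill()                              # throw the original away
resumed = pickle.loads(saved)         # rebuild it, state intact
resumed.insert()                      # put it back on the runnables queue
stackless.run()                       # it picks up where it left off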

You could use this to generate a tasklet which listened on a port for data on the local machine and passed the data off the wire to the channel - when you pickled the channel (or the tasklets with that channel in scope) and sent it over the wire, they would pick up listening on the same port number on the remote machine. You lose current sessions, yes, but you could also detect active sessions and handle those gracefully.

This is nice for, say, a component you wanted to be able to easily send to other machines to help with load balancing. In theory, you could auto-detect new servers being added to a cluster, and when a server came up into a "ready" state, send it the daemon it should handle - and the tasklets would pick up where they left off (minus the sessions).

Otherwise, pickling channels and tasklets could be used for a few things - you have to think of them in terms of coroutines (here) - you should be able to suspend processing state and then simply resume where you left off. If you've got python brains: pickle-able generators. You could put them in a database, though the use of that escapes me at the moment.

Oh - and pickled tasklets and channels can be shared amongst different architectures too, as long as the target architecture supports Stackless and is running the same version of it.

To continue on - if you wanted to add threads into the mix as well: Stackless tasklets can be run within Python threads, but those tasklets are local to that thread, and each thread gets its own scheduler. Your main application thread has its own scheduler, and so on.
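
A sketch of what that looks like - each OS thread builds its own tasklets and runs its own scheduler (the names here are mine):

import threading
import stackless

def worker(name):
    def task(n):
        print "%s: tasklet %d" % (name, n)
        stackless.schedule()
    for i in range(3):
        stackless.tasklet(task)(i)    # these tasklets belong to *this* thread
    stackless.run()                   # this thread's private scheduler

threads = [threading.Thread(target=worker, args=("t%d" % i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()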

Stackless' tasklet/channel system is quite nice; however, note that I'm not saying Stackless is the only way into this magical world - it's not, especially with the plethora of coroutine/greenlet/etc packages for python today, and the continued work towards making generators more awesome. I'm just showing what Stackless can/could do.

The primitives within Stackless are nice - frankly, I'd like a lightweight green thread implementation in python core on which we could build a nice Actor library, and which would give us the lower memory footprint as well. However, to get a lot of mileage out of the primitives within Stackless, you'd find yourself building your own abstraction layer/framework (for example, concurrence). This is why people run twisted on top of it, why CCP Games has the uthread library (which you can see here), and so on.
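
Just to illustrate what I mean, here's a toy actor class you might sketch on top of tasklets and channels - this is hypothetical, not the concurrence or uthread API:

import stackless

class Actor(object):
    def __init__(self):
        self.mailbox = stackless.channel()
        stackless.tasklet(self._loop)()   # one tasklet drains the mailbox

    def _loop(self):
        while True:
            self.receive(self.mailbox.receive())

    def send(self, msg):
        self.mailbox.send(msg)            # blocks until the actor takes it

class Printer(Actor):
    def receive(self, msg):
        print "got:", msg

p = Printer()
p.send("hello")                           # prints "got: hello"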

The cost of deploying Stackless shouldn't be underestimated though - it's got some magic assembler code within it, which isn't the most portable of goods (across OSes, compilers, compiler versions, etc). Some platforms simply aren't supported because of this. Not to mention, it's an entirely new interpreter, which has a much higher cost than an extension module.

A few people I've been talking with asked the simple question: "Why hasn't any of this been pushed into python-core?" Well, in short - it was never really proposed (by Christian), and the changes within Stackless are invasive - the last serious discussion was from 2007 (see this).

With Stackless, it's difficult - there is perceived complexity about the code, and then there is real complexity. I suspect both are high in the case of Stackless due to the nature of the problem it's trying to solve - namely, bolting a feature like this onto an interpreter not meant for it. Because of this, and because Christian and others have moved on to the greener pastures of PyPy, I think inclusion into core simply won't happen.


Circuits: event driven components.

by jesse


Next up in the GBLOSTR (great big list of stuff to review) is the Circuits library by James Mills (here and here). I'm familiar with James Mills only from his posts on python-list, but more recently, I know he's been working on getting some level of multiprocessing into circuits - circuits was already on my research list, but I bumped it up in the queue because we started chatting. Let me state this: these are slightly cleaned up versions of my notes as I am learning these modules - some of them have higher learning curves for me than others.

Circuits is, well - an event based "framework" (again, note the small f) based around the concept of Components (big C!) consuming/reacting to events and in turn generating events - all asynchronously.

James' goals seem pretty simple - build something with no external dependencies, that's compact (I should say, the core.py is < 500 lines), and that makes it easy to build scalable messaging based (event based) systems. Did he succeed? Don't know! Let's dig in.

Note: Given that his site is/was down at the time of this writing, I simply cloned the hg repo (here) and used that for all the code and documentation, which is why I am going to be sparse on direct links to the documentation. Also note that pulling from tip introduces dependencies on python 2.6.

Starting with the quickstart - always a good place to jump to - James outlines the very basics, as in the very, very basics, via a simple code example, which I have paraphrased and renamed the objects of... you get the picture:

from circuits import listener, Event, Component, Manager

class AnEvent(Event):
    "AnEvent Event"

class TheComponent(Component):

    @listener("anevent")
    def on_anevent(self):
        print "hello, I got an event"

def main():
    manager = Manager()

    thecomponent = TheComponent()

    manager += thecomponent

    for i in range(10):
        manager.push(AnEvent(), "anevent")

    while True:
        try:
            manager.flush()
        except KeyboardInterrupt:
            break

if __name__ == "__main__":
    main()

And of course, if you run this, you see "hello, I got an event" printed 10 times. Simple! But here there be magic, so let's break it down. First, the events.

Here, we subclass and generate a new event:

class AnEvent(Event):
    "AnEvent Event"

If you look in core.py, you'll find that Event is a container - there's some magic here at first glance - but really all this does (see __new__) is construct a new object of the passed in class, and chain any arguments into attributes. The rest is just accessing and comparison - for example:

>>> from circuits import Event          
>>> class x(Event):
... 	"an x event"
... 
>>> y = x(1,2,3,4,arg_one='foo',arg_two='bar')
>>> print y

>>> 

This means that you can pass in any number of positional arguments and keyword arguments, and they'll be packed into your event. More on this later.

Moving on, the next thing we define is TheComponent:

class TheComponent(Component):

    @listener("anevent")
    def on_anevent(self):
        print "hello, I got an event"

The parent class of this object is a little bit more magical. James doesn't seem shy of the metaprogramming:

class Component(BaseComponent):

    __metaclass__ = HandlersType

Nuts. Right above it though is the definition of BaseComponent, which is slightly simpler but reveals something interesting - each component can be passed a channel, and the default is None (more on channels in a bit). Additionally, we see BaseComponent is a subclass of Manager. Components are managers, and managers are components.

Finkle is einhorn, einhorn is Finkle!

The metaclass does magical object construction stuff and method management/etc. Dragons.

So, the superclass has two additional methods - register() and unregister() - which control the component's registration with the manager (more later). Otherwise, it doesn't have a lot else, except for the on_anevent() method we've added.

And made fancy with a decorator (mandatory pieces of flair)!

This code bit:

    @listener("anevent")
    def on_anevent(self):
        print "hello, I got an event"

Essentially, this means "this method reacts to this event" - so let's look at the listener decorator a bit. The decorator is in core.py, and the docstring explains a lot more about the various arguments the decorator can take, but not very clearly. I guess you had to be there.

The one thing you notice though, is that the decorator adds some method attributes to your method - things like f.type, f.target, etc. And then something called f.br, the use of which isn't clear to me right now. You can however have a single handler listen for multiple events:

class TheComponent(Component):

    @listener("foobar", "anevent")
    def on_anevent(self):
        print "i listen for foobar and anevent"

note: James explained the .br attribute:

handler.br is used in the BaseComponent's send(...), and iter(...). It is used to determine which "branch" to follow. The args and kwargs of an Event are intelligently applied to an event handler depending on the signature of the event handler. (It took some time to get right - but basically it means you can't go wrong!)

In short - we now have a listener registered for events named "anevent", which fires when such an event is pushed (more on this later). More recently, James altered things so that if you define a component, any method whose name doesn't start with a _ will automatically become an event handler of "listener" type, listening on a channel based on the method name.

We can refactor the original code like this:

class TheComponent(Component):

    def anevent(self):
        print "hello, I got an event"

And it works just as well - which is a nice improvement. Here's the note from James on this:

Basically, the new HandlersType (metaclass) means that every method defined in a sub-classed Component that has not been previously defined as an event handler with the @listener decorator, and does not start with a _, is automagically turned into an event handler listening on a channel that is the name of the method.

The use of the @listener decorator is still required for:

  • Defining filters
  • Defining event handlers listening on multiple channels.
  • Defining event handlers listening on foreign targets.

Next up is the main function - here we instantiate a new Manager object. If you look at the code for this, you'll notice right off the bat that there's a startling amount of __ method overloading. Which explains this line:

    manager += thecomponent

This is handled by the manager's __iadd__ method:

    def __iadd__(self, y):
        y.register(self.manager)
        if hasattr(y, "registered"):
            y.registered()
        return self

This means that when you append a component onto the manager (something more explicit would be nice - brevity and terseness be damned), it calls the register function on the component. Reading through that method is actually quite telling - it introspects the current object (a component), extracts the callable methods, pulls out the handlers/channels for a given event, and calls .add on the manager. A given component may have multiple listeners for the same event:

class TheComponent(Component):

    @listener("foobar", "anevent")
    def on_anevent(self):
        print "i listen for foobar and anevent"

    def anevent(self):
        print "hello, I got an event"

These handlers are all registered with the manager. Now, something I want to clear up is that when you declare:

    @listener("anevent")
    def on_anevent(self):
        print "hello, I got an event"

The "anevent" defines the channel that listener listens on. It's easy to say "listens for events" - but really what this is is a subscription to a channel. This means you can define a listener with, well - no event:

    @listener()
    def all(self):
        print "i listen for everything"

It's like having all the premium channels.

Back to the manager though (the above was a light bulb going off in my head) - next we see that we "push" events into the manager. This is actually pretty simple:

    def push(self, event, channel, target=None):
        """E.push(event, channel, target=None) -> None

        Push the given event onto the given channel.
        This will queue the event up to be processed later
        by flushEvents. If target is given, the event will
        be queued for processing by the component given by
        target.
        """

        if self.manager == self:
            self._queue.append((event, channel, target))
        else:
            self.manager.push(event, channel, target)

This means we can push events into the manager and target them at a specific component. The definition of an event, in the case of a listener, is actually the creation of a channel.

The push method exposes something else - it seems possible to create a manager which is actually a pointer to another manager (a proxy) - alas, I don't see how to do that. Hooray!

So, we can create new events and push them into the manager, which pipes them off to the channels which have assigned listeners. Easy! We flush the manager's queue so all events are pushed to the components, and we're done.

So what can we use these basics for? Easy - building components which stack on top of each other, containing other components which subscribe to a given event. In the examples directory, James has done an excellent job showcasing a lot of problems which he solves using the core circuits library. In fact, the examples explain a lot more about how things work than the core code itself.

One of the questions I had in working through all of this is what happens if you define a general event - can the handler gain access to arguments within the event? Does it need to? Is it simply sufficient for a listener on a given channel to know the name/type of an event and react to it?

The answer is, well, yes and no. Let's say we do this:

from circuits import listener, Event, Component, Manager

class AnEvent(Event):
    "AnEvent Event"

class TheComponent(Component):

    def anevent(self, *args, **kwargs):
        print args, kwargs

def main():
    manager = Manager()

    thecomponent = TheComponent()

    manager += thecomponent

    for i in range(10):
        manager.push(AnEvent(1,2,3, no='yes', yes=False), "anevent")

    while True:
        try:
            manager.flush()
        except KeyboardInterrupt:
            break

if __name__ == "__main__":
    main()

You would see:

(1, 2, 3) {'yes': False, 'no': 'yes'}
(1, 2, 3) {'yes': False, 'no': 'yes'}
(1, 2, 3) {'yes': False, 'no': 'yes'}
(1, 2, 3) {'yes': False, 'no': 'yes'}
(1, 2, 3) {'yes': False, 'no': 'yes'}
(1, 2, 3) {'yes': False, 'no': 'yes'}
(1, 2, 3) {'yes': False, 'no': 'yes'}
(1, 2, 3) {'yes': False, 'no': 'yes'}
(1, 2, 3) {'yes': False, 'no': 'yes'}
(1, 2, 3) {'yes': False, 'no': 'yes'}

Awesome. The arguments the event is created with are passed directly into the listener for that event - so as long as you have the right signature on the method, you should be golden.
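
Going by James' note about the args/kwargs being applied against the handler's signature, you should (I believe - I haven't verified every case) be able to name the arguments directly instead of using *args/**kwargs:

class TheComponent(Component):

    # hypothetical variant of the handler above: the positional and keyword
    # arguments of AnEvent(1, 2, 3, no='yes', yes=False) should map straight
    # onto this signature
    def anevent(self, a, b, c, no=None, yes=None):
        print a, b, c, no, yes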

Which leaves us with a basic question - if messages have to be pushed into the manager on a given channel, how do we build something which is an event generator? In other words, how do we make something which "listens" and then generates the matching events?

To know that - we look in the lib/ directory (which the examples make prodigious use of) and we'll focus on io.py which listens on stdin:

from circuits import listener, Event, Component, Manager
from circuits.lib.io import Stdin

class DogBot(Stdin):

    def read(self, *args, **kwargs):
        print args, kwargs

def main():
    manager = Manager()
    dog = DogBot()
    manager += dog

    while True:
        try:
            manager.flush()
            dog.poll()
        except KeyboardInterrupt:
            break

if __name__ == "__main__":
    main()

All this does is create a new component which listens for the read event that Stdin generates and then prints off what we see coming in off the command line:

thumper:circuits jesse$ python2.6 dog.py 
a line is one argument
('a line is one argument\n',) {}

That means the read event handler needs to parse the passed-in line and then (re)issue an event for some other listener to manage. Let's define some additional events:

class Sit(Event):
    """ Sit event, no args """

class Say(Event):
    """ Speak event
     args: what to say
    """

class DogBot(Stdin):

    def read(self, *args, **kwargs):
        command = args[0].strip('\n').split()
        if not command:
            return
        if command[0] == 'sit':
            self.push(Sit(), 'sit', self.channel)
        elif command[0] == 'say':
            self.push(Say(' '.join(command[1:])), 'say', self.channel)

    def sit(self):
        print 'i am now sitting'

    def say(self, words):
        print '%s' % words

Rudimentary - but you get the idea. For networking, you'd need to bind the socket and then read off/generate the events - this is largely covered by the sockets.py module in lib, as well as irc, webserver and smtp listeners.
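
To sketch the shape of that - and this is just the idea, not the actual API of circuits.lib.sockets; the class, event, and port details here are made up - you'd poll a socket the same way the main loop polls dog above, and push events as data arrives:

import socket
import select
from circuits import Event, Component

class Read(Event):
    "Read event - args: the data received"

class UDPReader(Component):

    def __init__(self, port):
        super(UDPReader, self).__init__()
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.bind(("0.0.0.0", port))
        self._sock.setblocking(False)

    def poll(self):
        # same pattern as dog.poll() above: check the socket, push events
        r, _, _ = select.select([self._sock], [], [], 0)
        if r:
            data, addr = self._sock.recvfrom(8192)
            self.push(Read(data), "read", self.channel)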

All in all, it's pretty simple to construct components. It could be made a bit easier with less metaprogramming and more, well, methods - but a great deal more documentation would help too. I'm not too fond of heavy metaprogramming - I tend to think it makes code rather unapproachable in general.

The (event/channel)/subscription model used here is nice as well - you can easily create your own little asynchronous network daemon, or something as simple as what I did above, very quickly (once you know what's going on). It is obviously still evolving - James is putting a lot of work into it. If you're looking for a compact little library, this would be a good one to check out.


A gentle overview of Kamaelia or "it's axon, stupid"

by jesse


Note: This is the first post in what I hope will be a series leading up to my concurrency/distributed systems talk at PyCon. I'm steadily working through experimenting with and learning the various frameworks/libraries in the python ecosystem. I reserve the right (and probably will) to revise these entries based on feedback from people (mainly the author(s) of said tool(s)). I will also add additional bits and pieces as I learn and explore more. Code and examples will be checked into my pycon 2009 bitbucket site here.

For a while now, I've been meaning to dig into Kamaelia but was largely put off by what I tend to call the "twisted effect". What this means is that when I go looking for libraries and small components, I go looking for a library - not a "solution". I also worry about the "once you go in, you must follow this paradigm" effect. I'm not going to say that these feelings continue to be founded, or are completely rational - after all, I am digging into it, yes? It's the thought that once you adopt the "one true way of doing things" you're trapped in that solution/framework "forever" - ironically, I love Django for its "conceptual integrity" and full-stack approach. No, I don't understand me either.

Also, as time has progressed, I have found that part of me yearns for a clean, simple-to-use "framework" (note the small "f") that would help build out a large system without introducing a lot of complexity. Ideally, that framework would allow me to swap components in and out - think of web frameworks like django and turbogears - in this case, instead of using stock localized IPC, I might want to swap in a simple messaging protocol (Pyro, XMPP, etc).

It's also a matter of marketing and approachability - things have improved on both websites, mind you. Looking at Kamaelia's website though, I don't find it approachable, as it's not immediately clear what the core idea is, or what the difference between Axon (the core) and Kamaelia (the project) is. For example, if I had one critique, I would say that Axon should become its own "project"/library in and of itself, and almost have its own website. It would be like ripping Twisted's reactor out and making it a completely separate library.

Kamaelia, like Twisted, is based on a "simple" core - in this case, it's the Axon library which has some very simple goals and paradigms it seeks to fulfill. To quote the Axon page:

Axon is a component concurrency framework. With it you can create software "components" that can run concurrently with each other. Components have "inboxes" and "outboxes" through which they communicate with other components.

A component may send a message to one of its outboxes. If a linkage has been created from that outbox to another component's inbox; then that message will arrive in the inbox of the other component. In this way, components can send and receive data - allowing you to create systems by linking many components together.

Each component is a microprocess - rather like a thread of execution. A scheduler takes care of making sure all microprocesses (and therefore all components) get regularly executed. It also looks after putting microprocesses to sleep (when they ask to be) and waking them up (for example, when something arrives in one of their inboxes).

This by itself is the shining gem of the Kamaelia ecosystem - everything else is applications or additional utilities built on this simple core. This is where the website confusion comes in - where does "solutions built with axon (e.g. kamaelia)" end and Axon begin? The core design (of Axon) is very simple though: build a component which communicates via message passing.

Message passing is a relatively simple concept. Component A generates some work, and then sends it to Component (Not A). Messages are handled by the receiver and results can be passed (via a message) to someone else.

Very, very simple. You can add little factoids on top - messages sent and received are handled in an asynchronous fashion, messages can be sent locally or across a wide network, etc - but largely those are component implementation details.

Which gets us back to Axon.

Since I'm interested in the core - and not video transcoders - I hit up the MiniAxon tutorial here and worked through it - even then, I don't really think it did Axon complete justice. I then jumped into the "How to write new components" article by Michael.

The second tutorial, in my humble opinion, should be the first article users are directed to - while it has some polishing issues, I found it to really explain what the fruit was going on, and what Axon is.

Reading through both of these, you begin to realize that Axon is built on the core concept of Python generators and yielding control to a scheduler. For example:

import socket
import time

# example values - the original tutorial defines these elsewhere
ANY = "0.0.0.0"
SENDERPORT = 1501
MCAST_ADDR = "224.168.2.9"
MCAST_PORT = 1600

def sender():
   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
   sock.bind((ANY, SENDERPORT))
   sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 255)
   while 1:
      time.sleep(0.5)
      sock.sendto("Hello World", (MCAST_ADDR, MCAST_PORT))
      yield 1

If you're familiar with python generators, you know that what this function does is send a message to a socket and then hand control back to the main program. It will do so forever, until the controller exits.
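
Strip away the networking and the scheduling idea underneath looks roughly like this - a toy illustration, not Axon's actual scheduler:

def ticker(name):
    while 1:
        print name, "ran"
        yield 1                  # hand control back to the scheduler

microprocesses = [ticker("a"), ticker("b")]
for step in range(3):            # a dumb round-robin scheduler
    for mp in microprocesses:
        mp.next()                # advance each generator one step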

This paradigm is key to Axon: components send and receive messages via mailboxes (by far one of the best descriptions/abstractions I've seen for this) - the components do the work sent/generated, put the results in the right outbox, and then yield control, courtesy of Enhanced Generators (see PEP 342).

Yes, coroutines/greenlets/tasklets - stop bothering me.
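
As a quick refresher on what PEP 342 added - a generator can also receive values at its yield points, which is what makes it coroutine-shaped rather than just iterator-shaped:

def consumer():
    while True:
        item = (yield)           # PEP 342: wait here to be sent a value
        print "consumed", item

c = consumer()
c.next()                         # prime it - run to the first yield
c.send("work")                   # resume it with a value
c.send("more work")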

In the tutorial I linked above, Michael takes the very simple network script and ports it to Axon. Here's my simple experiment that drops the networking code and cuts straight to the mailbox system. In this case, all I want to do is send and receive the lyrics to the meow mix commercial, forever:

import Axon

LYRICS="I want chicken I want liver Meow Mix Meow Mix Please Deliver."

class Producer(Axon.Component.component):
    def main(self):
        while 1:
            self.send(LYRICS, "outbox")
            yield 1

class Sender(Axon.Component.component):
    def __init__(self):
        self.__super.__init__()

    def main(self):
        while 1:
            if self.dataReady("inbox"):
                message = self.recv()
                self.send(message, "outbox")
            yield 1

class Receiver(Axon.Component.component):
    def __init__(self):
        self.__super.__init__()

    def main(self):
        while 1:
            if self.dataReady("inbox"):   # recv() on an empty inbox would raise an IndexError
                message = self.recv()
                print message
            yield 1

def tests(): 
    from Axon.Scheduler import scheduler 

    class testComponent(Axon.Component.component): 
        def main(self): 
            producer= Producer()
            sender = Sender()
            receiver = Receiver() 

            self.link((producer, "outbox"), (sender, "inbox"))
            self.link((sender, "outbox"), (receiver, "inbox"))
            self.addChildren(producer, sender, receiver)
            yield Axon.Ipc.newComponent(*(self.children)) 
            while 1: 
                self.pause() 
                yield 1

    harness = testComponent() 
    harness.activate() 
    scheduler.run.runThreads(slowmo=0.1) 

if __name__=="__main__": 
    tests()

Now, this uses knowledge from both tutorials, plus the Axon.Component documentation. The component documentation can be hard to find, and not so easy to navigate either.

If we break one of the classes/components down and look at the "magic" provided by the component subclass, it gets clearer - in this case, I've "added" back in the methods from the component superclass, minus the doc strings:

class Sender(Axon.Component.component):
    # First, subclass the Axon Component class, this provides us with the
    # basic inbox/outbox static members that look like this:
    Inboxes = { "inbox"   : "Send the FOO objects to here",
                "control" : "NOT USED",
              }
    Outboxes = { "outbox" : "Emits BAA objects from here",
                 "signal" : "NOT USED",
               }

    def __init__(self):
        self.__super.__init__()

    def recv(self, boxname="inbox"):
        # returns the first piece of data in the requested inbox.
        return self.inboxes[boxname].pop(0)

    def send(self, message, boxname="outbox"):
        # appends message to the requested outbox.
        self.outboxes[boxname].append(message)

    def dataReady(self, boxname="inbox"):
        # returns true if data is available in the requested inbox.
        return self.inboxes[boxname].local_len()

    def main(self):
        while 1:
            if self.dataReady("inbox"):
                message = self.recv()
                self.send(message, "outbox")
            yield 1

I think this makes it abundantly clear what's happening with the method calls on this class. Now, there's the additional magic of the tests method - in which we define a new component that contains and links the pipelines (connections between mailboxes) between the other components.

Now, the Axon.Ipc.* docs aren't the most helpful - in our case, we called:

    self.link((producer, "outbox"), (sender, "inbox"))
    self.link((sender, "outbox"), (receiver, "inbox"))
    self.addChildren(producer, sender, receiver)
    yield Axon.Ipc.newComponent(*(self.children)) 

That was all within the main method of a component. To understand it, we need to look at the link method on the superclass:

   def link(self, source,sink,*optionalargs, **kwoptionalargs):
      """\
      Creates a linkage from one inbox/outbox to another.

      -- source  - a tuple (component, boxname) of where the link should start  
                   from
      -- sink    - a tuple (component, boxname) of where the link should go to

      Other optional arguments:

      - passthrough=0  - (the default) link goes from an outbox to an inbox
      - passthrough=1  - the link goes from an inbox to another inbox
      - passthrough=2  - the link goes from an outbox to another outbox

      See Axon.Postoffice.postoffice.link() for more information.
      """
      return self.postoffice.link(source, sink, *optionalargs, \
                                  **kwoptionalargs)

This leads us to the Postoffice class which actually constructs and tracks the links between the components. Here there be dragons.

So, addChildren just registers all of the passed-in component instances as children of the newly constructed component (components, all the way down), and then we yield ourself - if you added a 'print harness' you'd see:

Component __main__.testComponent_5 [ inboxes : {'control': [], 'inbox': []}
outboxes : {'outbox': <>, 'signal': <>}

This means we're getting a component back, and then calling .activate on it - .activate is actually a method on the Axon.Microprocess.microprocess class. In our case, activate simply registers our test component (which contains all of the children) with the default scheduler.

At which point, we call scheduler.run.runThreads and... jazz hands.

I dived into some of the internals here, and I ended up having to supplement the documentation by poring over the code - but I personally think it helps clear things up to remove some of the magic and show what is actually occurring. Most of the time, it seems you simply won't care - instead you'd just make and register your happy component and be on your way.

To somewhat summarize what we're looking at: a component that subclasses the default Axon.Component uses generators to yield control back and forth, passing messages/work to and from other components via the very clever mailbox/postoffice metaphor.

Now, the interesting thing, once you start digging through things, is that your component isn't really a Thread - if you wanted to make sure each component was in its own thread, you might instead subclass Axon.ThreadedComponent.

Subclassing this new class looks mighty close to what we did before, but instead uses threads and queues for the message passing. Instead of yielding, you just run, and the recv/send methods are backed by queues. Ahh, delicious non-judgmental queues.

In any case, Kamaelia - via Axon - is a very nice abstraction on top of a very simple concept: message passing for concurrency. The fact that you can quickly build up a series of components which pass work back and forth via some sort of communications system, and not have to worry about the underlying nuances/organization, is quite nice.

One of the things Michael Sparks and I have talked about is adding some level of multiprocessing support to Kamaelia - this would actually be insanely easy if I used Axon.ThreadedComponent as the template, but instead used multiprocessing.Queue and multiprocessing.Process as the back end.
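
The shape of that would be something like this - not Axon code, just the plumbing multiprocessing gives you for the same recv/send pattern:

from multiprocessing import Process, Queue

def component(inbox, outbox):
    # a process-backed "component": blocking queue gets/puts take
    # the place of yields
    while True:
        msg = inbox.get()
        if msg is None:              # sentinel: shut down
            break
        outbox.put(msg.upper())

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    p = Process(target=component, args=(inbox, outbox))
    p.start()
    inbox.put("meow mix")
    print outbox.get()               # prints "MEOW MIX"
    inbox.put(None)
    p.join()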

Kamaelia itself is really a series of example components/applications which build on a core (Axon), but you do not need Kamaelia to use Axon effectively. In fact, just on a whim, I decided to whip up a dirty http load tool in Kamaelia. You can see it here.

It's really a hack job - all I did was build off the meowmix demo, swap in the threaded component, and hack it around a bit. One thing I'd like to know is how to pass in a dynamic number of clients so that I could create the outboxes dynamically in the producer - there wasn't anything clear in the docs to allow me to do this. Also, it doesn't shut down.

I'm going to keep hacking around with Axon, it's pretty neat. Interesting things I'd like to poke around in:

  • Replace the underlying IPC mechanism with multiprocessing.Pipe/Pyro/posix_ipc
  • hack on the mp version of the component backend
  • get a multi-system script running and communicating workloads across the LAN

Actors, concurrency and Kamaelia

by jesse


Recently, I made an offhand comment here about:

I've actually started thinking about/sketching an actor model build on top of MP, using concepts from actors/monitors and things in the ecosystem today

The ensuing comments and discussion were pretty good - but last night Michael Sparks (of Kamaelia) posted a darned nice comment:

Are you aware that a complete mini-axon using generators is tiny and the rest is optimisations and extra stuff that you find useful in real world systems? By tiny, I mean this small:
* http://www.kamaelia.org/MiniAxonFull

A mini-axon using processes would be equally lightweight (shorter probably) and pretty awesome.

Also, it's easy to confuse the two halves of Kamaelia. If you think of Kamaelia as just an actor-type implementation, then it's actually more an actor-like implementation, with STM & an internal SOA system of just over 2000 lines (which is how big Axon actually is, excluding comments & docs), with 80,000 lines of examples...

However, personally I view it as a mechanism for building components which happen to be best used in a concurrent fashion. ie rather than viewing it as "a mechanism for using concurrency", I view it as "OK, assume we have concurrency, how can we use this to assist in building and maintaining systems". Axon also gives you the tools for taking these concurrent systems, and interfacing between concurrent systems and standard code. (http://www.kamaelia.org/AxonHandle)

As a result, I view Axon as a library which provides you with the tools, wrapping up idioms useful for building collections of components - which may be a framework.

Anyway, potato/potato, tomato/tomato - if you like, you like, if you don't, you don't.

I'd love to replace our existing process based stuff btw with a multiprocessing based version though. If I was going to go down this route, I'd follow our mini axon tutorial to do so, largely because it's essentially the starting point I took with the multiprocess stuff recently and it worked out pretty well.

Beyond this basic stuff though, I've noted that people generally start talking about co-ordination languages and building up pattern repositories. The interesting intersection between these two which you get if you call things components rather than actors is it becomes natural to create components called a chassis. These chassis often instantiate directly in concrete usable form concepts that you'd normally refer to as a pattern - Pipeline, Graphline, Carousel, Backplane, Seq, TPipe, etc.

On a random note, you may want to check out MASCOT "Modular Approach to Software Construction, Operation and Test". I heard about it late last year, and it appears to have the same sort of architecture as Kamaelia. Interestingly (to me) it makes the same key decision - when you send a message outside your component, you don't know who is going to receive it. This then enables (and requires) a higher level system for connecting components together. The upshot is highly reusable components. This doesn't entirely surprise me - my ethos came from recognising that asynchronous hardware systems & network systems look strikingly similar... (cf http://www.slideshare.net/kamaelian/sociable-so...)

Anyway, reference for MASCOT: http://async.org.uk/Hugo.Simpson/ - skip down to the end of the page for this PDF: http://async.org.uk/Hugo.Simpson/MASCOT-3.1-Man... I was really pleased to be pointed at MASCOT, largely because it showed a large number of other domains where the same basic model has been used for well over 30 years... Just with non-existent exposure, and slightly different metaphors. Though we, like it, also have mechanisms for automatically visualising systems, with a 1:1 correspondence. Beyond that this also gives us a model that matches Edward Lee's "The Problem with threads" - we'd released running code long before that paper was published :-)

Anyway, I'm glad that you're looking at what we've done. If you use it, that'd be great, and I'll happily merge anything you'd like to have a life. (the only comment I'd make there is metaphors and accessibility count - this is surely the point of python? :-) If you don't take what we use etc but it helps you solidify your thoughts to "No, I don't want that, I want this", then likewise, I'm equally glad. If you do that, I'd love to know what you do try, since I like to merge best practice concurrency ideas into Axon :-)

I'd *REALLY* suggest looking at MASCOT though. Really made my Christmas last year when I was pointed at it.

We're currently having lots of fun using concurrency, primarily by allowing it to make our lives easier, and then forgetting about it :) It'd be nice to see something similar on top of multiprocessing (which we'll do if you don't, but it'd be great if you did) - but I'd understand if your view was that you prefer a pure actor (sender knows receiver) model.

Originally posted as a comment by Michael Sparks on jessenoller.com comments using Disqus.

I wanted to pull that comment out and showcase what Michael has to say, and in some way, respond. First, yes - I am looking at Kamaelia (from here on out, I'm going to call it "Kam" - I keep transposing the ae). I actually ran through the mini-axon tutorial, and when I have time, I'm trying to tease apart the internals to better understand it.

Fundamentally, I agree with you (Michael) about the aspects of making concurrency easier (and safer). Right now, I think Kam is a pretty darned good start - for a framework *grin*.

When I made the comment I made, I didn't think it would get the response it got. There has been a dog-pile of discussions about concurrency best practices, and in fact there's a discussion about concurrency still going on on python-list right now.

Personally, I only work on the concurrency stuff part-part time - this includes my minor work on Python-Core. My day job is test engineering - and while I build highly concurrent (and distributed) tests and use multiprocessing and threading daily, it's not my full time pursuit. I am passionate about it, and I am passionate about improving python as a language and library - and if I can do it as a day job and open source it, or get company time to do it, by golly I will.

When I said "I want to build an actor model" - I was not necessarily talking about doing an implementation for python-core. I'm a big believer in learning-through-implementation - so when I said "build x", not only did I mean "build something for the world" - I also meant "build something for my own benefit" so I can deep-dive into the concepts, problems etc.

This is why I am averse to jumping in and simply "using" a framework - not because I don't think it does something exceedingly well. Trust me, I am an opportunistic developer - if I can find a library that does what I need *right now*, I'll use it.

That being said, I am exploring Kamaelia, and yes, hopefully I can steal some time to actually do an implementation of the process-based stuff with multiprocessing. I want to explore everything in the ecosystem today - my discussions around this stuff with Adam Olsen (and around python-safethread) and with others have made me really want to explore solutions that help everyone, taking the best ideas and concepts and rolling them into something worthy of Python core.

As I have said before - I believe there is room within Python as a language, and CPython as an implementation of that language - for a virtual buffet of helpful libraries for concurrency and distributed systems. Right now, we have threading, async* and multiprocessing. There is plenty of room to grow. Maybe one day I can steal time to grab more of the concepts from java.util.concurrent and propose them via a PEP. Heck - maybe we can work as a group to propose an actor/monitor implementation for Python-Core.

So - personally, and by way of responding to Michael in a more concrete way: I'm looking at anything I can to learn more about implementations and strategies. If something were to come out of it, and I felt strongly enough to propose inclusion in core, I would write and post a PEP - and not run in blindly.

I've got more than enough stuff to work on in addition to being a Dad, and yard work. Oh, and the day job, which doesn't involve nearly as much distributed-and-concurrent systems work in Python as I'd like :).

Heck - thinking about it, we'd need a good messaging implementation as well. I'll put that on the pile too.

Quick Rant (slightly off topic):

Also, can we stop talking about the damned GIL? Yes, you need locks. No, you probably don't care about the GIL. Stop yammering about how "broken" CPython is because of it - CPython is an implementation, not the final one and not the only one. If the GIL really gets you excited, either drop to a C module, use multiprocessing, or something else. The GIL is here to stay for some time - either propose a PEP (and a patch) that doesn't break CPython, or hush. Enough bike-shedding - discussion is great, especially when something comes out of it, but constantly berating/lamenting things is just a bike shed. The shed is purple, now move on. Purple!
