Twisted — hello, asynchronous programming

February 11th, 2009

Note: This is the third post in what I hope will be a series leading up to my concurrency/distributed systems talk at PyCon. I’m steadily working through experimenting with and learning the various frameworks/libraries in the Python ecosystem.

I reserve the right to revise these entries (and probably will) based on feedback from people, mainly the author(s) of said tool(s). I will also add bits and pieces as I learn and explore more. Additionally, thanks to glyph for giving me a hell of a lot of feedback.

Twisted is the 800-pound gorilla of the “concurrency” frameworks. It’s been around for a while and has a large following; it’s used by everyone from Apple (iCal server) to Buildbot. It has a literal ton of subprojects and other “semi-attached appendages”.

Twisted can be daunting for almost everyone: while it is conceptually simple, the docs and examples could be more approachable. People look at Twisted as an all-or-nothing bet when considering it for their applications, which, to an extent, it is.

But if I am going to do a concurrency/distributed systems talk, I can’t ignore one of the most widely used and original frameworks in this space.

So, as always, these are my semi-rough notes diving into Twisted core. As with Kamaelia, I am going to sidestep the asteroid ring which surrounds Twisted (or tentacles… I can’t decide which to use) and delve into the core.

Moving along: Twisted is based around asynchronous programming, a model for adding a great deal of concurrency to your application via non-blocking calls, or by isolating blocking calls “elsewhere”. This is largely the same approach GUI toolkits use, wherein a given event, such as a button click or data becoming available, is assigned something to run when it occurs. Glyph sent me a very simple side-by-side:

```python
reactor.listenTCP(8080, someFactory)
```

```python
button.connect('clicked', someCallback)
```

This shows the essential aspect of asynchronous/event-driven systems: you tell x that when y occurs, it should call z. It really is conceptually simple. The main loop of the application focuses on two things: constructing these relationships, and executing the callbacks when the events occur.
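The “when y occurs, call z” idea can be made concrete without Twisted installed. Below is a toy sketch in plain Python. The EventLoop class and its on/fire/run methods are hypothetical names for illustration, not Twisted’s API. Handlers are registered against event names, and the loop simply dispatches queued events to whatever was registered.

```python
from collections import defaultdict, deque


class EventLoop:
    """Toy event loop mapping event names to callbacks (not Twisted's API)."""

    def __init__(self):
        self.handlers = defaultdict(list)  # event name -> list of callbacks
        self.pending = deque()             # events waiting to be dispatched

    def on(self, event, callback):
        # "when y occurs, call z": just record the relationship
        self.handlers[event].append(callback)

    def fire(self, event, payload=None):
        # something happened (a click, data arrived); queue it for the loop
        self.pending.append((event, payload))

    def run(self):
        # the main loop: pop events and call whatever was registered for them
        while self.pending:
            event, payload = self.pending.popleft()
            for callback in self.handlers[event]:
                callback(payload)


log = []
loop = EventLoop()
loop.on('clicked', lambda payload: log.append('button clicked'))
loop.on('data', lambda payload: log.append('got %s' % payload))
loop.fire('data', 'hello')
loop.fire('clicked')
loop.run()
print(log)  # ['got hello', 'button clicked']
```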

Twisted’s focus is on network-based applications, which mesh well with the idea of non-blocking I/O. In that model you have to chunk your work up into small pieces, each of which takes a very short time to execute. Networked apps spend most of their time waiting for data to come in over the wire. Twisted also has facilities to isolate CPU-intensive and/or blocking calls within threads or processes.
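Here is a rough illustration of “chunking” work, again as a plain-Python toy rather than Twisted code. Each task is a generator that does one small step and then yields control. A round-robin loop interleaves the tasks so that no single one monopolizes the loop. The worker and run_round_robin names are hypothetical. If I understand it right, Twisted’s own helper for cooperative iteration of this kind is twisted.internet.task.cooperate.

```python
from collections import deque


def worker(name, steps, trace):
    # a long job broken into small steps; each yield hands control back
    for i in range(steps):
        trace.append('%s:%d' % (name, i))
        yield


def run_round_robin(generators):
    # toy scheduler: advance each generator by one step, in turn,
    # until all of them are exhausted
    queue = deque(generators)
    while queue:
        gen = queue.popleft()
        try:
            next(gen)
        except StopIteration:
            continue          # this task is finished; drop it
        queue.append(gen)     # not finished; back of the line


trace = []
run_round_robin([worker('a', 3, trace), worker('b', 2, trace)])
print(trace)  # ['a:0', 'b:0', 'a:1', 'b:1', 'a:2']
```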

I’m going to focus on two of the core components, Deferreds and the Reactor; this should help illustrate what the core paradigm is.

Deferreds are a core component of Twisted. In the simplest terms, a deferred is an object that, when created, represents some Thing which will eventually return Something or an Error: a placeholder for something in the future. To handle Something or Error, you tell the deferred two things. If it gets Something, it should call consume_function. If it gets an Error, it should call error_function. Easy!
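Here is a deliberately tiny sketch of the placeholder idea in plain Python. ToyDeferred is a hypothetical class and is far simpler than twisted.internet.defer.Deferred: it has no chaining and no Failure objects. It only shows the core contract. You register what to do with Something or an Error, and the right function runs once the result shows up, even if it was registered after the result arrived.

```python
class ToyDeferred:
    """A placeholder for a result that isn't here yet (toy, not Twisted's)."""

    _PENDING = object()                     # sentinel meaning "no result yet"

    def __init__(self):
        self.result = self._PENDING
        self.is_error = False
        self.handlers = []                  # (consume_function, error_function) pairs

    def add_callbacks(self, consume_function, error_function):
        self.handlers.append((consume_function, error_function))
        if self.result is not self._PENDING:
            self._run()                     # result already here: run right away
        return self

    def callback(self, something):
        self._fire(something, is_error=False)

    def errback(self, error):
        self._fire(error, is_error=True)

    def _fire(self, value, is_error):
        if self.result is not self._PENDING:
            raise RuntimeError('already fired')
        self.result, self.is_error = value, is_error
        self._run()

    def _run(self):
        while self.handlers:
            consume_function, error_function = self.handlers.pop(0)
            handler = error_function if self.is_error else consume_function
            handler(self.result)


seen = []

def on_error(error):
    seen.append('error: %s' % error)

ok = ToyDeferred().add_callbacks(seen.append, on_error)
ok.callback('Something')                    # handlers first, result later

bad = ToyDeferred()
bad.errback(ValueError('disk on fire'))     # result first...
bad.add_callbacks(seen.append, on_error)    # ...handlers later: still runs
print(seen)  # ['Something', 'error: disk on fire']
```

Twisted’s real Deferred behaves the same way on the one point shown here: adding a callback to a Deferred that has already fired runs the callback right away.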

Take the example below: read_mail and read_error are what’s known as callbacks, functions which are called by something else when something occurs (I hate me for writing that).

Callbacks are really simple, as illustrated here:

```python
import os
import sys
import time

def read_mail(mailitems):
    # callback: called with the data once it exists
    print(mailitems)
    sys.exit()

def read_error(error):
    # errback: called if something goes wrong while waiting
    raise Exception('error: %s' % error)

def wait_for_mail(callback, errback):
    # poll for a file named 'mail', handing its lines to callback
    while True:
        try:
            if os.path.isfile('mail'):
                with open('mail') as f:
                    callback(f.readlines())
            time.sleep(0.1)  # don't spin the CPU between checks
        except Exception as e:
            errback(e)

wait_for_mail(read_mail, read_error)
```

Sidenote: callbacks are stupid easy with Python. I love them. Heck, you can pickle a callback and shoot it over the wire to another machine. Callbacks are cool. You should watch Alex Martelli’s callback talk on YouTube.
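Here is a small illustration of the pickling aside, using only the standard library. One caveat is worth knowing. pickle stores a plain function by reference (its module and qualified name), not its code, so the receiving machine must be able to import the same function. Lambdas and nested functions can’t be pickled this way. A functools.partial of an importable function pickles fine, which makes it a handy way to ship a callback together with its bound arguments.

```python
import functools
import operator
import pickle

# a "callback" with one argument already bound: multiply its input by 3
callback = functools.partial(operator.mul, 3)

payload = pickle.dumps(callback)      # bytes you could send over a socket
restored = pickle.loads(payload)      # ...and rebuild on the other side

print(restored(14))  # 42
```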

In any case, you tell a deferred (the object which represents a promise of something) what to do when data is returned. You do this by creating a deferred and then adding callbacks onto it; note that the function wait_for_mail needs to return a deferred. In this toy example, I just want to look for a “mail” file on disk and, if it exists, pass its contents (a list of lines) to the callback:

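```python
import os
from twisted.internet import reactor, defer

def read_mail(mailitems):
    print(mailitems)
    reactor.stop()

def wait_for_mail(d=None):
    if d is None:
        d = defer.Deferred()
    if not os.path.isfile('mail'):
        # not there yet: ask the reactor to run this check again in 1 second
        reactor.callLater(1, wait_for_mail, d)
    else:
        with open('mail') as f:
            d.callback(f.readlines())  # fire the deferred with the data
    return d

def main():
    deferred = wait_for_mail()
    deferred.addCallback(read_mail)

# start polling only once the reactor is running, so read_mail's
# reactor.stop() can't be called before reactor.run()
reactor.callWhenRunning(main)
reactor.callLater(60, reactor.stop)  # give up after a minute
reactor.run()
```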
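To see what reactor.callLater is doing for wait_for_mail, here is a toy, single-threaded timer loop in plain Python. ToyReactor, call_later and the simulated clock are hypothetical. They are much simpler than Twisted’s reactor: there is no I/O and no real sleeping. The only point is that “check again later” means putting a function back on a schedule, not blocking.

```python
import heapq
import itertools


class ToyReactor:
    """Runs scheduled calls in time order on a simulated clock (toy)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []                  # heap of (when, seq, func, args)
        self._seq = itertools.count()     # tie-breaker so funcs are never compared
        self.running = False

    def call_later(self, delay, func, *args):
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), func, args))

    def stop(self):
        self.running = False

    def run(self):
        self.running = True
        while self.running and self._queue:
            when, _, func, args = heapq.heappop(self._queue)
            self.now = when               # "wait" by jumping the clock forward
            func(*args)


loop = ToyReactor()
mailbox = []                              # stands in for the 'mail' file
delivered = []

def poll_for_mail(on_mail):
    if not mailbox:
        loop.call_later(1, poll_for_mail, on_mail)   # not yet: check again in 1s
    else:
        on_mail(list(mailbox))

def read_mail(items):
    delivered.append((loop.now, items))
    loop.stop()

loop.call_later(3.5, mailbox.append, 'letter')   # mail "arrives" at t=3.5
loop.call_later(0, poll_for_mail, read_mail)
loop.call_later(60, loop.stop)                   # give up after a minute
loop.run()
print(delivered)  # [(4.0, ['letter'])]
```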

I wanted to keep this as simple as possible, because it illustrates some things that fundamentally tripped me up at first.

First, if you were to solve a problem like polling a mailbox, you might typically do this:

```python
import os, sys, threading, time

def read_mail(mailitems):
    print(mailitems)

def wait_for_mail(reader):
    # the blocking part: sleep-poll until the file appears
    while not os.path.isfile('mail'):
        time.sleep(.1)
    with open('mail') as f:
        reader(f.readlines())

t = threading.Thread(target=wait_for_mail, args=(read_mail,))
t.start()
t.join()  # the main thread blocks here until the worker is done
sys.exit()
```

Or you might use some other pattern of spawning a thread and then waiting for that thread to chuck the data back to you. The thread gets out of your way and doesn’t force the main part of the program to block. Sort of: the join() call above still blocks the main thread until the worker finishes. In fact, with Twisted, the main part of your application is required not to block.
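For comparison, the standard library’s concurrent.futures offers a thread-plus-placeholder combination that sits somewhere between the two styles. submit() returns a Future immediately, and add_done_callback() registers what to do with the result. This is a stdlib analogy, not Twisted, and the wait_for_mail and read_mail below are my own rewrites for the sketch. Note one detail. If the future hasn’t finished when the callback is added, the callback runs in the worker thread that completes it, not in your main thread. That is exactly the kind of detail Twisted’s single-threaded callback model avoids.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def wait_for_mail(mailbox, timeout=5.0):
    # the blocking part: poll until something shows up (or give up)
    deadline = time.monotonic() + timeout
    while not mailbox:
        if time.monotonic() > deadline:
            raise TimeoutError('no mail')
        time.sleep(0.01)
    return list(mailbox)


received = []
done = threading.Event()

def read_mail(future):
    # callback: receives the finished Future, not the raw value
    received.append(future.result())
    done.set()

mailbox = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(wait_for_mail, mailbox)   # returns immediately
    future.add_done_callback(read_mail)
    mailbox.append('letter')                       # main thread keeps working
    done.wait(timeout=5)

print(received)  # [['letter']]
```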

As a point of order, you could do the same thing with Twisted like this:

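```python
from twisted.internet.threads import deferToThread
...
# assuming wait_for_mail here is a blocking function that returns the mail's
# lines: deferToThread runs it in Twisted's thread pool and returns a Deferred
# that fires back in the reactor thread
d = deferToThread(wait_for_mail)
d.addCallback(read_mail)
...
```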

Threads are the common way of pushing blocking work off to the side; most of us have had to do it at one point or another. In the Twisted world, this has to be turned on its head a bit.

In Twisted, you have to split your problem into small, individual functions/methods. Ideally, you isolate the really blocking part (say, waiting for a file to appear) in its very own function. You make the non-blocking parts (say, checking initial existence and constructing the deferred object) run as quickly as possible. You then return the deferred: a promise of data to come via the slow method. The slow part in the threading example is the time.sleep() loop.

You then schedule that formerly blocking call to run. It shouldn’t block. Rather, it should poll for data or for changes in its buffer(s), then either reschedule itself to run if there is no data, or return the data if there is. This event is time-based. The same applies, though, to adding a callback to a non-time-based item, such as setting up something which listens on a socket for data.
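The non-time-based case looks much the same. Below is a toy sketch using the standard library’s selectors module, not Twisted. A callback is attached to a socket and only runs once the operating system reports the socket is readable, so nothing ever blocks waiting for data. The on_readable name and the socketpair standing in for a real network connection are assumptions for illustration.

```python
import selectors
import socket

sel = selectors.DefaultSelector()     # the most efficient mechanism this platform offers
left, right = socket.socketpair()     # connected pair standing in for a network link
right.setblocking(False)
received = []

def on_readable(sock):
    # only called once data is waiting, so this recv() returns immediately
    received.append(sock.recv(1024))

# "when right becomes readable, call on_readable(right)"
sel.register(right, selectors.EVENT_READ, on_readable)

expected = b'you have mail'
left.sendall(expected)

for _ in range(100):                  # bounded loop so a bug can't hang forever
    for key, _mask in sel.select(timeout=1.0):
        key.data(key.fileobj)         # dispatch to the registered callback
    if b''.join(received) == expected:
        break

sel.unregister(right)
sel.close()
left.close()
right.close()
print(b''.join(received))  # b'you have mail'
```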

The fact that you schedule work within the reactor tripped me up at first. I was thinking in blocking terms (Twisted does have a facility for passing blocking work off to threads via the deferToThread call). The function kept wanting to block instead of polling or looking for a state change. Everything needs to be scheduled in one way or another.

Glyph wisely pointed out that this is a common issue for people rethinking concurrency in terms of “discrete events”. For example, most people are content to think about concurrency in terms of workers “who are off doing things”. In theory, those workers are “always doing something”. In reality, the operating system is simply suspending your worker(s) until an interrupt (i.e., a discrete event) occurs, which causes the worker(s)/app to unblock.

Glyph suggested the following as a good example using generator syntax:

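```python
import os
from twisted.internet.defer import inlineCallbacks, returnValue
from twisted.internet.task import deferLater
from twisted.internet import reactor

def deferredSleep(howLong):
    # a Deferred that fires after howLong seconds, without blocking
    return deferLater(reactor, howLong, lambda: None)

@inlineCallbacks
def wait_for_mail():
    while True:
        if os.path.isfile('mail'):
            with open('mail') as f:
                lines = f.readlines()
            returnValue(lines)  # on Python 3, a plain `return lines` also works
        yield deferredSleep(1.0)  # "block" here without blocking the reactor

@inlineCallbacks
def check_mail():
    mail = yield wait_for_mail()
    print('got mail', mail)

def main():
    d = check_mail()
    d.addBoth(lambda _: reactor.stop())  # stop whether it worked or failed

reactor.callWhenRunning(main)
reactor.run()
```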

The example he provided is interesting. It has no explicit “scheduling” involved; instead it uses two things I didn’t see originally, deferLater and inlineCallbacks. inlineCallbacks is a decorator for a generator function, and that function can yield a deferred or call returnValue. Essentially, anywhere you would normally block, you simply yield. In this case, if the file doesn’t exist, we yield the deferred from deferLater (via deferredSleep), which tells the reactor to resume this function some time in the future.
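To demystify what inlineCallbacks does, here is a toy trampoline in plain Python. It is an assumption-laden sketch, not Twisted’s implementation. ToyDeferred, deferred_sleep, run_timers and inline are hypothetical, and time is simulated. The driver runs the generator until it yields a placeholder, attaches itself as that placeholder’s callback, and sends the result back in when it fires. That is why a yield can sit exactly where a blocking call used to be.

```python
import heapq
import itertools


class ToyDeferred:
    """Bare-bones placeholder: one result, callbacks run when it arrives."""

    def __init__(self):
        self.fired, self.result, self.callbacks = False, None, []

    def add_callback(self, fn):
        if self.fired:
            fn(self.result)                    # already fired: run right away
        else:
            self.callbacks.append(fn)

    def callback(self, result):
        self.fired, self.result = True, result
        for fn in self.callbacks:
            fn(result)


# a tiny simulated-time timer queue standing in for the reactor
clock = {'now': 0}
timers = []                                    # heap of (when, seq, deferred)
counter = itertools.count()

def deferred_sleep(seconds):
    d = ToyDeferred()
    heapq.heappush(timers, (clock['now'] + seconds, next(counter), d))
    return d

def run_timers():
    while timers:
        when, _, d = heapq.heappop(timers)
        clock['now'] = when
        d.callback(None)


def inline(gen_func):
    """Toy take on the inlineCallbacks idea (not Twisted's code)."""
    def wrapper(*args):
        gen = gen_func(*args)
        result = ToyDeferred()

        def step(value):
            try:
                waiting_on = gen.send(value)   # run until the next yield
            except StopIteration as stop:
                result.callback(stop.value)    # generator returned: done
                return
            waiting_on.add_callback(step)      # resume when that placeholder fires

        step(None)
        return result
    return wrapper


mailbox = []

@inline
def wait_for_mail():
    while True:
        if mailbox:
            return list(mailbox)
        yield deferred_sleep(1)                # "block" without blocking

deferred_sleep(2.5).add_callback(lambda _: mailbox.append('letter'))  # mail arrives
got = []
wait_for_mail().add_callback(got.append)
run_timers()
print(got, clock['now'])  # [['letter']] 3
```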

Before I move off the basic view of deferreds, there’s something to note: deferreds can accept a chain of callbacks:

```python
import os
from twisted.internet import reactor, defer

def read_mail(mailitems):
    print(mailitems)
    # whatever a callback returns is passed to the next callback
    return "this %s is junk" % mailitems

def shred_mail(mailitems):
    print('buzzzzz: %s' % mailitems)
    reactor.stop()

def wait_for_mail(d=None):
    if d is None:
        d = defer.Deferred()
    if not os.path.isfile('mail'):
        reactor.callLater(1, wait_for_mail, d)
    else:
        d.callback('letter')
    return d

def main():
    deferred = wait_for_mail()
    deferred.addCallback(read_mail)
    deferred.addCallback(shred_mail)

# start once the reactor is running, so reactor.stop() is always valid
reactor.callWhenRunning(main)
reactor.callLater(60, reactor.stop)
reactor.run()
```

And the output:

letter
buzzzzz: this letter is junk

This allows each callback chained onto a deferred to alter the data in some form: the callback’s return value is passed to the next callback in the chain. It’s not really magic. It’s simply a series of functions to call when an event occurs.
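Because the chain is “simply a series of functions”, a sketch of the data flow fits in a few lines of plain Python. run_chain, reject and log_error are hypothetical helpers, not part of Twisted, and the sketch models only one aspect of the real behavior. Each callback’s return value feeds the next one. If a callback raises, processing switches to the error path. Twisted does this by wrapping the exception in a Failure and passing it to the next errback.

```python
def run_chain(value, pairs):
    """Pass value through (callback, errback) pairs, Twisted-style (toy)."""
    failed = False
    for callback, errback in pairs:
        handler = errback if failed else callback
        if handler is None:
            continue                  # no handler for this path: skip ahead
        try:
            value = handler(value)
            failed = False            # a handler that returns normally recovers
        except Exception as exc:
            value, failed = exc, True
    return value, failed


def read_mail(mail):
    return 'this %s is junk' % mail

def shred_mail(mail):
    return 'buzzzzz: %s' % mail

def reject(mail):
    raise ValueError('no junk allowed: %s' % mail)

def log_error(exc):
    return 'handled %s' % type(exc).__name__

print(run_chain('letter', [(read_mail, None), (shred_mail, None)]))
# ('buzzzzz: this letter is junk', False)
print(run_chain('letter', [(reject, None), (shred_mail, None), (None, log_error)]))
# ('handled ValueError', False)
```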

One thing to keep in mind when thinking about Twisted: Twisted is single-threaded. OK, not really; sort of. Glyph called me out on this, and rightly so. In reality, all I/O in Twisted is single-threaded, most Twisted APIs are not thread-safe, and deferred callback chains execute in a single thread. But Twisted does support both threads and processes. There is no reason why you can’t use a library which uses threads in your Twisted application, for example. You can use threads with Twisted, so don’t worry too much about that.

On the other hand, if your entire application is a thread-spawnfest, you might want to reconsider (the design of your app, that is). /flamebait

Your event-handling code, callbacks included, executes in the reactor’s thread (normally the main thread) of a single Python process. That is why, when discussing deferreds and blocking calls, it is so important to break your problem down into the smallest steps possible and to isolate or rework blocking code into deferred actions.
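Here is a plain-Python sketch of that single-threaded rule. Worker threads never call your callbacks directly. Instead they hand a function to a queue, and the one loop thread runs it. This is a toy model of the idea behind Twisted’s reactor.callFromThread, not its implementation. The call_from_thread function and the queue-based loop below are hypothetical.

```python
import queue
import threading

calls = queue.Queue()                  # thread-safe hand-off to the loop thread
results = []                           # only ever touched by the loop thread

def call_from_thread(func, *args):
    # safe to call from any thread: just enqueue the work
    calls.put((func, args))

def record(value):
    results.append((value, threading.current_thread().name))

def blocking_worker(n):
    # pretend this is slow, blocking work done off the loop thread
    call_from_thread(record, n * n)

workers = [threading.Thread(target=blocking_worker, args=(i,)) for i in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()   # joined first only to keep the demo deterministic

# the "reactor" loop: drain queued calls, all in this one thread
while not calls.empty():
    func, args = calls.get()
    func(*args)

print(sorted(v for v, _ in results))                                    # [0, 1, 4]
print({name for _, name in results} == {threading.current_thread().name})  # True
```

A real loop would keep running and pick the calls up as they arrive, rather than joining the workers first.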

On to the reactor, then.

The reactor is the event loop mechanism for Twisted. It takes care of executing all of the various timed actions, dispatching I/O events, and thereby running the callback/errback chains. Deferreds themselves are plain objects. Their callbacks run when code driven by the reactor (a timed call, an I/O event) fires them.

You’ll notice that in the examples above, we didn’t create an instance of the reactor; instead, we just imported it. Look at twisted.internet.reactor and you’ll see what happens. The module removes its own entry from sys.modules and then calls install() on the default reactor, which registers the reactor instance in its place. This means the reactor is a singleton: all future imports/calls will always refer to this reactor.
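The underlying trick is ordinary Python: an import returns whatever object was installed into sys.modules. A toy version shows why every import sees the same reactor. The module name toy_reactor and the ToyReactor class are made up for illustration. The sketch mirrors the idea described above, not Twisted’s actual code.

```python
import sys


class ToyReactor:
    def __init__(self, kind):
        self.kind = kind


def install(reactor, name='toy_reactor'):
    # refuse a second install (Twisted raises ReactorAlreadyInstalledError here)
    if name in sys.modules:
        raise RuntimeError('reactor already installed')
    sys.modules[name] = reactor     # the import system now returns this object


install(ToyReactor('select'))

import toy_reactor                  # found in sys.modules, so no file is needed
import toy_reactor as again

print(toy_reactor.kind, toy_reactor is again)  # select True
```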

Now, the documentation in internet.reactor mentions that new application code should pass around an instance of a reactor. This moves away from the reactor-as-a-singleton (the simple behavior) and into something a bit more interesting. Glyph pointed out the obvious usefulness of this: testability. If you pass in a reactor rather than grabbing it from the global, you have more control.
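Here is a sketch of why passing the reactor in helps testing. The code below only needs something with a callLater method, so a test can hand it a fake clock and advance time instantly instead of waiting. poll_for_file and FakeClock are hypothetical names written in plain Python. Twisted ships a real helper for this purpose, twisted.internet.task.Clock, with callLater() and advance() methods.

```python
class FakeClock:
    """Minimal stand-in for a reactor's timed-call API (toy, not Twisted's)."""

    def __init__(self):
        self.now = 0.0
        self.calls = []                          # [when, func, args] entries

    def callLater(self, delay, func, *args):
        self.calls.append([self.now + delay, func, args])

    def advance(self, seconds):
        # jump time forward, then run every call that has come due, in time order
        self.now += seconds
        while True:
            due = [c for c in self.calls if c[0] <= self.now]
            if not due:
                return
            call = min(due, key=lambda c: c[0])
            self.calls.remove(call)
            call[1](*call[2])


def poll_for_file(reactor, exists, on_found, interval=1.0):
    # the reactor is a parameter rather than a global import
    if exists():
        on_found()
    else:
        reactor.callLater(interval, poll_for_file, reactor, exists, on_found, interval)


clock = FakeClock()
state = {'present': False, 'found_at': None}
poll_for_file(clock,
              exists=lambda: state['present'],
              on_found=lambda: state.update(found_at=clock.now))
clock.advance(2)       # the poll due at t=1 runs; the file is still missing
state['present'] = True
clock.advance(1)       # the poll due at t=3 finds it
print(state['found_at'])  # 3.0
```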

You can also use this to group a series of actions within a given reactor, whether timed actions, connections, etc., and then create a meta-reactor to control the multiple reactors. Reactors, all the way down.

See, Twisted has multiple types of reactors: there are reactors based on select (the default), GTK, Cocoa (PyObjC), etc. Each reactor manages the scheduling and execution of the tasks added to it in its own unique style, but they all implement a common interface. Application code should not care which reactor is running. The reactor is an abstraction above other, operating-system-optimized polling/looping mechanisms: for example, GTK+ looping and polling, or select on Linux, etc.
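The standard library’s selectors module is a small-scale example of the same design: one interface, several OS-specific backends. As an analogy (not Twisted code), the round_trip function below (a hypothetical name) is written against the common selector interface. It should behave the same whether it is handed the portable select()-based SelectSelector or DefaultSelector, which picks epoll, kqueue, etc. when the platform has them.

```python
import selectors
import socket


def round_trip(selector_cls, message=b'ping'):
    # application code: depends only on the common selector interface
    sel = selector_cls()
    a, b = socket.socketpair()
    try:
        sel.register(b, selectors.EVENT_READ)
        a.sendall(message)
        data = b''
        for _ in range(100):                  # bounded, so a bug can't hang forever
            for key, _mask in sel.select(timeout=1.0):
                data += key.fileobj.recv(1024)
            if len(data) >= len(message):
                break
        return data
    finally:
        sel.close()
        a.close()
        b.close()


for cls in (selectors.SelectSelector, selectors.DefaultSelector):
    print(cls.__name__, round_trip(cls))
```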

Quoting Glyph:

The idea of all this is that Twisted code is at the top of the food chain. GTK+ networking code can only run in GTK+ programs, kqueue networking code can only run on FreeBSD machines, but Twisted networking code can run anywhere it can get its hands on something that looks even vaguely like select(). If you want a Windows desktop program and a UNIX server program to run the same networking code, but have radically different event APIs under the covers for good performance, Twisted has you covered.

You can install a given reactor by importing its module (twisted.xxx.reactor_name) and calling its install() function, which is all twisted.internet.reactor does for the default reactor. This has to happen before anything imports twisted.internet.reactor. The preferred method is to use the “--reactor” argument to the twistd/trial tools.

Twisted is fundamentally a networking stack. It’s built to solve tasks that are not computationally intensive but require a lot of waiting and polling for data, and networking fits that description perfectly. With additions, it can also serve as a platform for computationally intensive/distributed applications; the ampoule add-on is one example.

In the networking sense, Twisted is perfect as long as you can deconstruct and rethink the way you solve day-to-day problems, remove or rethink blocking code, and switch your application model to something event/message driven.

Admittedly, Twisted isn’t for me (right now), but with time it could be, and it could work great for your application today. It has libraries for just about any protocol you could possibly ask for. Its code base is huge and actually has some pretty cool code inside of it. The catch is that you don’t port an application to Twisted: in most cases, you write a Twisted application.

However, it can also be hard to approach, both the code and the documentation. You see questions pop up all the time along the lines of “I think Twisted does this”. While it probably does, it could take you a while just to grok what it is to be Twisted.

For example: dumb down the examples. The finger tutorial is “the introduction”, but I’d like to see something that starts at a lower level. Take a normal, single-threaded application (perhaps using generators) and port it to threads. Point out the issues there, and then port it to Twisted, using as little “Twisted magic” as possible.

This isn’t to say it’s impossible to grok. But the docs should help walk people through learning how to think asynchronously, rather than explaining the esoteric or the One True Way of doing something. Consider the positive feedback that Steve Holden’s “Teach Me Twisted” got. Assume most people can’t spell asynchronous, and build up from there.

The Django documentation is probably one of my favorite examples of this: it starts out very simple and very approachable and walks the user through everything “from the beginning”.

You are currently reading Twisted — hello, asynchronous programming at jessenoller.com.
