Have GIL: Want Benchmarks.

September 12th, 2007 § 9 comments

[Image: IMG_0869.JPG] [1] So, with the recent furor over the GIL, one of the things GvR asked for was for someone to provide some benchmark numbers of Single v. Multi v. Other in various tests.

I started working on this a night or so ago, and a lot of things have fallen out of it, not the least of which is my burning desire to really look at the GIL, threading in Python and the ecosystem of alternatives.

A thing of note: I’ve spawned an initial Google Code project to explore the alternatives/benchmarks. Read about it here.

The first test script I wrote was a simple one: take a function that calculates a number of Fibonacci numbers and run it in a loop with larger and larger sets. I then:

  • Ran it twice, synchronously.
  • Ran it twice, once in each thread (2 threads).
  • Ran it twice, but used the processing module to spawn two processes, running it once in each.
  • Ran it twice, but used the parallel python module to spawn a pool of workers for 2 processors, running it once in each.
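
For illustration, the first three runs above can be sketched roughly as follows. This is a minimal sketch and not the actual test script: the names `fib`, `workload`, `run_sequential`, `run_threads`, `run_processes` and `timed` are made up here, the workload sizes are arbitrary, and the standard-library `multiprocessing` module stands in for the `processing` module (it is the descendant of that package). The Parallel Python run is left out.

```python
import threading
import time
from multiprocessing import Process


def fib(n):
    """Return the n-th Fibonacci number (fib(0) == 0), computed iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def workload(sizes=(100, 200, 400)):
    """CPU-only task: compute the first `size` Fibonacci numbers for growing sizes.

    The sizes are arbitrary placeholders, not the ones used in the real test.
    """
    for size in sizes:
        for i in range(size):
            fib(i)


def run_sequential(task, copies=2):
    """Run the task `copies` times, one after the other, in the current thread."""
    for _ in range(copies):
        task()


def run_threads(task, copies=2):
    """Run the task once in each of `copies` threads and wait for all of them."""
    threads = [threading.Thread(target=task) for _ in range(copies)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_processes(task, copies=2):
    """Run the task once in each of `copies` child processes and wait for them."""
    procs = [Process(target=task) for _ in range(copies)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()


def timed(runner, task):
    """Return the wall-clock seconds `runner(task)` takes."""
    start = time.perf_counter()
    runner(task)
    return time.perf_counter() - start


if __name__ == "__main__":
    runners = [
        ("sequential", run_sequential),
        ("threads", run_threads),
        ("processes", run_processes),
    ]
    for name, runner in runners:
        print(f"{name:>10}: {timed(runner, workload):.3f}s")
```

With a workload this small, process start-up cost will likely dominate, so the sizes would need to grow before the numbers say anything about the GIL. What the results look like depends on the machine and the interpreter, which is the whole point of measuring.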

Yes. I know that fib calculations are a processor-only activity, and that’s all I wanted to do at first. I want to add tests for file access and network access, and I want to run the tests (as allowed by the extension modules) in PyPy, Jython and IronPython [2].

In the long run, I want to make something of a test suite, not only to demo the various methods/alternatives but also to work towards a future where something like parallel python or the processing module can make it into the standard Python library.

Not to mention, it should help those looking for alternatives to basic threading in Python (or for more information on threading in Python). And yes, there are things that have to be taken into account:

  • Shared vs. unshared memory and state.
  • Ease of use/API.
  • Amount of Fun.
  • Can the solution spread across machines?
  • Is it, or can it be, useful for web applications? (i.e., can it be used to spread load across processors?)
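
To make the first item concrete, here is a small sketch of how state behaves differently under threads and processes. It assumes CPython with the standard `threading` and `multiprocessing` modules; `counter`, `bump`, `bump_in_thread` and `bump_in_process` are hypothetical names used only for this example.

```python
import threading
from multiprocessing import Process, Queue

# Module-level state that the workers below try to modify.
counter = {"value": 0}


def bump(result_queue=None):
    """Increment the counter; optionally report the value this worker sees."""
    counter["value"] += 1
    if result_queue is not None:
        result_queue.put(counter["value"])


def bump_in_thread():
    """Run bump() in a thread and return the parent's view of the counter."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter["value"]


def bump_in_process():
    """Run bump() in a child process.

    Returns (value seen by the child, value seen by the parent afterwards).
    """
    q = Queue()
    p = Process(target=bump, args=(q,))
    p.start()
    child_value = q.get()  # read the result before joining the child
    p.join()
    return child_value, counter["value"]


if __name__ == "__main__":
    print("after thread, parent sees:", bump_in_thread())
    child_value, parent_value = bump_in_process()
    print("child saw:", child_value, "parent still sees:", parent_value)
```

A thread changes the parent's `counter` directly, while a child process works on its own copy, so anything it computes has to be sent back explicitly (here through a `Queue`).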

If you have a test or a suggestion, drop me an email or post a comment. Heck, this might just turn out to be a learning experience for me or it could turn out to be something bigger.

Update: Wow, you guys are awesome. The private and public comments I’ve gotten on this have left me with a lot to think about and, hell, a lot to learn. I’m still in the toying/planning phase, so please keep feeding me information. When I get more time this weekend (if I get time) I will try to put up something cohesive, including a subversion repository and maybe a wiki.
Update 2: Hello Reddit.

  1. Yes, this is a gratuitous picture of my daughter.
  2. Yes, Daniel Watkins has started this process as well.


  • http://www.mechanicalcat.net/richard/ Richard Jones

    Inter­est­ingly enough, I’ve just imple­mented a GUI-based file down­loader using both thread­ing and pro­cess­ing on OS X on a G4 (sin­gle core). Using urllib2 to down­load (read­ing chunks of 1MB at a time) I found that the thread­ing imple­men­ta­tion played bet­ter with the GUI (with some semi-icky hacks) than the pro­cess­ing approach.

    Both approaches had the down­load worker writ­ing directly to disk, so the traf­fic back from the worker to the par­ent was min­i­mal (a “size received” mes­sage after each read()).

    This was down­load­ing from a LAN server, so the down­load speed was roughly 10MB/s.

    The semi-icky hack in the threading approach: I had to introduce a semaphore in the worker main loop which only allowed the worker to “fire” once every two GUI update loops; otherwise the GUI was starved for CPU.

    So for me it wasn’t totally about how to make things go the fastest, but also about how to best share the CPU in an inter­ac­tive application.

  • Steve

    Have read blog post: want your results.

  • http://www.jessenoller.com jesse

    I will be post­ing the results + code once I get all my ducks in a row. I also want to pass it by a few peo­ple to ver­ify I didn’t do some­thing mon­u­men­tally screwy.

  • Miguel Sousa Filipe

    I agree with you that alter­na­tives must be mea­sured and tested.

    Check out this other blog post, which tests something similar to what you did:
    http://blogs.warwick.ac.uk/dwatkins/entry/benchmarking_parallel_python_1_2/
    Especially, have a look at what Mr. Guido van Rossum said (9th comment).

    I’ve also sent you an email elaborating on my thoughts about this whole “thinking, testing and choosing” of an alternative “parallel/concurrency” API for Python.

    Best regards,

    Miguel Sousa Fil­ipe
    miguel dot-thingie0 fil­ipe at-thingie1 gmail dot-thingie2 com

  • http://www.jessenoller.com jesse

    I got the email and replied. I also linked to Watkins’ blog post in mine, and I will be following up with him as soon as I have a moment.

  • http://twopieceset.blogspot.com Nick Gerner

    Thanks for tak­ing the time to ver­ify results. This is the kind of thing you (or me, or any­one) could screw up really eas­ily. And it’s impor­tant we get solid results on this issue.

  • Michael Dil­lon

    Please include Pyro in your tests: http://pyro.sourceforge.net/

    Since Par­al­lel Python uses thread­ing inside the mod­ule, it really isn’t a pure non-threading solu­tion. If you set up a sim­i­lar test with Pyro using two worker processes on one machine, and a process on another machine to con­trol it all, that would com­pletely remove thread­ing from the equa­tion. It may turn out to have a min­i­mal dif­fer­ence from Par­al­lel Python but that is the kind of thing we want to learn from benchmarking.

    Also, I think that the bench­mark should be struc­tured so that the mas­ter thread/process does a fair amount of com­mu­ni­ca­tion, i.e. to sup­ply the para­me­ters for each loop. That way you can run the test in two ways, first with the Fibonacci com­pu­ta­tions, and sec­ond with a null com­pu­ta­tion. The sec­ond style of test will only mea­sure the over­head. The­o­ret­i­cally, the vari­ance in over­head would be the only dif­fer­ence between the Fibonacci runs, but, as always, you need to run the bench­marks to learn this.

    Once you have it all run­ning, please dis­trib­ute it so that peo­ple can try it on dif­fer­ent CPU and OS types. Bench­marks on a sin­gle CPU or OS type can often be deceptive.

  • Eli

    I’m really glad you’re doing this. Python def­i­nitely needs good libraries for run­ning mul­ti­ple processes on dif­fer­ent cpus, and your bench­mark­ing efforts will def­i­nitely help this happen.

  • Pingback: jessenoller.com - Benchmark followup: Google-code Edition


You are currently reading Have GIL: Want Benchmarks. at jessenoller.com.
