why is my app not faster with multiprocessing?!

January 16th, 2009 § 2 comments

Suffering from insomnia this morning, I decided to delve into my python-list archive box in Gmail. I normally only scan it once or twice a month due to its signal-to-noise ratio. A post by James Mills caught my eye:

I’ve noticed over the past few weeks lots of questions
asked about multi-processing (including myself).

For those of you new to multi-processing, perhaps this
thread may help you. Some things I want to start off
with to point out are:

multiprocessing will not always help you get things done faster.

be aware of I/O bound applications vs. CPU bound

multiple CPUs (cores) can compute multiple concurrent expressions -
not read 2 files concurrently

in some cases, you may be after distributed processing rather than
multi or parallel processing

cheers
James
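
James's I/O-bound vs. CPU-bound point is easy to try for yourself. Below is a minimal sketch, not a benchmark: `cpu_bound`, `io_bound`, and `timed_map` are made-up helper names, the sleep is only a stand-in for real network or disk waits, and the workload sizes are arbitrary. It pushes the same work through a thread pool and a process pool, using the two pool classes that ship with multiprocessing.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def cpu_bound(n):
    """Burn CPU in pure Python: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_bound(delay):
    """Stand-in for a network or disk wait; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay


def timed_map(pool_factory, func, items, workers=4):
    """Map func over items in a pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with pool_factory(workers) as pool:
        results = pool.map(func, items)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    workloads = [
        ("cpu-bound", cpu_bound, [200_000] * 8),
        ("io-bound", io_bound, [0.1] * 8),
    ]
    for label, func, items in workloads:
        for name, factory in [("threads", ThreadPool), ("processes", Pool)]:
            _, elapsed = timed_map(factory, func, items)
            print(f"{label:9s} {name:9s} {elapsed:.2f}s")
```

On a standard CPython build with a GIL and more than one core, I would expect the process pool to come out ahead on the CPU-bound rows and the two pools to land close together on the I/O-bound rows, with the process pool paying some extra startup cost. Measure on your own machine before drawing conclusions.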


James is right, and I said as much in my reply:

James is quite correct, and maybe I need to amend the multiprocessing
documentation to reflect this fact.

While distributed programming and parallel programming may cross paths
in a lot of problems/applications, you have to know when to use one
versus the other. Multiprocessing only provides some basic primitives
to help you get started with distributed programming; that is not its
primary focus, nor is it a complete solution for distributed
applications.

That being said, there is no reason why you could not use it in
conjunction with something like Kamaelia, pyro, $ipc mechanism/etc.

Ultimately, it’s a tool in your toolbox, and you have to judge and
experiment to see which tool is best applied to your problem. In my
own work/code, I use both processes *and* threads; one works better
than the other depending on the problem.

For example, take a web testing tool. This is something that needs to
generate hundreds of thousands of HTTP requests, which is not a problem
you want to use multiprocessing for, given that A> it’s primarily I/O
bound and B> you can generate that many threads on a single machine.
However, if I wanted to, say, generate hundreds of threads across
multiple machines, I would (and do) use multiprocessing + paramiko to
construct a grid of machines and coordinate work.

That all being said: multiprocessing isn’t set in stone. There’s room
for improvement in the docs, tests and code, and all patches are
welcome.

–jesse
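
The basic distributed primitives mentioned in the reply above include `multiprocessing.managers`, which can serve shared objects over a socket. Here is a rough sketch under loud assumptions: the names `QueueManager`, `get_jobs`, `start_server`, and `connect`, the localhost address, and the placeholder authkey are all invented for illustration, and the server runs in a background thread of the same process only to keep the example self-contained.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

# Hypothetical names throughout; the authkey is a placeholder, not a real secret.
AUTHKEY = b"change-me"
_jobs = queue.Queue()


class QueueManager(BaseManager):
    """Manager that exposes a single shared job queue."""


# Calling get_jobs() on a connected manager returns a proxy to _jobs.
QueueManager.register("get_jobs", callable=lambda: _jobs)


def start_server(host="127.0.0.1", port=0):
    """Serve the queue from a background thread; return the bound (host, port)."""
    manager = QueueManager(address=(host, port), authkey=AUTHKEY)
    server = manager.get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect(address):
    """Connect to a running server and return a proxy for its job queue."""
    manager = QueueManager(address=address, authkey=AUTHKEY)
    manager.connect()
    return manager.get_jobs()


if __name__ == "__main__":
    address = start_server()
    jobs = connect(address)  # in real use this would run on another machine
    jobs.put("http://example.com/")
    print(jobs.get(timeout=5))
```

In a real setup the server would live in its own process on one machine and workers elsewhere would call `connect` with that machine's address. This only moves work items around; starting the workers, handling failures, and securing the link are left to you, which is the "not a complete solution" part.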

Like any tool, library, or even language, you have to know when to switch one tool for another. For example, it doesn’t make sense for anyone to use Python 100% of the time; maybe you have some math routine that simply makes more sense written in C (say, a crypto function). Heck, even Java is better suited for some tasks (like making really long lines in source files!).

Yeah, I wrote PEP 371, but even I am not blind to the usefulness of things like Actors, Threads, Coroutines, Stackless Python, etc. There is no single solution to anything; the most we can ever hope for is to have a rich toolbox from which to pick the proper tools.

  • http://Thirdpipe.com JohnMc

    Having faced similar situations, and only after reasonable head scratching and consuming a 6 pack of beer without finding a clear solution, I typically turn to virtualization. I am fortunate enough to have an 8-core box available to me from time to time. Having donated 4 additional NIC cards, I can carve up over a dozen guest instances and get the throughput I need to do stress testing of server apps.

    Of course, all I have done is trade I/O bounding for CPU bounding and concerns about FSB throughput. But in the end I get what I need out of the box without resorting to purchasing even more hardware. Now if I could just get my hands on an OC3 test frame...

You are currently reading why is my app not faster with multiprocessing?! at jessenoller.com.