why is my app not faster with multiprocessing?!

by jesse


Suffering from insomnia this morning, I decided to delve into my python-list archive box in Gmail. I normally only scan it once or twice a month because of the signal-to-noise ratio. A post by James Mills caught my eye:

I've noticed over the past few weeks lots of questions asked about multi-processing (including myself).

For those of you new to multi-processing, perhaps this thread may help you. Some things I want to start off with to point out are:

"multiprocessing will not always help you get things done faster."

"be aware of I/O bound applications vs. CPU bound"

"multiple CPUs (cores) can compute multiple concurrent expressions - not read 2 files concurrently"

"in some cases, you may be after distributed processing rather than multi or parallel processing"

cheers James
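To make the CPU-bound half of James's point concrete, here is a minimal sketch (not from the original thread) that runs the same pure-computation task serially and through `multiprocessing.Pool`. The function names `count_primes`, `run_serial` and `run_parallel`, and the workload sizes, are made up for illustration. Whether the pool actually wins depends on core count, task size and process start-up cost, so treat the timings as something to measure rather than assume.

```python
import multiprocessing
import time


def count_primes(limit):
    """Toy CPU-bound task (hypothetical): count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_serial(limits):
    """Run every task one after another in the current process."""
    return [count_primes(limit) for limit in limits]


def run_parallel(limits, processes=None):
    """Spread the tasks over a pool of worker processes (defaults to one per core)."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    limits = [30_000] * 4  # arbitrary workload, chosen only for the demo

    start = time.perf_counter()
    serial = run_serial(limits)
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    parallel = run_parallel(limits)
    parallel_time = time.perf_counter() - start

    assert serial == parallel  # same answers either way; only the wall-clock time differs
    print(f"serial:   {serial_time:.2f}s")
    print(f"parallel: {parallel_time:.2f}s")
```

If the tasks are tiny, or the machine has a single core, the parallel run can easily come out slower: starting processes and pickling arguments and results is not free.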

James is very correct. My reply:

James is quite correct, and maybe I need to amend the multiprocessing documentation to reflect this fact.

While distributed programming and parallel programming may cross paths in a lot of problems/applications, you have to know when to use one versus the other. Multiprocessing only provides some basic primitives to help you get started with distributed programming; that is not its primary focus, nor is it a complete solution for distributed applications.

That being said, there is no reason why you could not use it in conjunction with something like Kamaelia, pyro, $ipc mechanism/etc.

Ultimately, it's a tool in your toolbox, and you have to judge and experiment to see which tool is best applied to your problem. In my own work/code, I use both processes *and* threads - one works better than the other depending on the problem.

Take a web testing tool, for example. This is something that needs to generate hundreds of thousands of HTTP requests - not a problem you want to use multiprocessing for, given that (A) it's primarily I/O bound and (B) you can generate that many threads on a single machine. However, if I wanted to, say, generate hundreds of threads across multiple machines, I would (and do) use multiprocessing + paramiko to construct a grid of machines and coordinate work.
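A rough sketch of the I/O-bound case, assuming a thread pool from the standard library's `concurrent.futures`. This is not the actual testing tool: `fake_request` is a hypothetical stand-in that sleeps instead of talking to a server, and the latency and worker counts are arbitrary. The point it illustrates is that threads spend their time waiting, so a single process can keep many requests in flight.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(request_id, latency=0.05):
    """Hypothetical stand-in for an HTTP request: the thread simply waits,
    as it would while blocked on a socket."""
    time.sleep(latency)
    return request_id


def run_threaded(n_requests, workers=20):
    """Issue `n_requests` simulated requests using a pool of `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order the inputs were submitted
        return list(pool.map(fake_request, range(n_requests)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_threaded(100)
    elapsed = time.perf_counter() - start
    print(f"{len(results)} simulated requests in {elapsed:.2f}s")
```

Run one at a time, 100 requests at 0.05 seconds each would take about five seconds; with 20 threads the waits overlap, which is why extra processes buy little for this kind of workload.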

That all being said: multiprocessing isn't set in stone - there's room for improvement in the docs, tests and code, and all patches are welcome.

-jesse

Like any tool or library - or even language - you have to know when to switch one for another. For example, it doesn't make sense for anyone to use Python 100% of the time; maybe you have some math routine that simply makes more sense written in C (say, a crypto function). Heck, even Java is better suited for some tasks (like making really long lines in source files!).

Yeah, I wrote PEP 371, but even I am not blind to the usefulness of things like Actors, Threads, Coroutines, Stackless Python, etc. There is no single solution to anything; the most we can ever hope for is to have a rich toolbox from which to pick the proper tools.