Log parsing, Erlang, Python and the processing module

by jesse


So, yesterday, Fredrik Lundh posted an article where he took a million-line log parser written in Erlang and went through the process of building an optimized version in Python. He worked through a single-threaded version, a multi-threaded one, and finally a version built on the fork/exec process model. The experiment is a great read (especially as I might need to do some large-log processing soon), if only to see the iterations and the benchmark numbers.
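Just to make the approach concrete, here's a minimal sketch of the chunk-per-process idea - this is not Fredrik's code, the regex is a placeholder, and the worker count is arbitrary - written against the processing module:

    import os
    import re
    from processing import Process, Queue

    # Placeholder pattern - substitute whatever your log format needs.
    PATTERN = re.compile(r'GET (\S+)')

    def parse_chunk(filename, start, end, results):
        # Count pattern hits for the lines that start in [start, end).
        counts = {}
        f = open(filename, 'rb')
        if start:
            # Align to the next full line: back up one byte to see if we
            # landed on a line boundary; if not, skip the partial line
            # (the previous worker already read it in full).
            f.seek(start - 1)
            if f.read(1) != '\n':
                f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            m = PATTERN.search(line)
            if m:
                counts[m.group(1)] = counts.get(m.group(1), 0) + 1
        f.close()
        results.put(counts)

    def parse(filename, nworkers=4):
        size = os.path.getsize(filename)
        bounds = [size * i // nworkers for i in range(nworkers + 1)]
        results = Queue()
        workers = [Process(target=parse_chunk,
                           args=(filename, bounds[i], bounds[i + 1], results))
                   for i in range(nworkers)]
        for w in workers:
            w.start()
        totals = {}
        for _ in workers:            # one result dict per worker
            for key, n in results.get().items():
                totals[key] = totals.get(key, 0) + n
        for w in workers:
            w.join()
        return totals

(On Windows you'd want the usual if __name__ == '__main__' guard around the driver code, since child processes re-import the module.)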

Today, Doug Hellmann forwarded me this post, in which the author swaps the threading module in the threaded parser example for the processing module - he, too, got very favorable results.
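The swap really can be that mechanical. Here's a hedged sketch of what I mean (the worker is a stand-in, not the code from the post, and access.log is a made-up filename) - the thread version first:

    from threading import Thread as Worker
    from Queue import Queue           # Python 2's thread-safe queue

    def worker(inq, outq):
        count = 0
        while True:
            line = inq.get()
            if line is None:          # sentinel: no more work
                break
            count += 1                # stand-in for the real parsing
        outq.put(count)

    inq, outq = Queue(), Queue()
    workers = [Worker(target=worker, args=(inq, outq)) for _ in range(4)]
    for w in workers:
        w.start()
    for line in open('access.log'):
        inq.put(line)
    for w in workers:
        inq.put(None)                 # one sentinel per worker
    total = sum(outq.get() for _ in workers)   # drain before joining
    for w in workers:
        w.join()

    # The process-based version leaves the worker and the driver code
    # untouched; only the two imports change (queue items must be
    # picklable):
    #
    #     from processing import Process as Worker, Queue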

Working with the processing module off and on for the past month (around the benchmarks and some other projects) has led me to really enjoy it, and the article Doug wrote for Python Magazine did a great job of covering it as well.

The processing module is a great example of module design - because it's API-compatible with the threading module, it's a quick and easy drop-in replacement in many cases.
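That compatibility means you can even defer the choice to import time - a tiny sketch of the switch I have in mind, falling back to threads when the processing package isn't installed:

    try:
        from processing import Process as Task
    except ImportError:
        from threading import Thread as Task

    # Downstream code sticks to the shared subset of the two APIs:
    # Task(target=..., args=...), .start() and .join().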

There has been some talk about a "concurrency" PEP, or something along those lines, to explore getting a library like this into the stdlib in the Python 3000 (or earlier) timeline. I've toyed with the idea of starting one for the processing module - given that it "breaks" the "GIL limitation", is a simple drop-in for many applications, and can use multiple machines via the SyncManager.
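That multiple-machines bit deserves a sketch. The host, port and authkey below are invented, I'm going from memory of the manager docs, so treat the exact method names (register, get_server, connect) as an assumption - but the register/connect pattern is the shape of it. These are two separate scripts:

    # --- server side: the machine that owns the shared work queue ---
    from processing.managers import SyncManager
    import Queue

    work = Queue.Queue()

    class LogManager(SyncManager):
        pass

    LogManager.register('get_work', callable=lambda: work)

    manager = LogManager(address=('', 9999), authkey='logparse')
    server = manager.get_server()
    server.serve_forever()

    # --- client side: any other box on the network ---
    from processing.managers import SyncManager

    class LogManager(SyncManager):
        pass

    LogManager.register('get_work')

    manager = LogManager(address=('logserver.example.com', 9999),
                         authkey='logparse')
    manager.connect()
    work = manager.get_work()     # proxy for the queue on the server
    work.put('/var/log/access.log')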

It's something to think about - I don't know if the author is interested in it (no, I haven't asked - yet). I think I just might drop him an email today.