Log parsing, Erlang, Python and the processing module

October 7th, 2007

So, yesterday, Fredrik Lundh posted an article in which he took an Erlang program for parsing a million-line log file and walked through building an optimized version in Python.

He did it single-threaded, multi-threaded and, finally, with the fork/exec process model. The experiment is a great read (especially as I might need to do some large-log processing soon), if only to see the iterations and the benchmark numbers.
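For readers who haven't followed the posts, here is a minimal sketch of the multi-process idea, assuming a common-log-format access log in which we count `GET` requests per path. It is not Fredrik's code: the regex, the chunk size and the helper names (`count_chunk`, `chunked`, `parallel_count`) are hypothetical, and it swaps in `multiprocessing.Pool` (the standard-library descendant of the processing module) instead of raw fork/exec.

```python
import re
from collections import Counter
from multiprocessing import Pool

# Hypothetical pattern: the request path of a GET in a common-log-format line.
GET_PATH = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def count_chunk(lines):
    """Count GET requests per path in one chunk of log lines."""
    counts = Counter()
    for line in lines:
        match = GET_PATH.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def chunked(lines, size):
    """Yield successive slices of at most `size` lines."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def parallel_count(lines, processes=4, chunk_size=10000):
    """Fan chunks out to worker processes and merge their partial counts."""
    total = Counter()
    with Pool(processes) as pool:
        for partial in pool.imap_unordered(count_chunk, chunked(lines, chunk_size)):
            # Counter.update adds counts rather than replacing them.
            total.update(partial)
    return total
```

On platforms that spawn rather than fork worker processes, `parallel_count` has to be called from under an `if __name__ == "__main__":` guard, for example as `parallel_count(open("access.log").readlines()).most_common(10)` (the file name is a placeholder). Each worker sends back only a small `Counter`, so the merge step stays cheap.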

Today, Doug Hellmann forwarded me this post, in which the author swaps the threading module in the threaded parser example for the processing module; he, too, gets very favorable results.

Working with the processing module off and on over the past month (for the benchmarks and some other projects) has led me to really enjoy it, and the article Doug wrote for Python Magazine does a great job of covering it as well.

The processing module is a great example of a well-designed module: because its API is compatible with that of the threading module, it is a quick and easy drop-in replacement for threading in many cases.
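A small illustration of that drop-in property, sketched with `multiprocessing`, the name the processing module took when it was adopted into the standard library in Python 2.6 (PEP 371); the original `processing` package itself is not shown. The hypothetical `run_workers` helper accepts either `threading.Thread` or `multiprocessing.Process`, because both take `target=` and `args=` and both expose `start()` and `join()`.

```python
import multiprocessing
import threading


def square_worker(n, results):
    """Compute n * n and report it on the shared results queue."""
    results.put((n, n * n))


def run_workers(worker_cls, numbers):
    """Run one worker per number using either threads or processes."""
    # multiprocessing.Queue works from both threads and processes.
    results = multiprocessing.Queue()
    workers = [worker_cls(target=square_worker, args=(n, results)) for n in numbers]
    for w in workers:
        w.start()
    # Drain the queue before joining: a process that has put data on a
    # queue may not exit until that data has been consumed.
    collected = dict(results.get() for _ in workers)
    for w in workers:
        w.join()
    return collected
```

Both `run_workers(threading.Thread, range(5))` and `run_workers(multiprocessing.Process, range(5))` should return `{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}`; only the class passed in changes. As with any multiprocessing code, the process version belongs under an `if __name__ == "__main__":` guard on platforms that spawn workers.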

There has been some talk about writing a “concurrency” PEP, or something along those lines, to explore getting such a library into the stdlib on the Python 3000 timeline or earlier. I’ve toyed with the idea of starting one for the processing module, given that it “breaks” the “GIL limitation”, is simple enough to drop into many applications, and can use multiple machines via the SyncManager.
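Roughly what the multi-machine part looks like, again sketched with `multiprocessing` rather than the original processing package. It follows the remote-manager pattern from the `multiprocessing` documentation: a `SyncManager` subclass exposes one shared queue over TCP, one machine runs the server with `serve_forever()`, and clients elsewhere `connect()` to it. The names `JobManager`, `make_server` and `connect_client`, the port and the authkey are all placeholders; a real deployment needs a properly secret key.

```python
import queue
from multiprocessing.managers import SyncManager

# A plain queue that lives only inside the server process.
_job_queue = queue.Queue()


def _get_job_queue():
    return _job_queue


class JobManager(SyncManager):
    """Hypothetical manager exposing one shared job queue over TCP."""


# Every connected client calling get_job_queue() gets a proxy to the same queue.
JobManager.register("get_job_queue", callable=_get_job_queue)


def make_server(host="", port=50000, authkey=b"change-me"):
    """Build (but do not start) the manager server; call serve_forever() on it."""
    return JobManager(address=(host, port), authkey=authkey).get_server()


def connect_client(host, port=50000, authkey=b"change-me"):
    """Connect to a running server, which may be on another machine."""
    manager = JobManager(address=(host, port), authkey=authkey)
    manager.connect()
    return manager
```

On the server machine, `make_server().serve_forever()` blocks and handles requests; on a worker machine, `connect_client("logserver").get_job_queue().put("access.log.1")` (hostname and file name are placeholders) enqueues work that any other connected client can `get()`.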

It’s something to think about. I don’t know whether the author is interested in it (no, I haven’t asked him yet). I think I just might drop him an email today.


You are currently reading Log parsing, Erlang, Python and the processing module at jessenoller.com.
