
Bug day/weekend June 21-22

June 19th, 2008 | Comments | Posted in Programming, Python

See the Bug Day page on the wiki - this weekend is a Python "bug day" weekend where everyone is encouraged to find, test, fix, and hack away at bugs to get the releases closer to the "finish line".

I will be stealing some time this weekend to pitch in as well (and to fix the bugs assigned to me, though lord knows how I am going to fix and test the Solaris ones!)

Minor problem with ‘make install/altinstall’ and multiprocessing.

June 19th, 2008 | Comments | Posted in Programming, Python

Looks like I missed a spot when adding the package: if you do a 'make install' or 'make altinstall' (which I never do - sorry about that), the package is missing.

See the bug report for more information and a small patch. Makefiles for the win.

As a side note - I do not recommend that people do a make install into the system path on their machines. It's generally poor form to do that with alphas/betas/trunks. If anything, just do a make, and then add a symbolic link in /usr/bin or wherever pointing to the python binary in the source directory.

Of course, I'm so anal about system paths I use virtualenv.py a *lot*. Of course, if I was 'make installing' a lot I would have found this. /grumble

Python 2.6 and 3.0 Beta 1 Released.

June 19th, 2008 | Comments | Posted in Programming, Python

I can finally crawl out from under the rock I've been under to happily pass on the news that Python 2.6 and 3.0 Beta 1 are officially released. For those of you living on the moon - this release is especially exciting for me for a few reasons:

  1. PEP 371, the addition of the pyprocessing module (as the multiprocessing module) has been implemented.
  2. This was my first serious foray into core-development, so I learned a lot (some rather painfully/embarrassingly).
  3. I ended up with commit privileges so I can help maintain the new package/module. My first checkin was unfortunately a patch to disable some tests that came in with the package.
  4. I killed the buildbots and delayed the beta.

Mad props/thanks need to be passed on to a few people - Benjamin Peterson, Adam Olsen, Richard Oudkerk and many others helped get this done and helped me debug various problems that were exposed after the new package went in.

The docs for the package are here (2.6 dev docs). The package is in both py3k and py2.6.

I'll be working over the next few weeks on cleaning up the tests, the build and the docs for the new package. I welcome suggestions and urge people to file bugs they find in the bug tracker. Of course, if you include a patch - that's even better.

If you're looking for some cooler-stuff with the package: Look at the examples. The final one is an example of how to use the package to spread work amongst a cluster of machines.
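For a feel of the simplest case before the cluster example, here is a minimal single-machine sketch that spreads work across local processes with a pool. The function names `square` and `parallel_squares` are made up for illustration, and the `with` form of `Pool` assumes a modern Python 3 rather than the 2.6 beta discussed here.

```python
import multiprocessing


def square(n):
    """CPU-bound stand-in for real work (hypothetical example function)."""
    return n * n


def parallel_squares(numbers, workers=2):
    """Map square() over numbers using a pool of worker processes."""
    with multiprocessing.Pool(processes=workers) as pool:
        # map() splits the input across the workers and keeps result order.
        return pool.map(square, numbers)


if __name__ == "__main__":
    # The guard matters: child processes may re-import this module.
    print(parallel_squares(range(10)))
```

The worker function lives at module level so it can be pickled and sent to the child processes.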

You're going to notice something "special" about the package included in python if you have worked with it in the past: The methods have changed names. There was a discussion on Python-Dev in the context of the package's inclusion about whether or not to stick with "strict" threading module API naming, or to take this as a chance to move towards PEP 8-style naming. We chose the latter, and in fact the threading module itself is getting re-worked over time to move to the same PEP-8 style naming (tbd: I still need to pep this up). I think dropping the old Java-Style naming and going with Pythonic naming and accessing of things will be a generally Good Thing.
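As a rough illustration of what the PEP 8-style naming looks like in practice, the sketch below uses the lowercase accessors as they exist in current Python. The helper name `describe_current` is hypothetical, and the exact set of names in the 2.6 beta may differ slightly from what later releases settled on.

```python
import multiprocessing
import threading


def describe_current():
    """Return (process name, thread name) via the PEP 8-style accessors."""
    proc = multiprocessing.current_process()
    # threading's older Java-style spelling of this call was currentThread().
    thread = threading.current_thread()
    return proc.name, thread.name


if __name__ == "__main__":
    print(describe_current())
    print(multiprocessing.cpu_count())        # number of CPUs in the machine
    print(multiprocessing.active_children())  # list of live child processes
```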

I've had to reiterate this a few times - but since I'm cheerleading, I might as well do it again: I do not think that the addition of this package is the "silver bullet" for "concurrency in python". I also don't think it will solve 100% of all the problems out there for people. It's useful, and a best of breed: but it is only one step in a larger movement. Adoption of Adam Olsen's work, continuing to refine the multiprocessing package, and thinking about distributed message protocols within the stdlib are just a few of the things we can do.

I'm seriously thinking about doing an introductory talk about the new package/concurrency stuff at PyCon 09, although I don't know how many people would be interested.

Doing all of this work also got me even more excited for 2.6 and 3.0 - I really do recommend people download the new builds and really hammer them, it's important the final versions be as bug-free as possible.

Programmer Insecurity and Mea Culpa.

June 13th, 2008 | Comments | Posted in Programming, Python

Ben Collins-Sussman put up an excellent blog post on programmer insecurity - this rings particularly loud with me for a few reasons.

The first reason is that I used to be that guy - never checking anything in until I felt it was "perfect", then I swung to the other extreme - putting things in too quickly/aggressively.

It's a fine balance between the two, but one thing he says is especially pertinent in both open, and closed source worlds:

Be transparent. Share your work constantly. Solicit feedback. Appreciate critiques. Let other people point out your mistakes. You are not your code. Do not be afraid of day-to-day failures — learn from them. (As they say at Google, “don’t run from failure — fail often, fail quickly, and learn.”) Cherish your history, both the successes and mistakes. All of these behaviors are the way to get better at programming. If you don’t follow them, you’re cheating your own personal development.

I've grown to truly appreciate peer-review and discussion, I feel it makes all the parties involved that much better, and ultimately it improves the quality of the code. I've grown to miss peer-reviews when I don't have them - the chance to talk over the design of something and step through the code and debate various design points and possible improvements is very, very valuable.

Failure is not permanent with code: It is a transitional state which can be overcome.

That all being said: I must issue a mea culpa. Earlier this week I put together a patch for the multiprocessing package inclusion into python-core. Note that I've been using this package for some time, on multiple platforms - the tests have not failed me, and I felt that things were A-OK.

Once in though, the buildbots started doing something which reminds me of a particularly rowdy party at LinuxWorld way back - namely, all of them started churning away and promptly began puking on themselves. At least unlike me, they didn't misplace clothing or their hotel.

So, I broke the core.

I'm still chasing down the problems - we're suffering test lock-ups and a few compile errors on certain platforms (debian ppc for the lose). I feel awful because I did drop a code-bomb on Tuesday in my urgency to make the beta on Wednesday. Dropping something that big right before a deadline is just poor form, and because of it - the betas didn't ship.

With that said: The work Ben Peterson, Adam Olsen, and many others have done to help (me/core) has been phenomenal and it simply reinforces why working in a community is so valuable for me. Now I just have to fix it. Anyone else want to help?

PEP 371: Addition of pyprocessing (as multiprocessing) accepted!

June 5th, 2008 | Comments | Posted in Programming, Python

Per Guido:

I've accepted your PEP. I think it still needs some clean-up and
perhaps clarification of the agreement reached about API style, but
there is nothing now that keeps you from implementing it! Hopefully
you'll make the beta release early next week.

--Guido

Woop woop woop.

Making re-creatable random data files really fast in python.

May 30th, 2008 | Comments | Posted in Programming, Python

Note, I'm just really happy with this - feel free to correct me or give me enhancements.

So - let's state the problem:

  • I have to create a lot of files of varying sizes
  • I can not store them long-term
  • I must be able to recreate them at any point
  • Creation must be fast for files large and small

That all being said - I wanted to be able to use a seed made of integers which only make sense to me that embeds certain data relevant to the test within it, so the seed would have both random and not-random integers in it.

I also wanted to avoid using /dev/random and /dev/urandom - both are deceptively fast until you fire them up using a bunch of threads and drain your entropy pool. Not to mention - I want it fast, so I don't want to have an extra read() call. I need the data put in the file to be a "known thing" - i.e: randomly generated from a non-random pool of data (a words file).

Ergo, this:

 
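import collections
import os
 
# Each digit of the seed is a rotation amount, so the same seed and the
# same words file always reproduce the same output.
seed = "1092384956781341341234656953214543219"
with open("lorem.txt", "r") as wf:
    words = wf.read().replace("\n", '').split()
 
def fdata():
    """Yield an endless, repeatable stream of chunks of up to 1024 words."""
    a = collections.deque(words)
    b = collections.deque(seed)
    while True:
        yield ' '.join(list(a)[0:1024])
        a.rotate(int(b[0]))  # shift the word pool by the current seed digit
        b.rotate(1)          # step to another seed digit
 
g = fdata()
size = 1073741824 # 1 GB
fname = "test.out"
with open(fname, 'w') as fh:
    # getsize() only sees data already flushed to disk, so the finished
    # file can end up somewhat larger than `size`.
    while os.path.getsize(fname) < size:
        # next(g) works on Python 2.6+ and 3; g.next() is Python 2 only.
        fh.write(next(g))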
 

lorem.txt is from here - it's just a Lorem Ipsum file. On my machine I can generate a 1 GB file on disk in 28 seconds. The bonus is that I don't need to write the data in the final test - I just need to provide it to the caller.

It's not as optimized as it could be: I could read bigger chunks of data, things like that. If I dropped the os.path.getsize, it might get faster (count the number of chunks from size / 1024) but that limits me to knowing the chunk size of the generator.
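One way to drop the os.path.getsize call without tying the loop to the generator's chunk size is to count the characters as they are written. This is only a sketch of that idea, not a benchmarked replacement: the names `chunks` and `write_file` are made up here, it assumes a non-empty, ASCII-only word list (so characters and bytes match), and it swaps in itertools.islice to avoid copying the whole deque for every chunk.

```python
import collections
import itertools
import os
import tempfile


def chunks(words, seed, chunk_words=1024):
    """Yield repeatable text chunks: same words + seed give the same stream."""
    pool = collections.deque(words)
    digits = collections.deque(seed)
    while True:
        yield " ".join(itertools.islice(pool, chunk_words))
        pool.rotate(int(digits[0]))  # shift the pool by the current seed digit
        digits.rotate(1)             # step to another seed digit


def write_file(path, size, words, seed):
    """Write exactly `size` characters to `path`, counting them in memory."""
    written = 0
    with open(path, "w") as fh:
        for chunk in chunks(words, seed):
            chunk = chunk[: size - written]  # trim the last chunk to fit
            fh.write(chunk)
            written += len(chunk)
            if written >= size:
                break
    return written


if __name__ == "__main__":
    sample = "lorem ipsum dolor sit amet".split()  # stand-in for lorem.txt
    target = os.path.join(tempfile.mkdtemp(), "test.out")
    print(write_file(target, 4096, sample, "1092384956"))
```

Because the count lives in memory, the loop never has to ask the filesystem how big the file is, and the output lands on the requested size exactly instead of overshooting.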

But - I meet my criteria - and can generate large amounts of file data fast, in a re-creatable form.

Oh well. Just something cool on a Friday before motorcycle class.

“Final” Draft of Processing inclusion PEP …

May 27th, 2008 | Comments | Posted in Programming, Python

... Sent to peps@python.org - now more discussion/debate and stuff shall occur.

For your own edification, here is what I just sent out. Special thanks to all of the early reviewers.

Edit: The PEP is on the official site: http://www.python.org/dev/peps/pep-0371/

Getting Processing into the stdlib

May 15th, 2008 | Comments | Posted in Programming, Python

I shot an email out to Python-Dev earlier this week asking for comments/questions regarding my push to get the Processing module into the standard library. There's been some decent discussion about target releases and other meta-issues around getting it in.

Right now, it looks like I am going to try to target 2.7 and 3.1 - this makes sense for a few reasons.

  • First, the PEP deadline was uh, a year ago for 2.6 and 3.0
  • There's some cleanup on the module which needs to be done
  • There might be some renaming requirements
  • Need to talk to R. about a 1.0 release
  • Need to chunk out some time to convert the tests to unit test format.

That all being said - it doesn't look unfeasible to accomplish - and the response both on list and to me privately has been 95% +1 and 5% -.5 and -1 - the positive response really does make me feel that this is the right approach to take.

I am currently working on revised benchmarks for processing vs. threads vs. pp vs. other - I'll be publishing those as soon as I complete them, both here and in the mailing list discussion, as a counterpoint to some of the open questions.

I'd like to see if any of you, oh internet people, have anything else you'd like to have answered for this or anything you'd like to add to the discussion.

Note, I am not trying to solve the "distributed" problem with the inclusion of this - the remote capabilities of the processing module are a side-benefit - not the primary benefit of trying to get this in. I am taking some of the distributed stuff mentally into account - but the goal is to scratch one specific itch - not to solve everyone's problem with a single addition.

Now all I have to get over is some bizarre errors with parallel python ramming into ulimit, uh, limits. Luckily I have everything from a dual core to an eight core to hack on!

What are your favorite nose plugins? How do you run Nose?

May 13th, 2008 | Comments | Posted in Programming, Python, Testing

So, I am pondering going all-out with Nose, and I am wondering what plugins people find the most useful for it, and also how people are using it.

I see two aspects of nose/any test execution mechanism: Unit testing "native" (i.e: python code) and running tests that are more functional in nature (i.e: not testing python, but instead testing a web interface).

What are the features of nose you found the most useful?

Too bad I couldn't find a decent nose picking graphic for this one.

Python 2.6a3 and 3.0a5 released

May 9th, 2008 | Comments | Posted in Programming, Python

Barry sent the email out last night that both Python 2.6a3 and 3.0a5 are released - these are the final alphas for both. I'd go and grab em while they're still hot off the presses... Provided you're not already sync'ing from svn/bzr/mercurial/wtf.