
New version of nose-testconfig uploaded.

October 8th, 2008 | Comments | Posted in Programming, Python

I fixed a few nits Kumar pointed out, removed some overly aggressive eval'ing of ini file values, and fixed the damned rst docs.

http://pypi.python.org/pypi/nose-testconfig/0.4

I think 4 people in the world are using this - all 4 of them work with me, hooray!

Threads can’t be serialized?!

October 8th, 2008 | Comments | Posted in Programming, Python

Note the subject is tongue-in-cheek - I realize you can't pickle threading.Thread and multiprocessing.Process, but for some (possibly sick) reason I want to write versions of both that implement __getstate__ and __setstate__, and change the __repr__ of both so it doesn't refer to any sort of run-time state.

You don't need to serialize the function that's passed in: the assumption here is that you're serializing them prior to calling start() (because serializing a running thread would be awesome - in that "hey my brain just melted" sort of way).

The best (read: only sane) use case would be to allow threads/processes to be defined in, say, a YAML file and generated at parse-time (the client could call .start()), or to send a blob of serialized threads to a remote client and have it call start() to act as a slave.

And that concludes the random thought for the day.

Edit: And in the comments, Ben Hayden posted a link to a version of threading.Thread which can be serialized. You can see it here. I need to make a multiprocessing.Process version of it too, just, well, cause.
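A minimal sketch of the idea, and not the version linked above: a Thread subclass that can only be pickled before start(). The names PicklableThread, greet and roundtrip are made up for illustration. Note that pickle records a module-level function by its import path, so the function body itself never gets serialized.

```python
import pickle
import threading


def greet(log, who):
    # Module-level function: pickle records it by import path, not by value.
    log.append("hello " + who)


class PicklableThread(threading.Thread):
    """Hypothetical Thread subclass that can be pickled before start()."""

    def __init__(self, target=None, name=None, args=(), kwargs=None, daemon=None):
        super().__init__(target=target, name=name, args=args,
                         kwargs=kwargs, daemon=daemon)
        # Keep our own copy of the definition; Thread keeps these in
        # private attributes next to locks that cannot be pickled.
        self._spec = {
            "target": target,
            "name": self.name,
            "args": tuple(args),
            "kwargs": dict(kwargs or {}),
            "daemon": self.daemon,
        }

    def __getstate__(self):
        if self.ident is not None:
            raise pickle.PicklingError("thread has already been started")
        return self._spec

    def __setstate__(self, state):
        # Build fresh run-time state (locks, events) rather than restoring it.
        self.__init__(**state)

    def __repr__(self):
        # Only the definition, no ident or started/stopped status.
        return "<PicklableThread name=%r daemon=%r>" % (
            self._spec["name"], self._spec["daemon"])


def roundtrip(thread):
    """Serialize and rebuild a not-yet-started thread."""
    return pickle.loads(pickle.dumps(thread))
```

A receiver would call roundtrip-style pickle.loads() on the blob and then start() on the result. Anything in args is copied by pickle, so the rebuilt thread works on its own copy of the arguments.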

Python 2.6 is released, some highlights

October 2nd, 2008 | Comments | Posted in Programming, Python

Well, it's final boys and girls. Python 2.6 is final and up on the site.

A whole lot of people put a ton of work into this release - Python 3.0rc2 will be out shortly, but the dev team made the decision to focus on 2.6 so we could all get it buttoned up and out in the wild.

You should read "What's new in Python" - it highlights what changed, and the sheer amount of work that went into this. Barry Warsaw, Benjamin Peterson and many, many others deserve a lot of credit for helping shepherd this release to finality.

I'm very proud of this release - for a variety of reasons. This is the first Python release that I personally had a chance to directly contribute to, however minor that contribution was. Also, this is the first time I've been able - through an allowance of personal and work time - to seriously contribute to an open source project I care about.

Some highlights? Context Managers, the Per-user site-packages directory, abstract base classes, class decorators, and a ton of bug fixes.

The inclusion of the multiprocessing library was a bit of a wild ride for me - I had to learn a lot, very quickly. I didn't get to add/fix all the things I wanted to, but that's the way things go.

I like to think that the inclusion of this package will help people solve problems - that's what makes me most excited about it. It's not perfect, and it's not a solution for everyone, but it's darned useful, and I hope people get to use it, a lot.

I just want to point out something about the package, and concurrency in general: multiprocessing is a very specific implementation of a threading-like API for side-stepping the GIL. It is not the "final" word in concurrency within the stdlib - it is one stepping stone on a path that will hopefully improve the language as a whole.

I can easily see the creation of a concurrent.* package for python, which might include higher level abstractions for pools/message passing/monitors(actors)/etc for both this package, and the threading package. This is going to be a long, interesting project - and I can only hope that we can really improve the language and standard library as a whole. Again - it's all about solving problems.

That all being said - here are some tidbits of what changed within the multiprocessing package:

  • As you can see from the docs, unfortunately, this package does not work on FreeBSD/OpenBSD - it was an unfortunate casualty late in the game.
  • The bouncyCase names have been dropped in favor of PEP8 compliance within the API. The threading module is also dropping the bouncy names to have fully PEP8 compliant names. It was decided that since MP was new, we would cut to the chase and simply fix its API. In 2.6, the PEP8 compliant names have been added to the threading module, and the old API will eventually be deprecated.
  • While changing the API to remove the bouncy names, we also cut certain things over to be python properties in order to be more pythonic. These include "Process.name", "Process.daemon", "Process.pid", "Process.exitcode", "Process.authkey".
  • There are still open bugs against the package - unfortunately, like the other people on python-dev, this is a part-part time job for me, and I couldn't get to everything. Here's a query to see which MP bugs are assigned to me, and their status (this includes closed bugs).
  • The documentation includes some incredibly helpful examples - including the last example, which uses the package and SSH to create a distributed network of workers. I'll be building on this/cleaning it up and switching it over to use paramiko as part of my continuing work.
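To make the property change concrete, here is a small sketch that reads those attributes on a Process that has not been started yet. The work, make_worker and describe names are invented for the example; the attributes themselves are the ones listed above.

```python
import multiprocessing
import threading


def work():
    # Placeholder target; the sketch never starts the process.
    pass


def make_worker():
    proc = multiprocessing.Process(target=work, name="worker-1")
    # Plain attribute assignment; has to happen before start().
    proc.daemon = True
    return proc


def describe(proc):
    """Read the property-style attributes of a Process."""
    return {
        "name": proc.name,
        "daemon": proc.daemon,
        "pid": proc.pid,            # None until start() is called
        "exitcode": proc.exitcode,  # None until the process has exited
    }


if __name__ == "__main__":
    print(describe(make_worker()))
    # threading gained matching PEP8 names, e.g. current_thread()
    # alongside the older currentThread() spelling.
    print(threading.current_thread().name)
```

Process.authkey is read the same way; it is left out of the dictionary here only because its value is random per run.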

That's just some of the work/thoughts that are pinging around in my head. I have a long wish list of things to add, including some message-passing examples/etc. Already, people are building recipes/extending it - for example, David Decotigny has uploaded a recipe to add the same mp.Pool semantics to the threading module as well as one to allow remote method calls and an additional mp.Pool variant here.
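For comparison, and explicitly not David's recipe: the standard library's multiprocessing.pool.ThreadPool exposes the Pool API backed by threads instead of processes. A small sketch, with made-up square and squares functions:

```python
from multiprocessing.pool import ThreadPool


def square(n):
    return n * n


def squares(numbers, workers=4):
    # Same map() call you would make on multiprocessing.Pool,
    # but the workers are threads inside this process.
    with ThreadPool(processes=workers) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    print(squares([1, 2, 3]))
```

Because the workers are threads, this does not side-step the GIL; it is only the calling convention that matches.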

Feel free to file bugs - I welcome them - I just may not be able to fix them really fast. Including patches is awesome.

As I said in the opener - a lot of people put a lot of effort into this release (and all the previous ones) so a big thank-you goes out to everyone. I encourage you to download it, try it out, break it but most of all use it.

What do you want to see in a concurrency talk…

September 22nd, 2008 | Comments | Posted in Programming, Python

So, for starters, I'll be doing a 1 hour talk at the PyWorks conference in Atlanta on November 13.

Following that, I am working on a PyCon tutorial, and one other non-tutorial talk for PyCon. My current theme for this round of talks is "Python 2.6, threading and multiprocessing concurrency".

That's a mouthful.

The PyWorks talk is entitled "Getting Started with Concurrency with MultiProcessing and Threads" - given it's a one hour talk, I'm probably going to need to only go into threads on a cursory level (to give everyone a common understanding) and then delve into multiprocessing features and touch on a basic application as an anchor.

I plan on covering the differences, pluses, minuses and "getting started" and walking through some amount of the API. Given it's only an hour, I won't be able to deep-dive into building a fully fledged application. I can only hope to give people enough information to get started.

Moving on to PyCon, I wanted to expand on this space for a full-blown tutorial - I wanted to cover threads, multiprocessing, pros, cons, organization, APIs, best practices, the GIL and walk through building actual application(s) - although I need to pick a good "showcase" application that people can grok.

Also, in the tutorial, I was thinking about going from local concurrency to network-based concurrency (starting with the built in multiprocessing API) so that people can understand the differences and take those into account.

The final one - the short talk for PyCon - is going to be "Python 2.6 threading changes and an intro to multiprocessing".

In the final talk, given it's short - it's going to be an overview of the changes/new module and a walk through (I hope) of a basic application.

My question to you, oh interwebs, is what might you like to see/be able to get from talks in these veins? I can't go into all of the whys and hows and whens, but I can arm people with enough information to get up and running.

I desperately want to get the "most bang for the buck" in these talks. Obviously, due to the nature of slides, I won't be able to show hundreds of lines of code to illustrate everything (except in the tutorial).

For an example application - I need to make something which will "scale up" - from thread-based approaches all the way to distributed-over-the-network approaches.

The cost of (not) testing software

September 17th, 2008 | Comments | Posted in Programming, Testing

As a long-time automation engineer/test-focused guy, I've pondered the great existential question of "how much testing is enough" for a while.

More recently, I've started focusing on the cost of not testing a product.

Take for example, Figure 1:

initial_flow.png

Let's take a second for terminology:

  • (A) Unit tests: These are tests focused on developer and maintainer productivity. These are "close to the code" tests that run in mostly simulated environments. Unit tests are a cornerstone of Agile methodology - generally speaking, you make these before your code.
  • (B) Smoke/Simulation: These are the "next layer up" - they use partial systems (e.g. your code + the module of the guy next to you) to run more integration-style testing. Smokes are normally run on every compilation of the product along with unit tests. They do not require a fully deployed, functioning system - only a small group of parts.
  • (C) Acceptance/Functional/Regression:
    • Acceptance Test: These normally comprise a large number of your tests
      in an organization. Acceptance tests prove that the specific
      component/feature is sane in the context of the fully deployed product
      - you might require these to be fully developed, executed and passing
      before a specific component or feature is merged to trunk. Acceptance
      tests prove that the feature/component works as intended (not
      programmed). They should be short in execution time.

    • Functional Tests: Functional tests are "larger" and should test as
      much of the functionality of the feature/component as possible, they
      should also test with an eye towards other parts of the product and
      system (e.g. integration). Functional tests should be as expansive and
      detailed as possible. These can also be called Regression tests.
  • (D) Stress/Scalability Tests: This should be self-evident. Stress tests
    build on functional areas to push the product to its limits - how
    many files can it hold, how many connections can it withstand, etc.

  • (E) Performance Tests: Characterization of key performance stats:
    objects/second, records parsed/second, and so on.

Now, I want to point out: these definitions are part-agile and part-continuous integration. They may not wholly mesh with the terminology used in your workplace, or in agile. I also know definitions are a holy war, but the definitions are secondary to what I want to talk about. I also deliberately left out exploratory testing.

What the hell *am* I talking about?

If you look at Figure 1, you'll note I put "Test" (test engineering) off to the side to represent their particular ownership in this model. Unit Tests (and by most measures, smoke and simulation tests) are under the ownership of the core developers.

The other test areas are the ownership of test engineering - obviously they would not exclude Dev from helping though (after all, they win as a team, and fail as a team) but Test is focused on verification that the product is as tested-as-possible before it gets into stage F - the hands of the user.

Ok, this is all fine and good - but hear me out.

This diagram is about cost - for each layer the code/feature passes through emanating from the developer, the cost to the team, and the difficulty in identification and resolution climbs.

This is why Developers write a lot of unit tests and check them in so they run with every check in. Right? You're doing that, right?! The cost for a developer to find a bug with a unit test, and the cost to fix that bug introduced through new code/refactoring/etc, is essentially 1.

Here's a new diagram with some straw-man costs:

cost_flow.png

Essentially, it is in your best interest, as a developer, as a team, to encourage lots and lots of tests lower in the stacks shown here. It starts with comprehensive, checked in unit tests. It continues with having a strong, repeatable testing discipline (for which I recommend test automation).

Why? Because - as you move higher in the stack, that damned bug someone checked in is hidden behind layer upon layer of code. The further from the unit level a bug gets, the more components and environment variables get involved. The more of these that get involved, the harder it is to identify and fix, and the higher the cost.

Now, your bug (our bug) has not only wasted your time, it's holding up a release and wasting test engineers' time (albeit - this is our job). The higher in the stack a bug gets - the higher the cost in wasted man, release and test hours.

For example - your typo in some messaging code manages to sneak its way through to the (E) Performance level. Let's say your performance tests take, oh, a week to run to completion. For some reason, this sneaky beast only pops up when your system's clocks resync after 6 days of runtime.

So, 6 days into a 7 day test - ding, fries are done - the entire system poops itself. You now have to triage the crash, identify the bug (which is probably going to be hard - given it's a performance test, you shut off non-essential logging), fix it, and then rerun the test.

You lost 6 days. More than likely, those are 6 days of lost time you didn't allocate for when you promised the fruits of this iteration/release to those wealthy Swedish bankers, eh?

God help you if your bug gets to level (F). This is called the "aversion level" because after a few of these sneak out, and the CEO of the company starts getting phone calls at 4am from those Swedish bankers - you're either going to get a stern talking to, or some time in "the box" (all CEOs have a punishment box).

Your goal is to avert bugs from reaching Level F. F stands for F'ed in the literal sense.

My point isn't just about cost. Given this tiered approach, and the need to find as many bugs as possible, you're going to end up having some amount of code duplication between the higher levels of testing and the unit/smoke level - after all, most of the tests above that level are external-system level tests.

Some code - or logic - duplication on a higher level isn't always bad, given the context of where the code is running. Not to mention, frequently, the code within the product may not be in the same language as the code that's automating the tests. Duplication of unit test logic on a system-test-level is always going to happen.

Yes, you can and should reuse code as much as possible, but you can also do this through grey-box testing approaches (e.g. exposing APIs into system internals you would not normally have access to).

Also - this means you have to give your teams time to test. You need to give them ample time to automate what is reasonable, and you need to be willing to not ship a component or feature that simply isn't ready. Much less one that hasn't been tested.

The last thing you want is to have a bug - no matter what it is - hit level F. Your job, our job, on a software engineering team is to put out the absolute best product possible - and you can't do that without filling in all of the magical testing boxes. You need to understand that with every step you take away from the code, the cost gets higher.

Letting some bugs get into the hands of users is not avoidable - but the risk can be mitigated, and many bugs that do end up in the hands of users are preventable. The more (and sooner) you test, the less wealth you expend, and the happier you will be. And the more profits you will reap. We like money.

Welcome to TestButler, a rudimentary test case management app.

September 12th, 2008 | Comments | Posted in Programming, Python, Testing

... Or, learn to laugh at my total inability to do web design, and lack of django-fu

So, following up (albeit slowly) on my "Decent test case tracking/registration" post, I've actually managed to cobble together a google code project, and a rudimentary django application.

Right now, it's in sub-prototype stages. I've done a semi-production deployment internally to get feedback/usage information and suggestions. All the code is checked in and now I need to begin cleaning things up from my rather random "pooping of code".

Not only am I learning Django while I am doing this - I'm catching up on 6+ years of changes in the web development community. The last time I was involved in any sort of web-work was when I worked for Allaire/Macromedia - and even then that was primarily on the back end to ColdFusion, not end-user interfaces.

Writing user-interfaces above a command-line utility is not exactly my strong suit. But hell, Django made it wicked easy to start hacking things together. I had the rough-backend done in less than 2 hours, which let me spend the next few days pondering schemas, mucking with many-to-many fields and other django plugins.

If you go and look at the Google Code site, you'll see I've started fleshing out the bits needed to outline the path of the project, and the general reasoning behind it.

Not only do I want feedback - I want to let anyone who wants to join, to join. Contribute ideas, tell me I'm doing it wrong. I already know my django code is messy (I'm working on it) - but most of all I want to help build something useful for the testing community, so if something doesn't mesh, I want to know.

Now, I just need to read my copy of James Bennett's "Practical Django Projects" book. And make a vector-image of a cartoony roomba, or find a better image of a robot butler.

A Peer to Peer test distribution system (TestBot)?

September 8th, 2008 | Comments | Posted in Programming, Python, Testing

Peer-to-Peer systems aren't something new. Things like Bittorrent, AllMyData Tahoe, and others have been using it for file storage for some time.

Still others use the distributed-worker methodologies to do work parceling - they register with the system, and the system hands out chunks of work without factoring in client speed/etc (e.g. distributed.net).

What if you combined the two - you used something like Bittorrent which does peer-selection and allocation intelligently, with a large distributed architecture to manage large scale test execution?

Let's think about a common problem with test engineering. Start with a simple version - you're designing a load test app, this app needs to generate large amounts of load against a target system.

In a normal test environment in a lab - this is "easy" - you simply make sure you have a lab with a bunch of clients, all on the same LAN, and you run a test client on each of them to generate load against the system under test.

Now, let's complicate the problem: You don't have enough "same same" test clients. You may have some "close enough" but dang - they're not on the same subnet, or you don't know about them. Not having enough clients in a lab is more common than you'd think.

So how do you make a test that can take advantage of those test clients, factor in their "differences" and still make a relevant test?

Next problem. You have an application you want to run a battery of tests against. You don't have a dedicated client, but you have the possibility of "borrowing time" from some idle machines to run those tests.

The "idle machines" all have different RAM and CPU, and are varying distances from the system under test on the network. You need to (1) find them, (2) figure out which of the available test clients is the most desirable, and (3) figure out the main differences between the clients so you can factor them into the results.

You simply want the more capable clients to get more of the "important" tests, and the less capable ones to run the lesser tests. Just to add to it, you want them to possibly be capable of being slaved to a given test to help it along (i.e. a performance or generalized load generation test).

Getting back to the original thought about peer-to-peer systems, I started considering the possibility of applying the peer to peer paradigm/weighted selection to test distribution.

You have a series of clients who volunteer to participate in the swarm. The client responsible for submitting the job (a test) to the swarm would use a Weighted Voting algorithm to rank, sort and choose the "most desirable" clients to distribute a test to.

Each client would respond to a submitted request with various attributes (weights) based on OS Type, number of hops from the client submitting the job and the system-under-test, amount of ram, network speed and so on.

In the case of performance based tests, you would be able to factor these attributes into the results of the test (e.g. latency) - in other tests, you only need to gather the results.
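A rough sketch of the rank-and-choose step, under loud assumptions: a plain weighted sum stands in for the weighted voting algorithm, and the Peer fields, the WEIGHTS values and the example machines are all invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: positive values reward an attribute,
# negative values penalize it.
WEIGHTS = {
    "ram_gb": 1.0,
    "network_mbps": 0.01,
    "hops_to_target": -2.0,
    "hops_to_submitter": -0.5,
}


@dataclass
class Peer:
    """Attributes a client reports when it answers a job request."""
    name: str
    ram_gb: float
    network_mbps: float
    hops_to_target: int
    hops_to_submitter: int


def score(peer, weights=WEIGHTS):
    # Weighted sum of the reported attributes.
    return sum(w * getattr(peer, attr) for attr, w in weights.items())


def rank(peers, weights=WEIGHTS):
    # Most desirable peer first.
    return sorted(peers, key=lambda p: score(p, weights), reverse=True)


def assign(tests, peers, weights=WEIGHTS):
    """Hand the most important tests to the best-ranked peers, round-robin.

    tests is a list of (name, importance) pairs.
    """
    ranked = rank(peers, weights)
    by_importance = sorted(tests, key=lambda t: t[1], reverse=True)
    return {name: ranked[i % len(ranked)].name
            for i, (name, _importance) in enumerate(by_importance)}
```

The reported attributes stay attached to each Peer, so a performance test can later factor things like hop count into its results.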

Of course, the concept of a "use idle machines to do something" isn't exactly new - things like distributed.net, seti@home and others do this all the time as I mentioned before.

Then you have things like buildbot - buildbot uses a dedicated (or partially dedicated) pool of machines to compile a target and execute the local unit tests against the compiled thing.

Why not make the two go hand in hand and make an intelligent weighted selection for test distribution? Let's go back to the localized example. You have a continuous build system which compiles and runs the unit tests. It then looks at a pool of test-peers who have volunteered to be part of the test-swarm and fires off the functional/regression tests (as needed, it can locally deploy or remotely deploy to a test-server).

The buildbot reports the steps as compile: pass, units: pass, and then regression: pending - the buildbot passes out the various tests to the swarm which can be executed asynchronously until all tests are completed (or error'd at which point they're passed back to another client in the swarm).

The nice thing is that this works on both a local LAN, and a globally distributed series of test swarm participants. All you do is weight in favor of the closer clients. (oh, and your application has to be available on the network).

Over time, peers participating in the swarm can be "pushed out" - meaning they have error'd out too many times, have been caught "lying" and so on. The swarm can adapt - clients can come and go as long as a given passed out suite eventually completes. If a client fails/drops, the test is simply passed out again.
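The hand-it-back-out behaviour could look something like this sketch. Everything here is hypothetical: the MAX_FAILURES threshold, the run_suite name, and the execute callback, which stands in for actually running a test on a remote peer.

```python
from collections import deque

MAX_FAILURES = 2  # hypothetical threshold before a peer is pushed out


def run_suite(tests, peers, execute):
    """Pass tests out to peers until every test has completed.

    execute(peer, test) returns True on success and False when the
    peer errors out or drops.
    """
    pending = deque(tests)
    active = list(peers)
    failures = {peer: 0 for peer in peers}
    completed = {}
    turn = 0
    while pending:
        if not active:
            raise RuntimeError("no peers left in the swarm")
        test = pending.popleft()
        peer = active[turn % len(active)]
        turn += 1
        if execute(peer, test):
            completed[test] = peer
            continue
        # The peer failed: queue the test for another hand-out and
        # push the peer out once it has failed too often.
        pending.append(test)
        failures[peer] += 1
        if failures[peer] >= MAX_FAILURES:
            active.remove(peer)
    return completed
```

A real swarm would run peers concurrently and detect "lying" separately; this only shows the bookkeeping.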

On a localized (meaning, internal-to-your-company) level, this means you can make any client on your network a peer on the system, and the weight-based selection system still applies and you can use any type of system on your LAN - desktops, servers, highly intelligent coffee makers - anything with a network drop.

Additionally, you could point test slaves at a cluster of installed systems under test - individual nodes in a web farm, or your application installed on various web hosts. Or a larger system installed in various data centers. This removes the bottleneck of a singular system being tested at once (but requires a lot of intelligence on the managerial level).

It's an idea. Something of a disconnected series of thoughts - maybe it's silly. I like the idea of being able to intelligently leverage a series of test peers distributed anywhere and everywhere. Having a peer-to-peer testing system would be neat-o.

It's a zombie army used for testing -Anon :)

edit: Yes, a loosely coupled, highly distributed load test could be construed as a DDoS... But that's semantics, right?

Benjamin Peterson: Testing the CPython Core

September 4th, 2008 | Comments | Posted in Programming, Python

See this "Google Open Source Blog"-post about "Testing the CPython Core".

If you don't know - Ben has been helping us deliver the python 2.6 and 3.0 betas (and RCs) all summer long. He's personally helped me with a lot of stuff around the multiprocessing package and the like. He has really contributed to the releases directly and helped out a lot.

So, thanks Ben!

Stirred up dem bees: Should BSDDB be removed from Python?

September 4th, 2008 | Comments | Posted in Programming, Python

This week, we've seen a push dev-wise to get RC1 completed and ready to go - I've spent some time giving multiprocessing some love (still not done) and a lot of other people have been working around the clock to close out the large number of release blockers.

As of last night though, the trigger was pulled on removing bsddb (the Berkeley DB python module) from the standard library in the 3.0 timeline (2.6 adds deprecation warnings).

Now, before anyone thinks this is an arbitrary decision, here's the argument (in a nutshell):

  • bsddb has always been painful to maintain
  • Jesus Cea is the only person who has stepped up to maintain it
  • bsddb is "heavy weight" - out of everything in the standard library, it has the most dependencies and nuances to cross platform maintenance.
  • Until Jesus Cea stepped up later in the 2.6/3.0 process it was "one of those packages" that no one wanted to maintain.
  • For most of 2.6 and 3.0 it's been a buildbot fail train.

See PEP 3108:

Maintenance Burden

Over the years, certain modules have become a heavy burden upon python-dev to maintain. In situations like this, it is better for the module to be given to the community to maintain to free python-dev to focus more on language support and other modules in the standard library that do not take up a undue amount of time and effort.

bsddb3

  • Externally maintained at http://www.jcea.es/programacion/pybsddb.htm .
  • Consistent testing instability.
  • Berkeley DB follows a different release schedule than Python, leading to the bindings not necessarily being in sync with what is available.

This thread is where the hammer fell.

Now, note that Jesus Cea has done an amazing amount of work updating/upgrading the bsddb support for 2.6 and 3.0 (see his recent announcement here). I feel for him in a lot of respects: He busted his butt to fix, maintain and resolve all open issues with bsddb and the buildbots for the release, but the decision had been made back in July to remove/deprecate the bsddb package (see above).

Now, there is a lot more discussion occurring around the removal:

Edit: I finally got a free moment to do an update - in an email this afternoon on Python 3000, the BDFL (GvR) made the final decision on bsddb - it's out as of py3k:

I am still in favor of removing bsddb from Python 3.0. It depends on a
3rd party library of enormous complexity whose stability cannot always
be taken for granted. Arguments about code ownership, release cycles,
bugbot stability and more all point towards keeping it separate. I
consider it no different in nature than 3rd party UI packages (e.g.
wxPython or PyQt) or relational database bindings (e.g. the MySQL or
PostgreSQL bindings): very useful to a certain class of users, but
outside the scope of the core distribution.

Python 3.0 is a perfect opportunity to say goodbye to bsddb as a
standard library component. For apps that depend on it, it is just a
download away -- deprecating in 3.0 and removal in 3.1 would actually
send the *wrong* message, since it is very much alive! I am grateful
for Jesus to have taken over maintenance, and hope that the package
blossoms in its newfound freedom.

YAML question, and a nose-testconfig thought

August 29th, 2008 | Comments | Posted in Programming, Python, Testing

So, I find myself using more and more YAML lately via the pyyaml package. When I was writing nose-testconfig my "preferred" format was/is YAML.

Now, an interesting thing I've noticed about all of the test configurations I am developing/working with is that they have a lot of "shared" attributes (that change infrequently) and a good number of things which change all the time.

This is the perfect spot for something like a dictionary merge. If you have a test config like this:


application:
  capability: 1
  url: http://foo
subsystem:
  max_users: 20

For each of your configuration files, you might only override something like max_users. For cases like this, it makes sense to load the template document (the file above) and then perform a dictionary merge (think of a recursive dict.update()) after loading the second document, overriding the values from the first load, or something akin to that.
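A minimal sketch of such a merge, not something nose-testconfig does today. The merge function name is made up, the plain dictionaries stand in for what a YAML loader such as yaml.safe_load would hand back, and the nesting (application and subsystem as sibling sections) is an assumption about the file above.

```python
import copy


def merge(parent, child):
    """Return a new dict: parent values, overridden by child values.

    Nested dictionaries are merged key by key; anything else in the
    child simply replaces the parent's value.
    """
    merged = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    parent = {
        "application": {"capability": 1, "url": "http://foo"},
        "subsystem": {"max_users": 20},
    }
    child = {"subsystem": {"max_users": 50}}
    print(merge(parent, child))
```

Because the merge works on plain dictionaries, it does not care which file format the two documents were loaded from.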

This is where my mental dilemma comes in. I could, in theory, add a custom !!tag to the YAML which would take a /path/to/file.yaml and load it first, then load the second document. Or I could do it within nose-testconfig, where you might run:


nosetests . --tc-file=myconfig.yaml --tc-rootconfig=parent.yaml

And then I would jump through the hoops (with a merge probably) within the plugin. The problem with that is that I'm worried about coupling the plugin too closely to yaml.

Now, the plugin already supports overriding multiple values. However, this doesn't scale if you have to override a lot of them.

The most common reason I've found for this so far is adding new parameters and values to the YAML files - not all child configurations need to override/define the new values, instead they could just inherit from the parent.

So, the question is - how do (would) you do this in a way that:

  • Doesn't sacrifice clarity/readability
  • Scales
  • Doesn't require the root document to be in the same location or have a hard coded path in the child document
  • Doesn't couple the loader (nose-testconfig) tightly with the file format

Right now, it's copy, paste, edit all configuration files I know about, etc.