Google Testing Blog: “There, but for the grace of testing, go I”

July 17th, 2010

The Google Testing Blog has a good post up right now by James Whittaker called “There, but for the grace of testing, go I”. It’s a good read, and a pertinent one for any of you/us who feel strongly about quality.

Even though I’ve spent more time than not on “the other side” of the table (Developer, noun: “focus on making software (ergo, bugs)”), I find that James’ words still ring loudly for me, especially the part on risk analysis:

I am thank­ful that the vast major­ity of bugs that affect entire user pop­u­la­tions are gen­er­ally nuisance-class issues. These are typ­i­cally bugs con­cern­ing awk­ward UI ele­ments or the occa­sional mis­fir­ing of some fea­ture or another where workarounds and alter­na­tives will suf­fice until a minor update can be made. Seri­ous bugs tend to have a more local­ized effect. True recall class bugs, seri­ous fail­ures that affect large pop­u­la­tions of users, are far less com­mon. Testers can take advan­tage of the fact that not all bugs are equally dam­ag­ing and pri­or­i­tize their effort to find bugs in the order of their seri­ous­ness. The futil­ity of find­ing every bug can be replaced by an inves­ti­ga­tion based on risk.
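One way to act on that advice is to score each test area by how bad a failure would be and how likely it is, then spend effort from the top of the list down. This is a minimal sketch of that idea. The impact-times-likelihood score is a common rule of thumb, not something James prescribes, and the areas and 1-5 ratings below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RiskArea:
    name: str
    impact: int      # hypothetical scale: 1 = nuisance-class, 5 = recall-class
    likelihood: int  # hypothetical scale: 1 = rare, 5 = hits most users

    @property
    def risk(self) -> int:
        # Rule-of-thumb score: how bad it is times how likely it is.
        return self.impact * self.likelihood


def prioritize(areas):
    """Return the areas ordered from highest to lowest risk score."""
    return sorted(areas, key=lambda area: area.risk, reverse=True)


areas = [
    RiskArea("awkward UI element", impact=1, likelihood=5),
    RiskArea("data loss on upgrade", impact=5, likelihood=2),
    RiskArea("rare crash in export", impact=3, likelihood=1),
]

for area in prioritize(areas):
    print(area.name, area.risk)
```

In this example, the widespread but nuisance-class UI issue ranks below the less common but serious data-loss case.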

I’d recommend James’ post, along with the others on that blog. It reminded me of an old rant of mine, “The cost of (not) testing software”. Anyone in the business of making something is also in the business of making bugs. It’s important to keep that in mind in our day-to-day work and when we think about our customers. It’s also important to keep it in mind before criticizing or dragging any person, company or code through the muck.

Generating re-creatable random files…

February 27th, 2009

… And the case of obsessive optimization. A little while ago, I posted a small snippet of code designed to quickly generate data files of a given size from a seed (article here). The goals of this code are/were the following:

  • Gen­er­ate large amounts of semi-random data quickly
  • Data generation cannot use /dev/urandom or other system entropy buckets. These are too slow, and having hundreds of threads pulling from these buckets is a bad idea. Oh, and it needs to work on Windows.
  • The data must never be synced to disk: when you’re generating a large data set, on the scale of hundreds of millions of files, storing it on disk sucks, and the disk becomes the bottleneck.
  • Creation of the files must run at 1 gigabit/second or better. This means a single thread passing one of these generators to, say, a pycurl handle could “in theory” hit line speed: the generator cannot be the bottleneck.
  • The data in these files must be re-creatable at any time, provided you have the seed.
  • Setting a seed in Python’s random module has side-effect issues, so it cannot be used. Besides, lots of random calls are expensive.
  • I need the ability to swap out the data source. I use a lorem file here, but a different type will be needed later.
  • The data source should only be parsed once, at import time (singleton, ho!)
  • The name and the file data must be unique: they must hash differently (to prevent de-dupers from, well, de-duping them).

I am revisiting this code because we found out the original version could only generate file data at around 500 megabits/second. That is much too slow for my tastes, since I might as well be reading it from disk. We can make it faster.
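You get numbers like “500 megabits/second” by draining the generator and dividing bits by elapsed time. Here is a minimal sketch of that measurement. It assumes the generator yields chunks whose len() is their size in bytes. `measure_mbps` and `megabits_per_second` are hypothetical helpers, not part of the original code. For reference, 1 gigabit/second works out to 125,000,000 bytes/second.

```python
import time


def megabits_per_second(nbytes, seconds):
    """Convert bytes moved in `seconds` into megabits/second (1 Mbit = 10**6 bits)."""
    return (nbytes * 8) / (seconds * 1_000_000)


def measure_mbps(chunks):
    """Drain an iterable of chunks and return (total_bytes, megabits/second).

    Assumes each chunk's len() is its size in bytes, which holds for
    bytes objects and for ASCII-only str data like lorem text.
    """
    total = 0
    start = time.perf_counter()
    for chunk in chunks:
        total += len(chunk)
    # Guard against a zero-length interval on very fast, tiny inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, megabits_per_second(total, elapsed)


nbytes, mbps = measure_mbps("x" * 1000 for _ in range(10_000))
print(nbytes, round(mbps))
```

This measures the generator on its own. Streaming through pycurl adds its own overhead on top.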

After cleaning things up and removing some overly complex logic (and several moments of “what the hell was I thinking”), I came up with this:

import collections
import os

LOREM = os.path.join(os.path.dirname(__file__), "datafiles", "lorem.txt")
with open(LOREM, "r") as lorem_file:
    WORDS = lorem_file.read().split()

def chunker(size, seed, chunksize=1000):
    """ Yield ``size`` bytes of seed-determined data, ``chunksize`` at a time.
    A value passed in with send() becomes the size of the next chunk. """
    word_q = collections.deque(WORDS)
    seed_q = collections.deque(int(i) for i in str(seed))
    # Rotate the word_q by the seed so that small files are unique.
    word_q.rotate(seed)
    current_size = size
    while current_size > 0:
        # Rebuilt on every pass through the loop: this join is the slow part.
        data = ' '.join(word_q)
        if chunksize > current_size:
            chunksize = current_size
        requested = (yield data[0:chunksize])
        # Count the chunk just handed out before switching to the new size.
        current_size -= chunksize
        chunksize = requested or chunksize
        word_q.rotate(seed_q[0])
        seed_q.rotate(1)

class SyntheticFile(object):
    """ File-like object backed by the ``chunker`` function. Allows the
    construction of an object which can be passed to something like a pycurl
    handle streaming data to a server """
    def __init__(self, size, seed):
        """
        **size**: integer, bytes
        **seed**: integer
        """
        self.chunker = None
        self.size = size
        self.seed = seed

    def write(self, data):
        """ unsupported, throw an error if called """
        raise IOError('not supported')

    def read(self, readsize=1000):
        """ Support read() - **readsize** is in bytes. """
        try:
            if self.chunker is None:
                # The first read primes the generator with its chunk size.
                self.chunker = chunker(self.size, self.seed, readsize or 1000)
                return next(self.chunker)
            return self.chunker.send(readsize or 1000)
        except StopIteration:
            # Generator exhausted: signal EOF the way file objects do.
            return ""

This version hit around 618 megabits/second. It used the generator’s send() capability to let readers of the SyntheticFile implementation alter the chunk size they’re reading on the fly. That matters if you have a consumer that wants to read small/read big/read small. Well, that’s fine and all, but I was stymied: I wanted to make this thing fly. I want to be able to generate this data at 1 gigabit/second at least, if not faster.
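The send() pattern is easy to miss, so here is a stripped-down sketch of it, separate from the lorem data. The first chunk size comes from the initial call. Each later send() both asks for the next chunk and sets its size in one step. `sized_chunks` and its cycled-string source are illustrative stand-ins, not part of SyntheticFile.

```python
import itertools


def sized_chunks(source, total, chunksize=1000):
    """Yield `total` characters cycled from `source`, `chunksize` at a time.

    Sending a number into the generator sets the size of the *next* chunk;
    a plain next() (which sends None) keeps the current size.
    """
    stream = itertools.cycle(source)
    remaining = total
    while remaining > 0:
        size = min(chunksize, remaining)
        chunk = "".join(itertools.islice(stream, size))
        requested = yield chunk
        # Count the chunk we just handed out before adopting the new size.
        remaining -= size
        if requested:
            chunksize = requested


gen = sized_chunks("abcdef", total=10, chunksize=3)
print(next(gen))     # abc   (first read: 3)
print(gen.send(5))   # defab (read big: 5)
print(gen.send(5))   # cd    (only 2 characters left)
```

The sketch subtracts the size of the chunk it just yielded, not the newly requested size. That keeps the total output equal to `total` even when the reader switches sizes mid-stream.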

Astute readers may point out that there are other ways of doing this (mmap, or simply embedding the unique seed or a uuid). Well, this story isn’t about that, is it?

In any case, I suspected the data = ' '.join(word_q) line was the culprit. Deque is pretty optimized, and I had already removed a massive chunk of code which didn’t make sense. In fact, cProfile showed I was right:

woot:synthfiles jesse$ python -m cProfile synthfilegen.py
         2441475 function calls in 203.791 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...snip...
   610352  180.282    0.000  180.282    0.000 {method 'join' of 'str'  objects}
...snip...
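You can also get the same breakdown from inside a script instead of through `python -m cProfile`, using the standard library’s `cProfile` and `pstats` modules. This sketch profiles a stand-in loop that rotates a deque and joins it on every pass. It uses a made-up word list, and it returns the top entries sorted by time spent in each function itself (`tottime`).

```python
import collections
import cProfile
import io
import pstats


def join_loop(words, iterations=2000):
    """Stand-in for the generator's hot loop: rotate a deque, join it every pass."""
    word_q = collections.deque(words)
    total = 0
    for _ in range(iterations):
        data = " ".join(word_q)  # the step cProfile flagged above
        total += len(data)
        word_q.rotate(1)
    return total


def profile_report(func, *args, limit=5):
    """Run func(*args) under cProfile and return the top entries by own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(limit)
    return stream.getvalue()


# Hypothetical word list, the same length as the 4368-word lorem file.
report = profile_report(join_loop, ["lorem", "ipsum", "dolor"] * 1456)
print(report)
```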

180 out of 203 CPU seconds, on the join alone. Curses! This is when I really went mental (this is what happens when you’re too close to something). I decided I needed to find some magical way of skipping the join and reading only what I needed. I ran down that rathole for a bit, until a friend of mine pointed out: “just make the words bigger”.

Full stop. I initially discounted it, since I was zeroed in on that join. Oh, wait. The text in the lorem file, split on whitespace, is 4368 words. Joining those back together within the loop is expensive; that much I knew. Then it hit me: instead of considering them words, I could think of them as chunks (which is how I was treating them anyway).

I added a method (process_chunks) which treats the data source as chunks of bytes, and made the WORDS variable a list of those chunks. Initially, I set the chunk size to 100 bytes, and here’s the cProfile output:

woot:synthfiles jesse$ python -m cProfile synthfilegen.py
         2441766 function calls in 51.549 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...snip...
   610352   29.712    0.000   29.712    0.000 {method 'join' of 'str' objects}
...snip...

And now the generator is kicking data out at 2.34 gigabits/second. Huge success. Increasing the chunk size speeds it up a bit more (e.g. 300-byte chunks give about 2.5 gigabits/second). I cleaned it up a bit, and the code is up on bitbucket.org (thanks!).
Note that the speeds I’m discussing come from passing the SynthFileObject to a pycurl handle and streaming it across the wire, not to disk.
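Here is a rough sketch of the “make the words bigger” idea. The data source is re-sliced into fixed-size pieces once, up front, so each join in the loop touches dozens of elements instead of thousands. This is my own illustration of the concept, not the code in the bitbucket repository. `process_chunks` borrows its name from the post, but its body, `chunk_stream` and the sample text are hypothetical. The 100-character default mirrors the 100-byte chunks above.

```python
import collections


def process_chunks(text, chunk_size=100):
    """Cut the data source into fixed-size pieces once, up front.

    The last piece may be shorter than chunk_size.
    """
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_stream(pieces, size, seed, chunksize=1000):
    """Yield `size` characters of seed-determined data built from pre-cut pieces.

    Assumes a non-negative integer seed (its digits drive the rotations).
    """
    piece_q = collections.deque(pieces)
    seed_q = collections.deque(int(digit) for digit in str(seed))
    piece_q.rotate(seed)  # the seed picks the starting piece
    remaining = size
    while remaining > 0:
        n = min(chunksize, remaining)
        # Pieces already carry their own spaces, so join with "".
        # (If the whole source is shorter than n, this chunk comes up short.)
        data = "".join(piece_q)
        yield data[:n]
        remaining -= n
        piece_q.rotate(seed_q[0])
        seed_q.rotate(1)


text = "lorem ipsum dolor sit amet " * 200   # made-up 5400-character source
pieces = process_chunks(text, 100)
print(len(text.split()), "words vs", len(pieces), "pieces")   # 1000 words vs 54 pieces
```

The pieces are contiguous slices of the source. So the same seed always rebuilds the same output, and different seeds start from different pieces, which keeps small files distinct.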

All told, it was a fun little jaunt, and I’ve succeeded in turning something I considered “throwaway” into something a lot more useful, clean and fast. I’ve added a handful of unit tests to my sandbox, and I might make this a real module if anyone wants it. I want to rework the _process_chunks/globals stuff, but I’ve farted around with this long enough for now. I also want to add the ability to remove the chunking altogether and simply insert the seed into the data response, without messing with the lorem text.

edit: I just checked in a new version which removes the _process_chunks function and other globals, and moves them into a class. I hate globals.

Nose-testconfig version .5 uploaded.

October 23rd, 2008

Fixes a minor issue with Python config file parsing.

Next up, hier­ar­chi­cal YAML files!

TestButler update (updated)

October 10th, 2008

With the much-appreciated help of Brandon Barry (with whom I just happen to work), the testbutler code base has gotten an update it needed and I couldn’t get to. Some highlights:

  • Cleaned up the CSS: moved to Blueprint for the larger portion of the CSS, and started using jQuery for the JavaScript portions
  • Tem­plates have been cleaned up/gotten a major facelift
  • site-media has been cleaned up
  • Deleted unused code I had in the prototype
  • Mod­els cleaned up

Overall, it’s looking much better. Yes, it’s in production use now, and I love Markdown syntax. You can see some screenshots here.

We ditched the Roomba picture. I need to find someone handy with artwork to make some custom icons/pictures for us (I really want a cartoony robot butler).

There’s a lot I’d like to do, obviously, but first I have to get started on the results trackers and the corresponding nose plugin to feed results to the system. I figure I’ll use the nose-xunit plugin and some custom XSLT for now.

One thing I need to figure out is whether the django-markdown plugin allows relative {% url %} links within a block of text, so we can cross-link test cases. If it doesn’t, I may write a custom template tag.

Edit: Addi­tion­ally, I just com­mit­ted a change to remove all notion of “com­po­nent” from the sys­tem. We decided that a test case could have any num­ber of com­po­nents, or none at all, and that it was more log­i­cally con­sis­tent to track those as tags-in-the-cloud. For exam­ple, a given test case might be tagged “gui, regres­sion, smoketest, per­for­mance” or “smoke, stor­age, gui” etc. Being more flex­i­ble with sort­ing and orga­ni­za­tion was our goal.

The cost of (not) testing software

September 17th, 2008

As a long-time automation engineer/test-focused guy, I’ve pondered the great existential question of “how much testing is enough” for a while.

More recently, I’ve started focus­ing on the cost of not test­ing a product.

Take for exam­ple, Fig­ure 1:

[Figure 1: initial_flow.png]

Let’s take a sec­ond for terminology:

  • (A) Unit tests: These are tests focused on devel­oper and main­tainer pro­duc­tiv­ity. These are “close to the code” tests that run in mostly sim­u­lated envi­ron­ments. Unit tests are a cor­ner­stone of Agile method­ol­ogy — gen­er­ally speak­ing, you make these before your code.
  • (B) Smoke/Simulation: These are the “next layer up”. They use partial systems (e.g. your code plus the module of the person next to you) to run more integration-style testing. Smokes are normally run on every compile of the product, along with unit tests. They do not require a fully deployed, functioning system, only a small group of parts.
  • (C) Acceptance/Functional/Regression:
    • Acceptance Tests: These normally make up a large share of the tests
      in an organization. Acceptance tests prove that the specific
      component/feature is sane in the context of the fully deployed product.
      You might require these to be fully developed, executed and passing
      before a specific component or feature is merged to trunk. Acceptance
      tests prove that the feature/component works as intended (not merely
      as programmed). They should be short in execution time.

    • Functional Tests: Functional tests are “larger” and should test as
      much of the functionality of the feature/component as possible. They
      should also test with an eye towards other parts of the product and
      system (e.g. integration). Functional tests should be as expansive and
      detailed as possible. These can also be called Regression tests.
  • (D) Stress/Scalability Tests: This should be self-evident. Stress tests
    build on functional areas to push the product to its limits: how
    many files can it hold, how many connections can it withstand, etc.

  • (E) Performance Tests: Characterization of key performance stats:
    objects/second, records parsed/second, and so on.

Now, I want to point out that these definitions are part agile and part continuous integration. They may not wholly mesh with the terminology used in your workplace, or with agile. I also know definitions are a holy war, but they are secondary to what I want to talk about. I also deliberately left out exploratory testing.

What the hell *am* I talk­ing about?

If you look at Figure 1, you’ll note I put “Test” (test engineering) off to the side to represent its particular ownership in this model. Unit tests (and, by most measures, smoke and simulation tests) are owned by the core developers.

The other test areas are owned by test engineering. That obviously doesn’t exclude Dev from helping (after all, they win as a team and fail as a team), but Test is focused on verifying that the product is as tested as possible before it gets into stage F: the hands of the user.

Ok, this is all fine and good — but hear me out.

This diagram is about cost. For each layer the code/feature passes through on its way from the developer, both the cost to the team and the difficulty of identifying and resolving a bug climb.

This is why developers write a lot of unit tests and check them in, so they run with every check-in. Right? You’re doing that, right?! The cost for a developer to find a bug with a unit test, and to fix a bug introduced through new code, refactoring and so on, is essentially 1.

Here’s a new dia­gram with some straw-man costs:

[Figure 2: cost_flow.png, straw-man costs per test layer]

Essentially, it is in your best interest, as a developer and as a team, to encourage lots and lots of tests lower in the stack shown here. It starts with comprehensive, checked-in unit tests. It continues with a strong, repeatable testing discipline (for which I recommend test automation).

Why? Because — as you move higher in the stack, that damned bug some­one checked in is hid­den behind layer upon layer of code. The fur­ther from the unit level a bug gets, the more com­po­nents and envi­ron­ment vari­ables get involved. The more of these that get involved, the harder it is to iden­tify and fix, and the higher the cost.

Now your bug (our bug) has not only wasted your time. It’s holding up a release, and it has wasted the test engineers’ time too (albeit, this is our job). The higher in the stack a bug gets, the more developer, release and test hours it wastes.

For exam­ple — your typo in some mes­sag­ing code man­ages to sneak its way through to the (E) Per­for­mance level. Let’s say your per­for­mance tests take, oh, a week to run to com­ple­tion. For some rea­son, this sneaky beast only pops up when your system’s clocks resync after 6 days of runtime.

So, 6 days into a 7-day test (ding, fries are done), the entire system poops itself. You now have to triage the crash. Then you have to fix it once you identify it, which is probably going to be hard: it’s a performance test, so you shut off non-essential logging. And then you need to rerun the test.

You lost 6 days. More than likely, those are 6 days you didn’t allocate for when you promised the fruits of this iteration/release to those wealthy Swedish bankers, eh?
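The arithmetic in that example generalizes simply. Here is a minimal sketch. `fix_days` (triage plus fix time) is a hypothetical parameter that the example above doesn’t pin down. Even with an instant fix, a failure on day 6 of a 7-day run puts you 6 days behind, and every day of triage adds to that.

```python
def schedule_slip(test_days, failure_day, fix_days):
    """Days added to the plan when a long-running test dies partway through.

    Time burned before the failure, plus triage/fix time, plus a full rerun,
    minus the single run the plan already accounted for.
    """
    planned = test_days
    actual = failure_day + fix_days + test_days
    return actual - planned


print(schedule_slip(test_days=7, failure_day=6, fix_days=0))  # 6
print(schedule_slip(test_days=7, failure_day=6, fix_days=2))  # 8
```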

God help you if your bug gets to level (F). This is called the “aversion level”. After a few of these sneak out and the CEO of the company starts getting phone calls at 4am from those Swedish bankers, you’re either going to get a stern talking-to or some time in “the box” (all CEOs have a punishment box).

Your goal is to avert bugs from reach­ing Level F. F stands for F’ed in the lit­eral sense.

My point isn’t just about cost. Given this tiered approach, and the need to find as many bugs as possible, you’re going to end up with some amount of code duplication between the higher levels of testing and the unit/smoke level. After all, most of the tests above that level are external, system-level tests.

Some code (or logic) duplication on a higher level isn’t always bad, given the context in which the code is running. Not to mention, the code within the product frequently isn’t in the same language as the code that automates the tests. Duplication of unit test logic at the system-test level is always going to happen.

Yes, you can and should reuse code as much as pos­si­ble, but you can also do this through grey-box test­ing approaches (e.g. expos­ing APIs into sys­tem inter­nals you would not nor­mally have access to).

Also, this means you have to give your teams time to test. You need to give them ample time to automate what is reasonable. You also need to be willing to not ship a component or feature that simply isn’t ready, much less one that hasn’t been tested.

The last thing you want is for a bug, no matter what it is, to hit level F. Our job on a software engineering team is to put out the absolute best product possible, and you can’t do that without filling in all of the magical testing boxes. You need to understand that the further you get from the code, the higher the cost.

Some bugs will always get into the hands of users. But the risk can be mitigated, and many of the bugs that do end up in users’ hands were preventable. The more (and sooner) you test, the less wealth you expend, and the happier you will be. And the more profits you will reap. We like money.

Welcome to TestButler, a rudimentary test case management app.

September 12th, 2008

… Or, learn to laugh at my total inabil­ity to do web design, and lack of django-fu

So, following up (albeit slowly) on my “Decent test case tracking/registration” post, I’ve actually managed to cobble together a Google Code project and a rudimentary Django application.

Right now, it’s in sub-prototype stages. I’ve done a semi-production deployment internally to get feedback, usage information and suggestions. All the code is checked in, and now I need to begin cleaning up my rather random “pooping of code”.

Not only am I learn­ing Django while I am doing this — I’m catch­ing up on 6+ years of changes in the web devel­op­ment com­mu­nity. The last time I was involved in any sort of web-work was when I worked for Allaire/Macromedia — and even then that was pri­mar­ily on the back end to Cold­Fu­sion, not end-user interfaces.

Writing user interfaces above a command-line utility is not exactly my strong suit. But hell, Django made it wicked easy to start hacking things together. I had the rough backend done in less than 2 hours. That let me spend the next few days pondering schemas and mucking with many-to-many fields and other Django plugins.

If you go and look at the Google Code site, you’ll see I’ve started fleshing out the bits needed to outline the path of the project, and the general reasoning behind it.

Not only do I want feedback, I want anyone who wants to join to be able to join. Contribute ideas, tell me I’m doing it wrong. I already know my Django code is messy (I’m working on it). Most of all, though, I want to help build something useful for the testing community, so if something doesn’t mesh, I want to know.

Now I just need to read my copy of James Bennett’s “Practical Django Projects” book. And make a vector image of a cartoony Roomba, or find a better image of a robot butler.

A Peer to Peer test distribution system (TestBot)?

September 8th, 2008

Peer-to-peer systems aren’t something new. Things like BitTorrent, AllMyData Tahoe and others have been using the approach for file distribution and storage for some time.

Still others use distributed-worker methodologies to parcel out work. Clients register with the system, and the system hands out chunks of work without factoring in client speed and the like (e.g. distributed.net).

What if you combined the two, pairing something like BitTorrent (which does peer selection and allocation intelligently) with a large distributed architecture to manage large-scale test execution?

Let’s think about a common problem in test engineering. Start with a simple version: you’re designing a load test app, and this app needs to generate large amounts of load against a target system.

In a normal test environment in a lab, this is “easy”. You simply make sure you have a lab with a bunch of clients, all on the same LAN, and you run a test client on each of them that generates load against the system under test.

Now, let’s complicate the problem: you don’t have enough “same same” test clients. You may have some that are “close enough”, but dang, they’re not on the same subnet, or you don’t know about them. Not having enough clients in a lab is more common than you’d think.

So how do you make a test that can take advan­tage of those test clients, fac­tor in their “dif­fer­ences” and still make a rel­e­vant test?

Next prob­lem. You have an appli­ca­tion you want to run a bat­tery of tests against. You don’t have a ded­i­cated client, but you have the pos­si­bil­ity of “bor­row­ing time” from some idle machines to run those tests.

The “idle machines” all have different amounts of RAM and CPU, and sit at varying distances from the system under test on the network. You need to (1) find them, (2) figure out which of the available test clients is the most desirable, and (3) figure out the main differences between the clients so you can factor them into the results.

You want the more capable clients to get more of the “important” tests, and the less capable ones to run the lesser tests. Just to add to it, you may want them to be capable of being slaved to a given test to help it along (i.e. a performance or generalized load-generation test).

Getting back to the original thought about peer-to-peer systems, I started considering applying the peer-to-peer paradigm and weighted selection to test distribution.

You have a series of clients who volunteer to participate in the swarm. The client responsible for submitting the job (a test) to the swarm would use a weighted voting algorithm to rank, sort and choose the “most desirable” clients to distribute a test to.

Each client would respond to a submitted request with various attributes (weights) based on OS type, number of network hops to the submitting client and to the system under test, amount of RAM, network speed and so on.

In the case of performance-based tests, you would be able to factor these attributes into the results of the test (e.g. latency). In other tests, you only need to gather the results.
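To make the weighted selection a little more concrete, here is one possible shape for it. Each peer reports its attributes, and the submitter collapses them into a single score. It then filters out peers that can’t run the test and picks the top N. This is a plain weighted sum, which is only one way to do weighted voting. The attribute names, weights and peers are all assumptions for illustration, not a TestBot design.

```python
from dataclasses import dataclass

NEG_INF = float("-inf")


@dataclass
class Peer:
    name: str
    os_type: str
    hops: int        # network hops to the system under test
    ram_gb: float
    net_mbps: float


# Hypothetical weights: more RAM and bandwidth help, every extra hop hurts.
WEIGHTS = {"ram_gb": 1.0, "net_mbps": 0.01, "hops": -2.0}


def score(peer, required_os=None):
    """Collapse a peer's reported attributes into a single weighted score."""
    if required_os is not None and peer.os_type != required_os:
        return NEG_INF  # this peer cannot run the test at all
    return (WEIGHTS["ram_gb"] * peer.ram_gb
            + WEIGHTS["net_mbps"] * peer.net_mbps
            + WEIGHTS["hops"] * peer.hops)


def select_peers(peers, count, required_os=None):
    """Return up to `count` eligible peers, most desirable first."""
    scored = [(score(p, required_os), p) for p in peers]
    eligible = [(s, p) for s, p in scored if s != NEG_INF]
    eligible.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in eligible[:count]]


peers = [
    Peer("lab-01", "linux", hops=1, ram_gb=16, net_mbps=1000),
    Peer("desk-17", "linux", hops=4, ram_gb=8, net_mbps=100),
    Peer("coffee-maker", "linux", hops=2, ram_gb=0.5, net_mbps=10),
    Peer("win-03", "windows", hops=1, ram_gb=32, net_mbps=1000),
]
print([p.name for p in select_peers(peers, 2, required_os="linux")])
```

For performance tests, the same per-peer attributes could be stored next to each result, so you can factor them in later.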

Of course, the concept of “use idle machines to do something” isn’t exactly new. As I mentioned before, things like distributed.net, SETI@home and others do this all the time.

Then you have things like Buildbot, which uses a dedicated (or partially dedicated) pool of machines to compile a target and execute the local unit tests against the compiled result.

Why not make the two go hand in hand and make an intelligent weighted selection for test distribution? Let’s go back to the localized example. You have a continuous build system which compiles and runs unit tests. It then looks at a pool of test peers who have volunteered to be part of the test swarm and fires off the functional/regression tests (as needed, it can deploy locally or remotely to a test server).

The buildbot reports the steps as compile: pass, units: pass, and then regression: pending. It passes the various tests out to the swarm, which executes them asynchronously until all tests are completed. Tests that error out are passed to another client in the swarm.

The nice thing is that this works both on a local LAN and with a globally distributed set of test swarm participants. All you do is weight in favor of the closer clients (oh, and your application has to be available on the network).

Over time, peers participating in the swarm can be “pushed out” because they have errored out too many times, have been caught “lying” and so on. The swarm can adapt: clients can come and go, as long as a given handed-out suite eventually completes. If a client fails or drops, the test is simply handed out again.
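The “push out” and “hand out again” behavior can be sketched as a small dispatcher. Tests sit in a queue, and a failed or dropped attempt puts the test back. A peer that errors too many times is evicted. Everything here is hypothetical. `run_on_peer` stands in for whatever actually ships a test to a peer, and `max_errors` is an arbitrary threshold. For brevity, this runs sequentially and picks peers round-robin instead of by weight.

```python
import collections
import itertools


def dispatch(tests, peers, run_on_peer, max_errors=3):
    """Run every test on some peer; requeue failures and evict flaky peers.

    run_on_peer(peer, test) returns a result, or raises to signal that the
    peer errored or dropped out. Tests must be hashable (they key the results).
    """
    queue = collections.deque(tests)
    errors = collections.Counter()
    active = list(peers)
    results = {}
    turn = itertools.count()
    while queue:
        if not active:
            raise RuntimeError("no peers left in the swarm")
        test = queue.popleft()
        peer = active[next(turn) % len(active)]
        try:
            results[test] = run_on_peer(peer, test)
        except Exception:  # any failure counts against the peer
            errors[peer] += 1
            if errors[peer] >= max_errors:
                active.remove(peer)  # pushed out of the swarm
            queue.append(test)       # handed out again
    return results


def fake_run(peer, test):
    """Hypothetical runner: one peer always drops, the other always passes."""
    if peer == "flaky":
        raise ConnectionError("peer dropped out")
    return f"{test} passed on {peer}"


results = dispatch(["t1", "t2", "t3"], ["flaky", "solid"], fake_run, max_errors=2)
print(results)
```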

On a localized (meaning internal-to-your-company) level, this means you can make any client on your network a peer in the system. The weight-based selection still applies, and you can use any type of system on your LAN: desktops, servers, highly intelligent coffee makers, anything with a network drop.

Additionally, you could point test slaves at a cluster of installed systems under test: individual nodes in a web farm, your application installed on various web hosts, or a larger system installed in various data centers. This removes the bottleneck of testing only a single system at a time (but requires a lot of intelligence at the managerial level).

It’s an idea. Some­thing of a dis­con­nected series of thoughts — maybe it’s silly. I like the idea of being able to intel­li­gently lever­age a series of test peers dis­trib­uted any­where and every­where. Hav­ing a peer-to-peer test­ing sys­tem would be neat-o.

It’s a zom­bie army used for test­ing –Anon :)

edit: Yes, a loosely cou­pled, highly dis­trib­uted load test could be con­strued as a DDoS… But that’s seman­tics, right?


YAML question, and a nose-testconfig thought

August 29th, 2008

So, I find myself using more and more YAML lately via the PyYAML package. When I was writing nose-testconfig, my “preferred” format was/is YAML.

Now, an inter­est­ing thing I’ve noticed about all of the test con­fig­u­ra­tions I am developing/working with is that they have a lot of “shared” attrib­utes (that change infre­quently) and a good num­ber of things which change all the time.

This is the per­fect spot for some­thing like a dic­tio­nary merge. If you have a test con­fig like this:


application:
    capability: 1
    url: http://foo
subsystem:
    max_users: 20

For each of your configuration files, you might only override something like max_users. For cases like this, it makes sense to load the template document (the file above), then load the second document and merge it over the first so its values override the ones from the first load, or something akin to that.
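Python dicts don’t have a merge() method that handles nested sections. dict.update() is shallow: a child’s subsystem section would replace the parent’s whole subsystem block. So something like a small recursive helper is needed. Here is a minimal sketch. It assumes both documents have already been loaded into plain dicts (for example with PyYAML’s yaml.safe_load). `deep_merge` is a hypothetical helper, not part of nose-testconfig.

```python
import copy


def deep_merge(parent, child):
    """Return a new dict: parent's values overridden by child's, recursing into nested dicts.

    Neither input is modified.
    """
    merged = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


parent = {"application": {"capability": 1, "url": "http://foo"},
          "subsystem": {"max_users": 20}}
child = {"subsystem": {"max_users": 50}}

merged = deep_merge(parent, child)
print(merged)
```

Because the merge works on plain dicts, it doesn’t care whether they came from YAML, JSON or a Python config file. That keeps the loader decoupled from the file format.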

This is where my mental dilemma comes in. In theory, I could add a custom !!tag to the YAML which would take a /path/to/file.yaml and load it first, then load the second document. Or I could do it within nose-testconfig, where you might run:


nosetests . --tc-file=myconfig.yaml --tc-rootconfig=parent.yaml

And then I would jump through the hoops (probably with a merge) within the plugin. The problem is that I’m worried about coupling the plugin too closely to YAML.

Now, the plugin already supports overriding multiple values. However, this doesn’t scale if you have to override a lot of them.

The most common reason I’ve found for this so far is adding new parameters and values to the YAML files. Not all child configurations need to override/define the new values; instead, they could just inherit them from the parent.

So, the question is: how do (would) you do this so that it:

  • Doesn’t sacrifice clarity/readability
  • Scales
  • Doesn’t require the root document to be in the same location or have a hard-coded path in the child document
  • Doesn’t couple the loader (nose-testconfig) tightly with the file format

Right now, it’s copy, paste and edit every configuration file I know about.

Pythoscope: Unit test generation for Python.

August 22nd, 2008

Recently, there was a thread on the testing-in-python mailing list about a proposal for a new tool called “Pythoscope” (discussion here).

Pythoscope’s mission, from the website, is: “To create an easily customizable and extensible open source tool that will automatically, or semi-automatically, generate unit tests for legacy systems written in Python.” To which my general response is “woop”.

The initial version was released earlier this week. It has a Launchpad site and a detailed website.

This is pretty awesome. Just on a lark, I decided to run it against Python trunk (what will become 2.6). Unfortunately, generating tests for both the multiprocessing module and the threading module did not work. This is quite probably because I was running it under the default Python 2.5 rather than the py2.6 binary on my machine; there’s some confusion about the “with” keyword ;)

I’ll unscrew my envi­ron­ment and get back to you on that one.

Oth­er­wise, I ran it on some per­sonal code, and it came up with a pretty decent series of test stubs. Then I decided to run it on svnmerge.py:

import unittest

class TestGetRepoRoot(unittest.TestCase):
    def test_get_repo_root(self):
        assert False # TODO: implement your test here
 
class TestTargetToPathid(unittest.TestCase):
    def test_target_to_pathid(self):
        assert False # TODO: implement your test here
 
class TestSvnLogParser(unittest.TestCase):
    def test_object_initialization(self):
        assert False # TODO: implement your test here
 
    # Generated twice: this second definition silently replaces the first.
    def test_object_initialization(self):
        assert False # TODO: implement your test here
 
    def test_revision(self):
        assert False # TODO: implement your test here
 
    def test_author(self):
        assert False # TODO: implement your test here
 
    def test_paths(self):
        assert False # TODO: implement your test here

Pretty neat: it generated all the stubs you could possibly think of. I am going to keep monkeying with it, and possibly contribute, as it will save me a ton of time in the long run.

Looking for Test-Driven Python people (again)

August 1st, 2008

Following up on my “Finding Python people is hard” post, I figured I’d send the call out again.

We’re looking for local-to-Massachusetts people (we’re in Acton, MA) who are interested in joining a dynamic, quality-focused test/automation team. Ideally, candidates are fluent in both testing (areas may include performance, regression, web, streaming video and storage) and Python programming.

If you’re a strong testing person with some programming experience, even if you’re not fluent in Python, we’d still be interested: we have no problem teaching you Python. If you’re a strong Python person without a testing background, you’re also welcome. Internally, we use Java/C++ and Python, and experience with all or some of those languages is great.

Even if you’re just starting out (perhaps you’ve just graduated college), we’re looking for people who want to be great engineers. We look for strong engineering skills and contributions to open-source work, and we want people of many skill levels to join the team.

The role is for some­one to join the Engi­neer­ing team with a focus on Auto­mated test engi­neer­ing. We don’t slot peo­ple into “just test­ing” or “just dev” — we hire great engi­neers, and peo­ple who want to be great engi­neers. The core devel­op­ers help drive test­ing, and the testing-focused peo­ple help drive core devel­op­ment. The entire com­pany is focused on pro­vid­ing the high­est qual­ity prod­uct to our customers.

As part of this role, you will be developing everything from simple unit tests to highly complex functional-level tests. Some of the more challenging aspects come from the product itself being very performance-driven. The tests we develop (in Python) have to be able to drive a product capable of pushing tens of gigabits of video data across the wire to its very limits. The product uses distributed technologies and is a loosely coupled system, and we have to test and prove that as well.

Inter­nally, we’re using such tools as the pro­cess­ing library for con­cur­rency, Nose, YAML, etc. We encour­age open-source con­tri­bu­tions and com­mu­nity involve­ment (see the nose plu­gin I recently open sourced, and the PEP 371 work I’ve been able to do) and explo­ration of new tech­nol­ogy that might help us devise more effi­cient test­ing strate­gies. If you like push­ing bound­aries — this is the place for you.

A great exam­ple of one of the chal­lenges is a test I’ve worked on for some time — I’ve had to design a highly con­cur­rent test that can lever­age a sin­gle test client’s resources to the max to drive load against the sys­tem, while also gen­er­at­ing sta­tis­ti­cal anom­aly events to trig­ger inter­nal behav­ior to the sys­tem. Of course — just design­ing it for one test client won’t scale: This test has to be locally con­cur­rent as well as have the abil­ity to spread out to mul­ti­ple test­ing clients. Oh — and it has to gen­er­ate hun­dreds of giga­bytes of data as fast as it can to push the system.

Some of the technologies I’ve been personally exploring are the actor-model approach to concurrent programming, Twisted for asynchronous/concurrent testing, etc. No technology or approach is excluded; we approach all of the test development with the Zen of Python in mind:

There should be one-- and preferably only one --obvious way to do it.

And just to add to that: There is only one way to do it: The way that works. If we find that an old approach doesn’t scale or do what we need it to do, and we have a new approach that can do it bet­ter, faster, cheaper, etc — we’re not afraid to adopt it.

This is a startup, and we’re really ramping up on test automation, so nothing is set in stone. We use Ubuntu, OS X and even Windows for the development environments. Pick your editor, pick your machine: the team is focused on making us, and you, successful. Every engineer in this organization is empowered to do what it takes to get the job done.

If what I am saying sounds interesting, send me an email or post a comment. I’m very interested in talking to you.

Where Am I?

You are currently browsing the Testing category at jessenoller.com.