A Peer to Peer test distribution system (TestBot)?

September 8th, 2008 § 7 comments

Peer-to-peer systems aren’t new. BitTorrent, AllMyData Tahoe, and others have used the model for file storage and distribution for some time.

Still others use distributed-worker methodologies to parcel out work: clients register with the system, and the system hands out chunks of work without factoring in client speed and the like (e.g. distributed.net).

What if you combined the two? Take something like BitTorrent, which does peer selection and allocation intelligently, and pair it with a large distributed architecture to manage large-scale test execution.

Let’s think about a common problem in test engineering. Start with a simple version: you’re designing a load test app, and this app needs to generate large amounts of load against a target system.

In a normal lab test environment this is “easy”: you make sure you have a lab with a bunch of clients, all on the same LAN, and you run a test client on each of them to generate load against the system under test.

Now, let’s complicate the problem: you don’t have enough “same same” test clients. You may have some that are “close enough”, but dang, they’re not on the same subnet, or you don’t know about them. Not having enough clients in a lab is more common than you’d think.

So how do you build a test that takes advantage of those test clients, factors in their “differences”, and still produces relevant results?

Next problem. You have an application you want to run a battery of tests against. You don’t have a dedicated client, but you have the possibility of “borrowing time” from some idle machines to run those tests.

The “idle machines” all have different amounts of RAM, different CPUs, and sit at varying network distances from the system under test. You need to (1) find them, (2) figure out which of the available test clients are the most desirable, and (3) identify the main differences between the clients so you can factor them into the results.

You simply want the more capable clients to get more of the “important” tests, and the less capable ones to run the lesser tests. On top of that, you may want clients to be capable of being slaved to a given test to help it along (e.g. a performance or generalized load generation test).

Getting back to the original thought about peer-to-peer systems, I started considering the possibility of applying the peer-to-peer paradigm, with weighted selection, to test distribution.

You have a series of clients who volunteer to participate in the swarm. The client responsible for submitting the job (a test) to the swarm would use a weighted voting algorithm to rank, sort and choose the “most desirable” clients to distribute a test to.

Each client would respond to a submitted request with various attributes (weights) based on OS type, number of hops from both the client submitting the job and the system under test, amount of RAM, network speed and so on.
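To make the ranking step a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of mine: the `PeerReport` fields, the `WEIGHTS` values and the normalization caps are all hypothetical, and the scoring is a plain weighted sum used to rank peers, not the quorum-based scheme from the weighted voting paper listed in the references.

```python
from dataclasses import dataclass


@dataclass
class PeerReport:
    """Attributes a peer sends back when asked to take a test (hypothetical shape)."""
    name: str
    os_type: str
    network_hops_from_target: int
    ram_mb: int
    net_mbps: float


# Hypothetical weights: how much each attribute matters to the submitter.
WEIGHTS = {"os_match": 3.0, "closeness": 2.0, "ram": 1.0, "net": 1.5}


def score(peer, wanted_os, max_hops=30, max_ram_mb=16384, max_net_mbps=1000.0):
    """Return a desirability score; every factor is normalized to 0..1 first."""
    os_match = 1.0 if peer.os_type == wanted_os else 0.0
    closeness = 1.0 - min(peer.network_hops_from_target, max_hops) / max_hops
    ram = min(peer.ram_mb, max_ram_mb) / max_ram_mb
    net = min(peer.net_mbps, max_net_mbps) / max_net_mbps
    return (WEIGHTS["os_match"] * os_match
            + WEIGHTS["closeness"] * closeness
            + WEIGHTS["ram"] * ram
            + WEIGHTS["net"] * net)


def choose_peers(reports, wanted_os, count):
    """Rank all reporting peers and keep the `count` most desirable ones."""
    ranked = sorted(reports, key=lambda p: score(p, wanted_os), reverse=True)
    return ranked[:count]


peers = [
    PeerReport("desk-1", "linux", 2, 4096, 100.0),
    PeerReport("lab-7", "linux", 1, 8192, 1000.0),
    PeerReport("remote-3", "windows", 12, 16384, 10.0),
]
best = choose_peers(peers, "linux", 2)
print([p.name for p in best])
```

Doing the math on the submitting side means a given test could swap in its own weights, for example caring about hop count far more than RAM for a latency test.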

For performance-based tests, you would be able to factor these attributes into the results of the test (e.g. latency); for other tests, you only need to gather the results.

Of course, the concept of “use idle machines to do something” isn’t exactly new: distributed.net, SETI@home and others do this all the time, as I mentioned before.

Then you have things like buildbot, which uses a dedicated (or partially dedicated) pool of machines to compile a target and execute the local unit tests against the compiled result.

Why not make the two go hand in hand and make an intelligent weighted selection for test distribution? Let’s go back to the localized example. You have a continuous build system which compiles and runs the unit tests. It then looks at a pool of test peers who have volunteered to be part of the test swarm and fires off the functional/regression tests (as needed, it can deploy locally or remotely to a test server).

The buildbot reports the steps as compile: pass, units: pass, and then regression: pending. It then passes out the various tests to the swarm, where they can be executed asynchronously until all tests are completed (or have errored, at which point they’re passed to another client in the swarm).
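The “hand it to another client when it errors” part can be sketched roughly like this. This is an assumption-heavy toy: `run_suite`, `fake_runner` and the peer names are hypothetical, the loop runs sequentially rather than asynchronously to keep it short, and a real version would pick peers by weight instead of simple rotation.

```python
from collections import deque


def run_suite(tests, peers, run_on_peer):
    """Pass out tests; a test that errors is re-queued for a peer it has not tried."""
    if not peers:
        raise ValueError("the swarm has no peers")
    pending = deque()
    for i, test in enumerate(tests):
        start = i % len(peers)
        # Rotate the starting peer so the first attempts are spread over the swarm.
        pending.append((test, peers[start:] + peers[:start]))
    results = {}
    while pending:
        test, untried = pending.popleft()
        peer, remaining = untried[0], untried[1:]
        try:
            outcome = run_on_peer(peer, test)
        except Exception as exc:
            if remaining:
                # Errored: pass the test to another client in the swarm.
                pending.append((test, remaining))
            else:
                results[test] = ("error", peer, str(exc))
        else:
            results[test] = ("done", peer, outcome)
    return results


def fake_runner(peer, test):
    """Stand-in for real remote execution; the 'flaky' peer always drops."""
    if peer == "flaky":
        raise RuntimeError("peer dropped")
    return "pass"


results = run_suite(["t1", "t2"], ["flaky", "solid"], fake_runner)
print(results)
```

Here t1 starts on the flaky peer, errors, and is finished by the solid one, so the suite still completes.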

The nice thing is that this works both on a local LAN and across a globally distributed series of test swarm participants. All you do is weight in favor of the closer clients. (Oh, and your application has to be available on the network.)

Over time, peers participating in the swarm can be “pushed out”, meaning they have errored out too many times, have been caught “lying” and so on. The swarm can adapt: clients can come and go as long as a given passed-out suite eventually completes. If a client fails or drops, the test is simply passed out again.
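One way to picture the “pushed out” rule is a simple strike counter. Again, this is only a sketch under my own assumptions: the `Swarm` class, its method names and the strike limit are hypothetical, and how a peer gets caught “lying” is left out entirely.

```python
class Swarm:
    """Tracks volunteer peers and evicts the ones that misbehave too often."""

    def __init__(self, max_strikes=3):
        self.max_strikes = max_strikes
        self.strikes = {}      # active peer -> number of recorded failures
        self.banned = set()    # peers that have been pushed out

    def join(self, peer):
        """Add a volunteer; peers that were pushed out stay out."""
        if peer in self.banned:
            return False
        self.strikes.setdefault(peer, 0)
        return True

    def report_failure(self, peer):
        """Record an error (or a caught lie); return True if the peer was evicted."""
        if peer not in self.strikes:
            return False
        self.strikes[peer] += 1
        if self.strikes[peer] >= self.max_strikes:
            del self.strikes[peer]
            self.banned.add(peer)
            return True
        return False

    def active(self):
        """Peers still eligible to receive tests."""
        return sorted(self.strikes)


swarm = Swarm(max_strikes=2)
swarm.join("a")
swarm.join("b")
swarm.report_failure("a")
evicted = swarm.report_failure("a")
print(evicted, swarm.active())
```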

On a localized (meaning internal-to-your-company) level, this means you can make any client on your network a peer in the system. The weight-based selection still applies, and you can use any type of system on your LAN: desktops, servers, highly intelligent coffee makers, anything with a network drop.

Additionally, you could point test slaves at a cluster of installed systems under test: individual nodes in a web farm, your application installed on various web hosts, or a larger system installed in various data centers. This removes the bottleneck of only a single system being tested at once (but requires a lot of intelligence at the managerial level).

It’s an idea. Something of a disconnected series of thoughts; maybe it’s silly. I like the idea of being able to intelligently leverage a series of test peers distributed anywhere and everywhere. Having a peer-to-peer testing system would be neat-o.

It’s a zombie army used for testing –Anon :)

edit: Yes, a loosely coupled, highly distributed load test could be construed as a DDoS… But that’s semantics, right?

References/Interesting Reading:

  • Distributed Systems for System Architects (weighted voting)
  • Weighted Voting for Replicated Data
  • Skoll: Distributed Continuous Quality Assurance
  • BitTorrent (protocol)

      • terry pep­pers

        Jesse —

        You mention Skoll. I’m sure you’ve already seen Adam Porter and Atif Memon’s talk from GTAC last year. Very interesting concepts.

        http://www.youtube.com/watch?v=OiE9zRPD6ps

      • jnoller

        Yup. I wanted to go to GTAC last year (and this year too) but didn’t have the chance. Skoll is interesting for a variety of reasons, and there are some parallels to what I am talking about in the distributed-slave sense. I’m gleaning what seem to be the meatier parts of Skoll from the papers and publications.

      • http://orestis.gr Orestis Markou

        We have something like that at Resolver Systems; we call it “distributed build”.

        It builds your working copy, copies it to the machines you specify on the LAN and publishes the list of tests it wants run.

        A central server assigns tests to machines and gathers the results back. It’s not clever, but it does its job. I’ve always wanted to make it smarter, so that new machines could be added transparently when they are idle, but it was always too much work…

      • jnoller

        Yeah, I’m taking the distributed build thing a bit farther. I want a (globally) disparate series of test clients available to test any application, where those clients might test the app “locally” (in the case of desktop apps, for instance), or they might execute a test passed to them which uses the client as CPU to bind to a larger test “in the swarm”.

        I’ve written three different “manager passes tests to clients and gets results back” systems. Those are easy, and relatively dumb.

        I would much rather have the slaves register with the server and provide a weight number indicating the “desirability” of the client. A simple way of doing this would be to pass an object across the wire containing the .network_hops_from_target, .ram, .cpu and so on attributes, and do the calculations manager-side.

        Doing it the simple way has the additional benefit of allowing a test in the queue to dictate which attribute it’s more interested in. Of course, if you calculate the weight numbers the clients pass back correctly, you don’t need the tests to dictate what they “want”.

        Argh, it’s still a jumble of ideas.

