views:

42

answers:

2

Hi...

I'm developing a java program on my machine. When I want to test, I first try small test cases on my machine but then I want to run this program with real data. A small test would be to look at one file from the linux kernel and a "real" test would be to look at an entire kernel...

But I would like to run multiple "real" tests at the same time (on different versions of the kernel), so I've 5 test machines which are identical (running linux fedora). How can I synchronize these 5 machines both in terms of data and programs (i'm using yum sometimes to install programs) ?

How can I be sure that i've exactly the same environment at any time ?

Right now I'm mainly using scp and my code is on svn...

+2  A: 

I would suggest using a virtualization technology like VMware to clone one instance into multiple. This has the benefit of always being able to get back to the same starting point after your testing, as well as being able to run more scenarios than you have physical boxes.

Chris Thornhill
A: 

You could make Your benchmarking tool fetch the new version of the program and the testdata over http or some other popular protocol. If You don't want to use http, You could consider using a networked filesystem (NFS and GlusterFS are some examples).

The way I see it, You would start Your script on a single 'master' server and then it would spawn 5 bash processes. Each one of them would log into a remote 'slave' server and execute a benchmarking tool. The benchmarking tool would fetch the new program and data from the some (perhaps 'master') server and then execute it, measuring time/memory/etc and returning those values on the standard output, so the ssh would pass it back to the master server and it's bash process. You would then redirect the bash sessions output to files.

testmaster.sh -> 5*(testnode.sh -> ssh
-> fetch_and_benchmark -> output -> (ssh) -> testnode.sh -> file)

testmaster.sh would have to check if all the files already exist f.e. every second and then read and compare the results. It may all sound bad, but trust me, it'd be better if You wouldn't do all of it manually and the script is relatively easy to write.

About ensuring You have the same environment on all boxes... Don't let anyone near Your test nodes and don't do anything on them.

I'd not advise virtualization, as it would alter Your tests results in a way You can't predict. Virtualized machine can't be as fast as a pure one and it's not simply "it's 20% slower". Some things are much slower, some not that much. If You don't mind it, use Virtualization and snapshots, but You said You test things on multiple nodes with each having a diffrent kernel, so I imagine You take it very seriously.

Ah, one more thing. Linux has a funny way of freeing memory (it's like "don't free it until" someone will need it). Some things get cached. To be sure, You'd have to reboot the test machines after every test session.

Reef