I'm not getting a lot of answers on this ;-) so I'll try to explain my own approach.
Eventually, I went for iozone as a benchmarking tool, mainly because of the overwhelming amount of information it provides.
IMHO, if you want statistically meaningful data, one run of a benchmark is not enough, so I wrote a small shell script that runs iozone 10 times and writes the output of each run to its own log file.
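My actual script was plain shell. For anyone who would rather keep everything in Python, here is a rough equivalent sketch. It assumes iozone is on the PATH, and the log-file names (`iozone_run_01.log` and so on) and the `run_benchmark` helper are made up for illustration.

```python
import subprocess
from pathlib import Path


def run_benchmark(cmd, runs=10, outdir="."):
    """Run `cmd` `runs` times and save each run's output to a separate log file.

    Hypothetical helper; the file naming scheme is only an example.
    Returns the list of log-file paths in run order.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    logs = []
    for i in range(1, runs + 1):
        log = out / f"iozone_run_{i:02d}.log"
        with log.open("w") as fh:
            # stderr goes into the same log; check=True stops on a failed run
            subprocess.run(cmd, stdout=fh, stderr=subprocess.STDOUT, check=True)
        logs.append(log)
    return logs
```

Usage would be something like `run_benchmark(["iozone", "-a"], runs=10, outdir="baseline")`, where `-a` is iozone's automatic mode, which produces the full result matrices.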
Then I wrote another script (this one in Python) that collects the values for each cell of the result matrices across all 10 runs. For each cell, the highest and lowest values are discarded to keep anomalies from distorting reality ;-) and the remaining eight values are averaged. I copied the resulting matrix into OpenOffice.org Calc.
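The per-cell "drop the min and max, average the rest" step is basically a trimmed mean. A minimal sketch, assuming the iozone logs have already been parsed into equally shaped, fully populated 2-D lists of numbers (one per run). The parsing itself is left out, and the function name is hypothetical.

```python
from statistics import mean


def trimmed_cell_means(matrices):
    """Average each cell across runs after dropping its single highest and lowest value.

    `matrices` is a list of 2-D lists (one per run), all with the same shape.
    """
    if len(matrices) < 3:
        raise ValueError("need at least 3 runs to discard the min and max")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    if any(len(m) != rows or any(len(r) != cols for r in m) for m in matrices):
        raise ValueError("all matrices must have the same shape")
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            values = sorted(m[r][c] for m in matrices)
            row.append(mean(values[1:-1]))  # drop lowest and highest, average the rest
        result.append(row)
    return result
```

With 10 runs of a single cell holding the values 1 to 10, the 1 and the 10 are dropped and the result is the mean of 2 to 9, i.e. 5.5.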
I followed this procedure for a 'baseline' (in my case, an ext3 filesystem mounted with default options) and then repeated it for each of my tests.
For each test, I copied the resulting matrix into the spreadsheet containing the baseline. In Calc, I then compared each test against the baseline and plotted the results in charts.
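I did the comparison in Calc, but the same idea can be sketched in Python. This assumes the comparison is a per-cell percentage change against the baseline matrix (my choice for illustration). The function name is hypothetical. Since iozone reports throughput, a positive number means the test configuration was faster than the baseline.

```python
def percent_change(baseline, test):
    """Per-cell change of `test` relative to `baseline`, in percent.

    Both arguments are equally shaped 2-D lists of throughput values.
    Cells where the baseline is 0 yield None instead of dividing by zero.
    """
    if len(baseline) != len(test) or any(
        len(b) != len(t) for b, t in zip(baseline, test)
    ):
        raise ValueError("baseline and test matrices must have the same shape")
    return [
        [None if b == 0 else (t - b) * 100.0 / b for b, t in zip(brow, trow)]
        for brow, trow in zip(baseline, test)
    ]
```

For example, a baseline row of `[100, 200]` against a test row of `[110, 150]` gives `[10.0, -25.0]`: 10% faster in the first cell, 25% slower in the second.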
Works pretty well.