Bounty is for the name of a software package for generating tables and running experiments with particular parameters. That is it!

I am running experiments on computer programs with a number of variables whose effect I am measuring. For each run, I record the time, the precision and the recall.

Are there any existing frameworks that can help me with running tests and generating tables? I am currently using the Python csv library to generate basic tables. However, things get complicated as the number of variables you want to change grows, since the table format may have to be different.

For an illustration of various formats see http://spreadsheets.google.com/pub?key=trRdDhrzIc0xH-CRuP0Ml6g&output=html. There are probably more formats that I am not aware of.

  • I would prefer it to be written in Python, but frameworks in other languages (esp. Java/C++ as I know them) would be acceptable
  • This bounty is only for listing an existing framework(s) - no rolling your own!
  • Also, I need the answer within 6 days, so the bounty will be awarded to the best answer Sunday night Australian time.
  • I've actually (finally) written my own framework, but I need to compare it against others.
  • Bounties are still valid for community wiki questions
  • This question is not about generating test data. It simply has to call the test function with appropriate arguments.
  • I am not just looking for a general tool like Matlab/Excel unless you can demonstrate that it has the functionality or provide the name of a plugin with suitable requirements.
+3  A: 

I could be misunderstanding, but you appear to be concerned about the presentation and seem to want to get beyond a 2d table with just one axis per variable. As you point out, things get difficult with more variables. A spreadsheet is inherently 2d and the best you'll get with graphing is 3d. I'd bet you have more variables than that.

My suggestion is to organize your data as columns of variables and results, with one experiment per row. That is simple for Python's csv module to deal with.
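A minimal sketch of that layout, assuming the parameters can be described as a dictionary of value lists. `run_experiment`, `run_grid` and the parameter values are hypothetical stand-ins, not part of any existing framework:

```python
import csv
import io
import itertools
import time


def run_experiment(a, b, c):
    """Hypothetical stand-in for the program under test; returns (precision, recall)."""
    precision = 1.0 / (1 + a)
    recall = 1.0 / (1 + b + c)
    return precision, recall


def run_grid(grid, out):
    """Run every combination in `grid` and write one CSV row per run."""
    names = list(grid)
    writer = csv.DictWriter(out, fieldnames=names + ["time", "precision", "recall"])
    writer.writeheader()
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        start = time.perf_counter()
        precision, recall = run_experiment(**params)
        elapsed = time.perf_counter() - start
        writer.writerow({**params, "time": elapsed,
                         "precision": precision, "recall": recall})


# Made-up parameter values, purely for illustration.
grid = {"a": [1, 2], "b": [1, 2, 3], "c": [0, 1]}
buffer = io.StringIO()  # swap for open("results.csv", "w", newline="") to write a file
run_grid(grid, buffer)

rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(len(rows))  # 2 * 3 * 2 = 12 runs, one row each
```

Adding a variable only adds a column, so the file format never has to change.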

Slice/sort/manipulate the info with "Data->Auto Filter" in Excel or Open Office to see which variables are significant. Plot simple line graphs to see relations you might otherwise miss. Then, after you have your relations figured out, you might find that a 2d table is a good way to present the data. Maybe you can take the next step with numpy or matplotlib.

Hope this helps.

goger
I actually am using matplotlib at the moment
Casebash
Yah, your tables are weird. Use one column per variable, one row per test.
Zac Thompson
A: 

A good framework to automate testing is nose: http://somethingaboutorange.com/mrl/projects/nose/0.11.1/

It recognizes any function whose name starts with 'test_' as a test.
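As a rough sketch of how this could apply to parameterised runs: nose also supports generator tests, where a test function yields a callable plus its arguments and each yielded tuple is run as a separate case. `check_recall`, the parameter values and the threshold below are hypothetical, and this only shows the parameterisation mechanism, not table generation:

```python
def check_recall(a, b, minimum):
    """Hypothetical check: run one experiment and assert on its result."""
    recall = 1.0 / (a + b)  # placeholder for calling the real program
    assert recall >= minimum


def test_recall_grid():
    """Under nose, each yielded tuple becomes its own test case."""
    for a in (1, 2):
        for b in (1, 2):
            yield check_recall, a, b, 0.25


# Outside nose, the yielded cases can be run by hand:
cases = list(test_recall_grid())
for func, *args in cases:
    func(*args)
```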

dalloliogm
This question is not about unit testing
Casebash
nose is not only a framework for unit testing.
dalloliogm
Really? Can you provide any details on how to do this? PS: unless you edit your answer, the system won't let me remove my downvote.
Casebash
+2  A: 

If the number of variables or the number of possible values for each variable is even moderately large, the set of tests can quickly become unwieldy. In that case you probably want to settle for reasonable coverage with a subset of the values. An example tool for generating a candidate set of tests is Allpairs (written in Perl, so you should be able to adapt it to your own needs if necessary). I'm not sure whether this is the sort of "framework" you are looking for. See also csvtest.
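To illustrate the idea of pairwise coverage, here is a small greedy sketch. It is not the algorithm Allpairs uses, just a simple way to pick a subset of runs in which every pair of values for every two variables appears at least once; `pairwise_tests` and the grid are hypothetical:

```python
import itertools


def pairs_of(test, names):
    """All pairs of (variable, value) items that a single test covers."""
    items = [(n, test[i]) for i, n in enumerate(names)]
    return set(itertools.combinations(items, 2))


def pairwise_tests(grid):
    """Greedily pick tests until every pair of values appears at least once."""
    names = sorted(grid)
    candidates = list(itertools.product(*(grid[n] for n in names)))
    uncovered = set()
    for test in candidates:
        uncovered |= pairs_of(test, names)
    chosen = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda t: len(pairs_of(t, names) & uncovered))
        chosen.append(dict(zip(names, best)))
        uncovered -= pairs_of(best, names)
    return chosen


# Made-up grid: the full product would be 3 * 3 * 3 = 27 runs.
grid = {"a": [1, 2, 3], "b": [1, 2, 3], "c": [0, 1, 2]}
tests = pairwise_tests(grid)
print(len(tests), "runs instead of 27")
```

The greedy choice does not guarantee the smallest possible set, and it enumerates the full product first, so it only suits small grids.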

... I'll repeat here my comment that you should use one column per variable, one row per test (numbers pulled out of thin air below)

a  b  c   time  precision  recall
1  1  0    2     0.5        0.7
1  2  0    3  ...etc.

Not only is this simpler to inspect, but you can perform analysis on it (including the cross-sections in your example spreadsheet) in a fairly straightforward manner as well. Think of it as a single database table.
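For example, a cross-section like the ones in the spreadsheet can be pulled out of the flat table with a few lines. This is a sketch only: `cross_section` is a hypothetical helper and the numbers are again made up:

```python
from collections import defaultdict

# One dict per run, as csv.DictReader would give after converting to numbers.
rows = [
    {"a": 1, "b": 1, "c": 0, "time": 2.0},
    {"a": 1, "b": 2, "c": 0, "time": 3.0},
    {"a": 2, "b": 1, "c": 0, "time": 4.0},
    {"a": 2, "b": 2, "c": 0, "time": 5.0},
    {"a": 1, "b": 1, "c": 1, "time": 9.0},
]


def cross_section(rows, row_var, col_var, measure, fixed):
    """Pivot flat rows into {row_value: {col_value: measure}} for runs matching `fixed`."""
    table = defaultdict(dict)
    for r in rows:
        if all(r[k] == v for k, v in fixed.items()):
            table[r[row_var]][r[col_var]] = r[measure]
    return {k: table[k] for k in sorted(table)}


# Time as a function of a (rows) and b (columns), holding c at 0.
table = cross_section(rows, "a", "b", "time", fixed={"c": 0})
for a_value, by_b in table.items():
    print(a_value, [by_b[b] for b in sorted(by_b)])
```

Changing which variables go on the rows and columns is then a matter of changing the arguments, not the stored format.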

Zac Thompson
That's kind of useful, but I'd really like to see if anyone else has made a table generation framework
Casebash
Also csvtest is for generating test data, not for running them
Casebash
What do you mean by "running them"? You wrote about "..running experiments on computer programs..": which programs are they (command line, GUI, ...)? How do you pass test data to these programs? Without this information I think it is hard to suggest a solution suited to your problem.
MaD70
The framework I wrote lets you specify what you want to be in your table and it handles calling the function for you
Casebash
+1  A: 

I think PyTables will do what you are looking for, and more. It lets you make HDF5 tables/files. Guessing from the examples in the documentation, it was originally built for recording results from experiments.

It also allows you to nest columns, which, if I understand your Google Docs example correctly, will allow you to keep all of your results in one table as you vary a, b and c.

Edit: You can define tables with dictionaries instead of classes. Example. Documentation. This will allow you to create a table with an appropriate number of columns based on values of 'a'.

Basically your definition of columns will look something like this:

table_def = {'a1': Int32Col(pos=0), 'a2': Int32Col(pos=1), etc.}

Then, you'll add new records for each value of 'b'.

nazca
This is definitely the best lead so far, but it doesn't quite generate the type of tables I want. It is designed to create tables which are simply lists
Casebash
Do you know in advance the approximate range of 'a' (from your example)? If so, you can use nesting to produce the bottom table. I haven't actually worked with nesting much, but if you can have multiple levels of nesting, you can also vary 'c' in one table.
nazca
The values are known in advance. I read the documentation, but I couldn't find a way to create a table in PyTables that even lets you vary both a and b.
Casebash
If you use a dictionary to define your tables this shouldn't be a problem. I'll edit the answer...
nazca
This can do it albeit clumsily
Casebash
+1  A: 

If you are only concerned with display, you could write an Excel file with xlwt. (cheatsheet).

It can write xls files directly, without using Excel.

You can merge row label cells to group the results for each experiment. Or, just write one row per experiment, with each variable having its own column (as others have suggested) and use either a pivot table or sumproduct to shape the data into the desired format.

nazca
That could be useful, as I could make tables that look nicer than CSV. I'll definitely look at integrating that into my framework after I finish
Casebash
I think this is the fastest solution to implement. Especially the option of having a column per variable. Using sumproduct formulas will allow you to summarize the data in whatever format you want. It should only take minutes to implement the summary table. And, if you want to change your summary format, it should be super simple.
nazca
I've tried to like xlwt but failed; the documentation in particular isn't very good. I've started using the "Apache POI - Java API To Access Microsoft Format Files" (http://poi.apache.org/) instead. This could work if you're willing to write Java or use Jython.
Mattias Nilsson
+1  A: 

I believe Matlab to be perfect for generating and manipulating such tables. It treats the universe as vectors and matrices, and it's very high-level.

Downsides:

  • It's very expensive.
  • If you plan to automate the running of tests, integrating Matlab with whatever your test programs run on might be hard.

Emilio M Bumachar
There may exist a software package in Matlab for what I want to do, but Matlab itself doesn't count
Casebash
+1  A: 

I'm a little confused about exactly what you want to generate. Are you trying to just format tables, or to actually generate the combinations of the values? If the latter, you'll want to look at design-of-experiment or test matrix generation tools like:

Jaykul
I am looking for a framework to run experiments and format the results. The framework should have a nice way to specify the parameters - having to list them all in a file is not acceptable.
Casebash
+1  A: 

Some things I suggest you look at are PyEPL, the Python Experiment-Programming Library geared for psychology experiments, and this paper on Agile Control of a Complex Experiment, which covers the authors' experience in using Python for experiment control.

For internal manipulation of tables, you should be using NumPy, and if you need to store them, the HDF5 format that PyTables writes is widely used in science.

DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) may be what you are looking for. Not Python, but ROSE (Repetitive Object-Oriented Simulation Environment) is also something to look at. Both of these incorporate Design of Experiments.

Michael Dillon
ROSE is extremely interesting. But apart from that paper, I can't seem to find any other info.
Casebash