Hi,

I have a set of 5 values from an experiment, E1, ..., E5, and results from 10000 different simulations, sim_A_B_C.out. From each simulation I get S1, ..., S5.

I want to study the correlation between the experimental and simulated values, so I would like to perform a linear regression for each set in a script that loops over all 10000 result files.

What is the best way to perform linear regression in bash or Python? I used to do it with SigmaPlot, but it is not well suited to such a large data set.

A: 

In Python, you can use the stats.linregress function from the SciPy package.
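A minimal sketch of a single fit, in case it helps. The numbers are made up purely for illustration (they stand in for E1..E5 and S1..S5), and the variable names are my own:

```python
from scipy import stats

# Made-up numbers for illustration only; substitute your own E1..E5 and S1..S5.
experimental = [1.0, 2.1, 2.9, 4.2, 5.1]
simulated = [1.2, 1.9, 3.1, 3.9, 5.3]

# linregress fits simulated = slope * experimental + intercept
fit = stats.linregress(experimental, simulated)

print("slope:", fit.slope)
print("intercept:", fit.intercept)
print("r squared:", fit.rvalue ** 2)
print("p-value:", fit.pvalue)
```

The result also carries fit.stderr, the standard error of the slope, which is worth looking at when you only have five points.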

A: 

Hi

I'd avoid bash for this and use Python -- actually I'd use MATLAB or Mathematica, but neither is on your list. So install NumPy and possibly SciPy and crack on.
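Something along these lines might do for the loop over the result files. It is only a sketch: the question doesn't say what is inside sim_A_B_C.out, so I am assuming each file contains nothing but the five simulated values as whitespace-separated numbers, and the function name fit_all is my own invention.

```python
import glob
import os

import numpy as np
from scipy import stats


def fit_all(directory, experimental, pattern="sim_*.out"):
    """Regress each simulation's values on the experimental values.

    ASSUMPTION: every matching file holds just the five simulated
    values as whitespace-separated numbers. Adapt the loading step
    to whatever your files really look like.

    Returns a list of (filename, slope, intercept, r_squared),
    sorted with the highest r_squared first.
    """
    results = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        simulated = np.loadtxt(path).ravel()
        fit = stats.linregress(experimental, simulated)
        results.append(
            (os.path.basename(path), fit.slope, fit.intercept, fit.rvalue ** 2)
        )
    # Best-correlated simulations first
    results.sort(key=lambda row: row[3], reverse=True)
    return results
```

You would call it as fit_all("path/to/results", [E1, E2, E3, E4, E5]) and look at the first few rows. Ten thousand five-point fits should be quick in Python.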

Regards

Mark

High Performance Mark
A: 

I expect that each of your simulations has input values that differ: for instance, x is 1 for the first and 2 for the second, and some function f(x) runs the simulation and generates 5 points per run. From your file names, I expect x is actually three values, A, B and C.

In that case, what you want to discover is the value of x that generates the best simulation.

So you really need to find the correlation between f(x) and the experimental result, rather than looking at each simulated result in isolation.

The reason is that searching for a good correlation between 10000 simulations and only five experimental values involves too many variables (if you assume the simulations are independent of each other), so you will probably find a good fit just by chance.
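To illustrate the chance-fit concern, here is a small sketch that correlates five random "experimental" values with 10000 sets of five random numbers that have nothing to do with them. Everything in it is synthetic (normally distributed noise with a fixed seed), so it says nothing about your actual simulations; it only shows how easily a high r squared turns up with five points and many candidates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: pure noise, not real data.
experimental = rng.normal(size=5)          # plays the role of E1..E5
simulated = rng.normal(size=(10000, 5))    # 10000 unrelated "simulations"

# Pearson r between the experimental values and every row at once
e = experimental - experimental.mean()
s = simulated - simulated.mean(axis=1, keepdims=True)
r = (s @ e) / (np.linalg.norm(s, axis=1) * np.linalg.norm(e))
r_squared = r ** 2

best_r_squared = r_squared.max()
n_above_09 = int((r_squared > 0.9).sum())

print("best r squared among unrelated sets:", best_r_squared)
print("sets with r squared above 0.9:", n_above_09)
```

With only five points, roughly one unrelated set in a hundred can be expected to exceed an r squared of 0.9, so the best of 10000 will look excellent whether or not it means anything.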

I think you should also obtain additional experimental values, to increase your confidence.


My favourite language for such things is R, which is free and available for most platforms at a download site near you. I recommend the book "Introduction to Statistics using R", which gives lots of potted examples for you to try and takes you from beginning statistics through to some quite advanced topics.

Alex Brown
+1 for advice on "R".
Guru