I have a table of data with survey results, and I want to do certain calculations on this data. The data structure is somewhat like this (ignore all the data being similar; I copied and pasted the rows):

____________________________________________________________________________________
| group |individual |        key         |        key         |        key         |
|       |           |subkey|subkey|subkey|subkey|subkey|subkey|subkey|subkey|subkey|
|       |           |q|q|q |q |q  |q|q|q |q|q|q |q |q  |q|q|q |q|q|q |q |q  |q|q|q |
|-------|-----------|-|-|--|--|---|-|-|--|-|-|--|--|---|-|-|--|-|-|--|--|---|-|-|--|
|   1   | 0001      |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |
|   1   | 0002      |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |
|   1   | 0003      |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |
|   2   | 0004      |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |
|   2   | 0005      |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |
|   3   | 0006      |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |
|   4   | 0007      |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |1|7|5 |1 |3  |1|4|1 |
------------------------------------------------------------------------------------

Excuse my poor ascii skills...

So, every individual belongs to a group, and has answered some questions. These questions are always grouped in keys and subkeys.

Is there any simple method to calculate averages, deviations, and similar statistics based on the groupings? Something like:

public float getAverage(int key, int individual);
float avg = getAverage(5,7);

I think what I'm asking is: what would be the best way to structure the data in C# to make it as easy as possible to work with? I have started making classes for every entity, but I got confused somewhere along the way and something stopped working. So before I continue down this path, I was wondering if there are any other, better ways of doing this?
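For what it's worth, one possible way to structure this (a sketch only; the class names `Survey` and `Individual`, and the tuple-keyed dictionary, are my own invention, not anything from your existing code) is to store each answer keyed by its (key, subkey, question) position, so averages over any grouping become simple LINQ queries:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical structure: each individual's answers are stored in a
// dictionary keyed by (key, subkey, question).
public class Individual
{
    public int Group;
    public string Id;
    public Dictionary<(int Key, int Subkey, int Question), int> Answers = new();
}

public class Survey
{
    public List<Individual> Individuals = new();

    // Average of all answers under one key for one individual.
    public double GetAverage(int key, string individualId)
    {
        var ind = Individuals.First(i => i.Id == individualId);
        return ind.Answers.Where(a => a.Key.Key == key).Average(a => a.Value);
    }

    // Average of all answers under one key across a whole group.
    public double GetGroupAverage(int key, int group) =>
        Individuals.Where(i => i.Group == group)
                   .SelectMany(i => i.Answers)
                   .Where(a => a.Key.Key == key)
                   .Average(a => a.Value);
}
```

The point of the tuple key is that "all answers under key 5" or "all answers under subkey 2 of key 5" are just filters, so you don't need a separate class per level of the hierarchy.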

(Every individual can also have describing variables, like agegroup and such, but that's not important for the base functionality.)

Our current solution does all calculations inline in the queries when requesting the data from the database. This works, but it's slow, and the number of queries equals questions * individuals + keys * individuals, which could be a lot of individual queries.

Any suggestions?

A: 

I doubt that you'll gain much improvement by moving the calculations from the database to an application.

I would recommend looking at the database design to see if you can improve it through normalization and by adding indexes. Most databases come with tools that can make design recommendations based on a given query. Then look at the query you are running to see if there is a more efficient way to write it. I've personally seen queries that take O(n) rewritten to take O(log n). If you need help with that, post as much of your design and query as you are comfortable sharing.

juharr
A: 

Averages, standard deviations, and some other statistics can be calculated in a single pass over the data by accumulating the count, the sum, and the sum of squares of the data points. This was exploited in the days before computers: all the data was punched onto cards, much as you illustrate, and run through card machines (programmed by wires plugged into boards) that accumulated those totals, after which the final computations were done from the totals.

Which is a boring history lesson, except that it illustrates that you may be able to semi-summarize your data by group as you take it from the database.
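The one-pass accumulation described above can be sketched in C# like this (my own minimal version; this computes the population standard deviation from count, sum, and sum of squares):

```csharp
using System;
using System.Collections.Generic;

public static class RunningStats
{
    // Accumulate count, sum, and sum of squares in one pass,
    // then derive mean and standard deviation from the totals.
    public static (double Mean, double StdDev) Compute(IEnumerable<double> values)
    {
        long n = 0;
        double sum = 0, sumSq = 0;
        foreach (var x in values)
        {
            n++;
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / n;
        // Population standard deviation: sqrt(E[x^2] - E[x]^2).
        double stdDev = Math.Sqrt(sumSq / n - mean * mean);
        return (mean, stdDev);
    }
}
```

Since only three running totals are needed per group, you could accumulate them while reading rows from the database instead of issuing one query per question per individual.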

It will also pay to set up indexes on the descriptive (group) columns.

SeaDrive