Hello,

I need to process about 500,000 data points, each consisting of 4 decimals. I'd like to use an array of structs to do this. Would this be much slower than using an array of arrays? It seems that memory won't be an issue, but speed will; it needs to be fast.

Quick code sample of two options:

Option 1:

public struct Struct
{
    public decimal A { get; set; }
    public decimal B { get; set; }
    public decimal C { get; set; }
    public decimal D { get; set; }
}

Usage:

private Struct[] data;

Option 2:

private decimal [][] data;

Also, is decimal the right data type to use? The data points are money...

Thanks! Brian

+1  A: 
  1. Decimal is the correct type to be using, if you're dealing with currency values.
  2. An array of structs will be quite fast.

Be aware, though, that when you're dealing with an array of structs, it's safest to treat each struct element (especially since you have each value as a property) as a single, immutable object. This means that, if you want to change C in array element 5, the reliable pattern is:

Struct val = array[5];
val.C = newValue;
array[5] = val;

Switching to public fields can reduce some of this, but adds its own problems. Mutable structs make things more complicated, at times...
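
To make the copy-modify-write pattern concrete, here is a minimal sketch. It reuses the Struct type from the question; the class and method names (StructUpdateSketch, SetC, SetCDirect) are made up for illustration. One nuance worth checking for yourself: as far as I know, elements of a plain array are variables, so a direct write compiles there, while List<T> hands back a copy from its indexer and really does require the three-step pattern.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Struct in the question.
public struct Struct
{
    public decimal A { get; set; }
    public decimal B { get; set; }
    public decimal C { get; set; }
    public decimal D { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class StructUpdateSketch
{
    // Copy-modify-write: valid for arrays and for List<T> alike.
    public static void SetC(Struct[] array, int index, decimal newValue)
    {
        Struct val = array[index];  // copies the struct out of the array
        val.C = newValue;           // changes only the local copy
        array[index] = val;         // writes the modified copy back
    }

    // Array elements are variables, so a direct write also compiles.
    public static void SetCDirect(Struct[] array, int index, decimal newValue)
    {
        array[index].C = newValue;
    }

    // List<T>'s indexer returns a copy, so "list[index].C = newValue" is
    // rejected by the compiler (error CS1612); the copy must be written back.
    public static void SetC(List<Struct> list, int index, decimal newValue)
    {
        Struct val = list[index];
        val.C = newValue;
        list[index] = val;
    }
}
```

If the data ever moves from an array to a List<Struct>, only the copy-modify-write versions keep compiling, which is one reason to prefer that pattern.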

Reed Copsey
+3  A: 

If you are processing A, B, C, D at the exact same time, the array of structs method should have better spatial locality: since the data is clumped together, it will be paged into memory at the same time (fewer page faults) and fetched into the CPU cache at the same time. If you process all of A, then all of B, etc., then the opposite will be true and you should use an array of arrays.

If it's not terribly difficult, I suggest you try both options and measure to see which one is better. If this is too difficult, use whichever approach is simpler and easier to understand, and then measure to see if it meets your performance goals.
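
A rough way to "try both and measure" could look like the sketch below. Everything in it is an assumption for illustration: the DataPoint struct stands in for the question's Struct, the array of arrays is assumed to hold one inner array per field (all A, all B, ...), and the workload is a plain sum. Stopwatch timings like this are only indicative; run several iterations on realistic data before trusting them.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the question's Struct; fields keep the sketch short.
public struct DataPoint
{
    public decimal A, B, C, D;
}

public static class LayoutTimingSketch
{
    // Array of structs: the four values of one data point sit next to each other.
    public static DataPoint[] BuildStructs(int count)
    {
        var data = new DataPoint[count];
        for (int i = 0; i < count; i++)
        {
            data[i] = new DataPoint { A = i, B = i + 0.25m, C = i + 0.5m, D = i + 0.75m };
        }
        return data;
    }

    // Array of arrays, assumed here to be one inner array per field.
    public static decimal[][] BuildColumns(int count)
    {
        var columns = new decimal[4][];
        for (int c = 0; c < 4; c++)
        {
            columns[c] = new decimal[count];
            for (int i = 0; i < count; i++)
            {
                columns[c][i] = i + 0.25m * c;
            }
        }
        return columns;
    }

    // Touches A, B, C and D of each point together.
    public static decimal SumStructs(DataPoint[] data)
    {
        decimal total = 0m;
        foreach (DataPoint p in data)
        {
            total += p.A + p.B + p.C + p.D;
        }
        return total;
    }

    // Walks one whole column before moving on to the next.
    public static decimal SumColumns(decimal[][] columns)
    {
        decimal total = 0m;
        foreach (decimal[] column in columns)
        {
            foreach (decimal value in column)
            {
                total += value;
            }
        }
        return total;
    }

    // Wall-clock time of a single run of the given action.
    public static TimeSpan Time(Action action)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

For example, build both layouts with 500,000 points, then compare `LayoutTimingSketch.Time(() => LayoutTimingSketch.SumStructs(structs))` against `LayoutTimingSketch.Time(() => LayoutTimingSketch.SumColumns(columns))`, ideally after a warm-up run so JIT compilation isn't counted.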

Michael
Thanks - I'm in the midst of testing right now; I'll post back when I have a feel for which was faster.
sweeney
Array of structs was blazing fast - it was good enough that I didn't bother to try out the array of arrays. I feel that a struct gets you the best of both worlds: passing around a lightweight value rather than an object, while still maintaining good OOD and readability/maintainability. Thanks for all the help!
sweeney
+1  A: 

Hmm... If you replace the array of arrays with a two-dimensional array, the resulting memory layout should be more or less equivalent:

private Struct[] data = new Struct[x];        // array of structs
private decimal[,] data = new decimal[x, 4];  // alternative: two-dimensional array

Unless you were hoping to pass around one of the arrays to other methods...

LorenVS
+2  A: 

Just a side comment to the previous post about using two-dimensional arrays:

An array of arrays (sometimes called a jagged array) provides better performance than a two dimensional array because the two-dimensional address translation requires a multiply and an add whereas the jagged array only requires two adds.

Of course the difference only shows up after millions of look-ups.
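
As a conceptual illustration of the two lookups being compared (not what the runtime literally emits, which also involves bounds checks), the sketch below spells out the "multiply and add" offset of a rectangular array next to the two-step lookup of a jagged array. The names IndexingSketch, FlatOffset, Flatten and ReadJagged are hypothetical.

```csharp
using System;

public static class IndexingSketch
{
    // Rectangular layout: one multiply and one add locate the element.
    public static int FlatOffset(int row, int column, int columnCount)
    {
        return row * columnCount + column;
    }

    // Copies a two-dimensional array into a flat one using that offset.
    public static decimal[] Flatten(decimal[,] grid)
    {
        int rows = grid.GetLength(0);
        int columns = grid.GetLength(1);
        var flat = new decimal[rows * columns];
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < columns; c++)
            {
                flat[FlatOffset(r, c, columns)] = grid[r, c];
            }
        }
        return flat;
    }

    // Jagged layout: fetch the inner array reference first, then index into it.
    public static decimal ReadJagged(decimal[][] rows, int row, int column)
    {
        decimal[] inner = rows[row];  // first lookup plus a dereference
        return inner[column];         // second lookup
    }
}
```

Which of the two ends up faster depends on the runtime and the access pattern, so it is worth measuring rather than reasoning from the instruction count alone.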

Chris Judge
It's true that there's a multiply, but the real performance gain is because there is a dedicated IL instruction for 1-d array access and not for 2-d arrays (http://blogs.msdn.com/ericlippert/archive/2009/08/17/arrays-of-arrays.aspx).
Brian
I find this slightly confusing. I may be wrong, but I would think the performance difference would be in the dereferencing... With a "jagged" array, you perform an addition, a dereference, and another addition, and I'm pretty sure that would end up being slower. I'm open to more info though; I always assumed a 2D array would be faster.
LorenVS
+1  A: 

When dealing with money, it is often faster and far more efficient to use integers if you are performing comparisons or simple addition and subtraction, and don't need to worry about rounding errors.

sylvanaar
Yeah, but then you can only use whole dollars? Do you suggest that I represent dollars in one field and cents in another, both as integers, and perform the carry-overs programmatically?
sweeney
You use cents, or whatever your precision needs to be. In your case it's 4 decimal places, so you use 1/100 of a cent as your unit, and $12.3456 is represented as 123456. The only time you need to worry about currency is when you format the number. If you deal with large numbers, you can use long integers instead.
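
A minimal sketch of that scaled-integer idea, assuming 4 decimal places (so one unit is 1/10,000 of a dollar). The ScaledMoney class, its method names, and the rounding mode are illustrative choices, not something from the thread:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: money stored as a long count of 1/10,000 dollar units.
public static class ScaledMoney
{
    public const long UnitsPerDollar = 10000;

    // Converts a decimal amount to scaled units, rounding to the nearest unit.
    public static long FromDecimal(decimal dollars)
    {
        return (long)decimal.Round(dollars * UnitsPerDollar, 0, MidpointRounding.AwayFromZero);
    }

    // Converts scaled units back to a decimal amount.
    public static decimal ToDecimal(long units)
    {
        return (decimal)units / UnitsPerDollar;
    }

    // Formatting is the only place where the scale needs to show up.
    public static string Format(long units)
    {
        return ToDecimal(units).ToString("F4", CultureInfo.InvariantCulture);
    }
}
```

With this, `ScaledMoney.FromDecimal(12.3456m)` gives 123456, and sums or comparisons are plain long arithmetic until the value is formatted again.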
sylvanaar
+1  A: 

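A struct array stores its elements contiguously, and a two-dimensional decimal[x, 4] array is laid out in much the same way; a jagged array adds one extra reference per row. In practice, you shouldn't get a performance hit from using the struct array.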

public struct Struct
{
    // As long as the get/set blocks are empty (auto-properties),
    // the JIT compiler will normally inline these accessors,
    // so they perform the same as public fields

    public decimal A { get; set; }
    public decimal B { get; set; }
    public decimal C { get; set; }
    public decimal D { get; set; }
}

So I'd consider using public fields. But that's just my opinion, I like to know explicitly how things are going to behave.

About using decimal for money: that's not always the right choice. decimal is a 128-bit data type with VERY high precision (28-29 significant digits), but its range is narrow compared to the floating-point types (about ±7.9 x 10^28). If you need high precision for calculating rates or something like this but you don't need really high values, go for decimal. If you need higher values and not so much precision, go for double. If you're dealing with small values and just need a fair amount of precision, go for float.
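
To see what the precision trade-off means in practice, here is a small sketch that adds 0.1 repeatedly in both types. decimal stores base-10 fractions exactly, while double stores a binary approximation, so the double total drifts. The PrecisionSketch class and its method names are made up for illustration.

```csharp
using System;

public static class PrecisionSketch
{
    // Adds 0.1 'count' times using binary floating point.
    public static double SumDoubles(int count)
    {
        double total = 0.0;
        for (int i = 0; i < count; i++)
        {
            total += 0.1;
        }
        return total;
    }

    // Adds 0.1 'count' times using decimal arithmetic.
    public static decimal SumDecimals(int count)
    {
        decimal total = 0m;
        for (int i = 0; i < count; i++)
        {
            total += 0.1m;
        }
        return total;
    }
}
```

Here `PrecisionSketch.SumDecimals(10) == 1m` holds, while `PrecisionSketch.SumDoubles(10) == 1.0` does not, which is the usual argument for keeping currency out of double and float.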

Remember that the closer the size of the data type is to your machine's native word size (32-bit, or whatever the width of your bus is), the less time it will take for the data to be loaded.

Hope this helps!

diogoriba