I need to resample large sets of data (a few hundred spectra, each containing a few thousand points) using simple linear interpolation.

I have created an interpolation method in C#, but it seems to be really slow for huge datasets.

How can I improve the performance of this code?

public static List<double> interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    double[] interpolated = new double[breaks.Count];
    int id = 1;
    int x = 0;

    // left border case - uphold the value
    while (breaks[x] < xItems[0])
    {
        interpolated[x] = yItems[0];
        x++;
    }

    double p, w;
    for (int i = x; i < breaks.Count; i++)
    {
        while (breaks[i] > xItems[id])
        {
            id++;
            if (id > xItems.Count - 1)
            {
                id = xItems.Count - 1;
                break;
            }
        }

        System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));

        if (id <= xItems.Count - 1)
        {
            if (id == xItems.Count - 1 && breaks[i] > xItems[id])
            {
                interpolated[i] = yItems[yItems.Count - 1];
            }
            else
            {
                w = xItems[id] - xItems[id - 1];
                p = (breaks[i] - xItems[id - 1]) / w;
                interpolated[i] = yItems[id - 1] + p * (yItems[id] - yItems[id - 1]);
            }
        }
        else // right border case - uphold the value
        {
            interpolated[i] = yItems[yItems.Count - 1];
        }
    }
    return interpolated.ToList();
}

Edit
Thanks, guys, for all your responses. What I wanted to achieve when I wrote this question was to get some general ideas about where I could improve the performance. I didn't expect any ready-made solutions, only some ideas. And you gave me what I wanted, thanks!

Before writing this question I thought about rewriting this code in C++, but after reading the comments on Will's answer it seems the gain could be smaller than I expected.

Also, the code is so simple that there are no mighty code tricks to use here. Thanks to Petar for his attempt to optimize the code.

It seems it all comes down to finding a good profiler, checking every line and subroutine, and trying to optimize them.

Thank you again for all the responses and for taking part in this discussion!

A: 

This is the kind of problem where you need to move over to native code.

Will
Native code? What do you mean?
Gacek
He means to do it in unmanaged code.
Sam Holder
For such pure arithmetic the CLR is no slower than native code - it will be compiled into very similar machine code in both cases. Managed code can even be faster due to faster memory allocation. SIMD may give a performance boost, but it requires special care; you won't get good SIMD utilization in generic C code.
ima
Thanks for the explanation, I will try to check it :)
Gacek
@Will: do you really think that operations on doubles and arrays are significantly faster in C (or any unmanaged language of your choice)? I wouldn't recommend switching to unmanaged code without profiling the existing code first.
Igor Korkhov
The accesses would not be bounds-checked, for example.
Will
...and there are a lot of virtual method calls, though I doubt that you would see more than a couple of ms for 10'000 iterations. Then again the OP asked how to improve performance and unmanaged code is definitely a possibility. Just watch out for those dreaded unmanaged-managed transitions or they'll eat up all the performance gained by the beautiful C code :-P
SealedSun
On bounds checking: even if it were noticeable, it's not related to managed or native code. You can have bounds checking in C++ and you can avoid it in C# with an unsafe block. (Besides, I believe the CLR can optimize away excessive bounds checking when arrays are accessed sequentially.)
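
A minimal sketch of what such an unsafe block could look like (the method name is made up; purely illustrative):

public static unsafe double SumUnchecked(double[] values)
{
    double sum = 0;
    fixed (double* p = values)          // pin the array and obtain a raw pointer
    {
        for (int i = 0; i < values.Length; i++)
            sum += p[i];                // pointer access: no per-element bounds check is emitted
    }
    return sum;                         // requires compiling with /unsafe
}
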
ima
+1  A: 

I'd say profile the code and see where it spends its time; then you have something to focus on.

ANTS is popular, but Equatec is free, I think.

Sam Holder
A: 
System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));

I hope it's a release build without DEBUG defined?

Other than that, it might depend on what exactly those IList parameters are. It may be useful to store the Count value in a local instead of accessing the property every time.
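
A quick illustrative sketch of both points - Debug.WriteLine disappearing from builds without the DEBUG symbol, and caching Count in a local (the helper name is made up):

static void LogBreaks(IList<double> breaks)
{
    int breakCount = breaks.Count;   // read the Count property once instead of on every iteration
    for (int i = 0; i < breakCount; i++)
    {
        // Debug.WriteLine is marked [Conditional("DEBUG")], so in a build without the DEBUG
        // symbol the whole call, including the string.Format argument, is compiled away.
        System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, value {1}", i, breaks[i]));
    }
}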

ima
Yes, as I've commented somewhere... ;)
Gacek
+2  A: 
public static List<double> Interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    var a = xItems.ToArray();
    var b = yItems.ToArray();

    var aLimit = a.Length - 1;
    var bLimit = b.Length - 1;

    var interpolated = new double[breaks.Count];

    var total = 0;
    var initialValue = a[0];
    while (breaks[total] < initialValue)
    {
        total++;
    }
    // left border case - hold the first value for every break before xItems[0]
    for (int j = 0; j < total; j++)
    {
        interpolated[j] = b[0];
    }

    int id = 1;
    for (int i = total; i < breaks.Count; i++)
    {
        var breakValue = breaks[i];

        while (breakValue > a[id])
        {
            id++;
            if (id > aLimit)
            {
                id = aLimit;
                break;
            }
        }

        double value = b[bLimit];

        if (id <= aLimit)
        {
            var currentValue = a[id];
            var previousValue = a[id - 1];
            if (id != aLimit || breakValue <= currentValue)
            {
                var w = currentValue - previousValue;
                var p = (breakValue - previousValue) / w;
                value = b[id - 1] + p * (b[id] - b[id - 1]);
            }
        }

        interpolated[i] = value;
    }
    return interpolated.ToList();
}

I've cached some (const) values in locals and pre-filled the left-border values, but I think these are micro-optimizations that are already made by the compiler in Release mode. However, you can try this version and see if it beats the original version of the code.

Petar Petrov
There is almost no difference, but thank you anyway, your code is a little bit cleaner :)
Gacek
+2  A: 

Instead of

interpolated.ToList()

which copies the whole array, you could compute the interpolated values directly in the final list (or simply return that array instead). Especially if the array/List is big enough to qualify for the large object heap.

Unlike the ordinary heap, the LOH is not compacted by the GC, which means that short-lived large objects are far more harmful than small ones.

Then again: 7000 doubles are approx. 56'000 bytes, which is below the large object threshold of 85'000 bytes.
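
A minimal sketch of that change, assuming the caller can accept a double[] (the method name is made up):

// Fill the result buffer in place and return it as-is instead of copying it with ToList().
public static double[] InterpolateToArray(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    double[] interpolated = new double[breaks.Count];
    // ... same interpolation loop as in the question, writing into interpolated[i] ...
    return interpolated;   // no second allocation, no copy at the end
}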

SealedSun
+1  A: 

Looks to me like you've created an O(n^2) algorithm. You are searching for the interval, which is O(n), and you probably do it n times. You'll get a quick and cheap speed-up by taking advantage of the fact that the items are already ordered in the list. Use BinarySearch(); that's O(log(n)).

If still necessary, you should be able to do something speedier with the outer loop; whatever interval you found previously should make it easier to find the next one. But that code isn't in your snippet.
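
A minimal sketch of the interval lookup with Array.BinarySearch, assuming the x values have been copied into a sorted double[] first (the helper name is made up):

// Returns the index of a matching element when value is present; otherwise the index of
// the first element larger than value (or xs.Length when every element is smaller).
// xs must be sorted ascending; this is O(log n) per break instead of a linear scan.
static int FindInterval(double[] xs, double value)
{
    int idx = Array.BinarySearch(xs, value);
    return idx >= 0 ? idx : ~idx;   // ~idx is the insertion point when the value is not found
}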

Hans Passant
+1  A: 

Hi,

A few suggestions:

  • as others suggested, use a profiler to better understand where the time is spent.

  • the loop

while (breaks[x] < xItems[0])

could throw an exception if x grows larger than the number of items in the "breaks" list. You should use something like

while (x < breaks.Count && breaks[x] < xItems[0])

But you might not need that loop at all. Why treat the first item as a special case? Just start with id = 0 and handle the first point in the for(i) loop. I understand that id might start from 0 in this case, and [id-1] would be a negative index, but see if you can do something there (see the sketch at the end of this answer).

  • If you want to optimize for speed then you sacrifice memory, and vice versa; you usually cannot have both unless you come up with a really clever algorithm. In this case, it means calculating as much as you can outside the loops, storing those values in variables (extra memory) and using them later. For example, instead of always saying:

id = xItems.Count - 1;

You could say:

int lastXItemsIndex = xItems.Count-1;
...
id = lastXItemsIndex;

This is the same suggestion Petar Petrov made with aLimit, bLimit, ...

  • next point, your loop (or the one Petar Petrov suggested):
while (breaks[i] > xItems[id])
{
  id++;
  if (id > xItems.Count - 1)
  {
    id = xItems.Count - 1;
    break;
  }
}

could probably be reduced to:

double currentBreak = breaks[i];

while (id <= lastXItemsIndex && currentBreak > xItems[id]) id++;

  • and the last point I would add is to check whether there is some property of your samples that is special to your problem. For example, if xItems represents time and you are sampling at regular intervals, then

w = xItems[id] - xItems[id - 1];

is constant, and you do not have to calculate it every time in the loop.

This is probably not often the case, but maybe your problem has some other property which you could use to improve performance.

Another idea: maybe you do not need double precision; "float" is probably faster because it is smaller.

Good luck
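
A minimal sketch of the idea from the second bullet - folding the left and right border cases into the single main loop. Names and structure are illustrative only and assume, like the original code, that xItems, yItems and breaks are all sorted ascending:

public static double[] InterpolateMerged(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    int lastXIndex = xItems.Count - 1;
    double[] interpolated = new double[breaks.Count];
    int id = 1;
    for (int i = 0; i < breaks.Count; i++)
    {
        double br = breaks[i];
        while (id < lastXIndex && br > xItems[id]) id++;   // bounded scan: id never leaves the array

        if (br <= xItems[0])
            interpolated[i] = yItems[0];                   // left border: hold the first value
        else if (br >= xItems[lastXIndex])
            interpolated[i] = yItems[lastXIndex];          // right border: hold the last value
        else
        {
            double w = xItems[id] - xItems[id - 1];
            double p = (br - xItems[id - 1]) / w;
            interpolated[i] = yItems[id - 1] + p * (yItems[id] - yItems[id - 1]);
        }
    }
    return interpolated;
}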

Thanks Zarko. One thing: `w` is not always constant. It is the distance between two consecutive samples, which is not always the same. That's why I am calculating it each time. And changing to floats could be a good idea, I will try it.
Gacek