views:

180

answers:

2

I have a function that's called hundreds of thousands of times per update, and i need to optimize it. Now, i generally follow the "don't optimize too soon" rule but this is a critical function that virtually all of my code's time is spent in, so anything you can suggest would help. I'm also not that familiar with any sort of tips and tricks that can be used to optimize XNA or c# code. Can you help me?

if (linearPosition.Y < _min.Y || linearPosition.Y > _max.Y)// the nonlinear space commisioned doesn't cover it so that's the behavior i want, same case with next line
{
    return linearPosition;
}
if (linearPosition.X < _min.X || linearPosition.X > _max.X)
{
    return linearPosition;
}
PositionData[] fourNearestPoints = new PositionData[4] 
{
    new PositionData {distance = float.MaxValue},
    new PositionData {distance = float.MaxValue},
    new PositionData {distance = float.MaxValue},
    new PositionData {distance = float.MaxValue}
};

for (int x = 0; x < _restPositions.GetLength(0); x++)
{
    for (int y = 0; y < _restPositions.GetLength(1); y++)
    {
        PositionData temp = new PositionData
        {
            indexX = x,
            indexY = y,
            value = _restPositions[x,y],
            distance = (linearPosition - _restPositions[x,y]).Length()
        };
        if (temp.distance < fourNearestPoints[0].distance)
        {
            fourNearestPoints[3] = fourNearestPoints[2];
            fourNearestPoints[2] = fourNearestPoints[1];
            fourNearestPoints[1] = fourNearestPoints[0];
            fourNearestPoints[0] = temp;
        }
    }
}
Vector2 averageRestVector = new Vector2((fourNearestPoints[0].value.X +
    fourNearestPoints[1].value.X +
    fourNearestPoints[2].value.X +
    fourNearestPoints[3].value.X) / 4,
    (fourNearestPoints[0].value.Y +
    fourNearestPoints[1].value.Y +
    fourNearestPoints[2].value.Y +
    fourNearestPoints[3].value.Y) / 4);
Vector2 averageDeformedVector = new Vector2((_deformedPositions[fourNearestPoints[0].indexX, fourNearestPoints[0].indexY].X +
    _deformedPositions[fourNearestPoints[1].indexX, fourNearestPoints[1].indexY].X +
    _deformedPositions[fourNearestPoints[2].indexX, fourNearestPoints[2].indexY].X +
    _deformedPositions[fourNearestPoints[3].indexX, fourNearestPoints[3].indexY].X) / 4,
    (_deformedPositions[fourNearestPoints[0].indexX, fourNearestPoints[0].indexY].Y +
    _deformedPositions[fourNearestPoints[1].indexX, fourNearestPoints[1].indexY].Y +
    _deformedPositions[fourNearestPoints[2].indexX, fourNearestPoints[2].indexY].Y +
    _deformedPositions[fourNearestPoints[3].indexX, fourNearestPoints[3].indexY].Y) / 4);

Vector2 displacement = averageDeformedVector - averageRestVector;
return linearPosition + displacement;
+1  A: 

The first thing I'd try is losing the fourNearestPoints array... Perhaps using just 4 variables for the 4 nearest locations. You always treat this by constant index, so this should be a simple change, especially if you name like the array index:

PositionData fourNearestPoints_0 = ...,
             fourNearestPoints_1 = ...,
             fourNearestPoints_2 = ...,
             fourNearestPoints_3 = ...;


The next thing I'd look at is the _restPositions usage; I don't know if GetLength (in this use) will be optimized out, so I would try pre-caching that. In a linear array, .Length is optimized (in the full CLR, at least) - but not GetLength AFAIK:

int width = _restPositions.GetLength(0), height = _restPositions.GetLength(1);
for (int x = 0; x < width; x++)
{
    for (int y = 0; y < height; y++)


Also; what is PositionData? A struct or a class? I would be tempted to try this as both - making sure it is immutable, and passing in the data via the constructor to make the IL slimmer:

PositionData temp = new PositionData(x, y, _restPositions[x,y],
        (linearPosition - _restPositions[x,y]).Length());


In the following, you are doing some work that is discarded most of the time:

    PositionData temp = new PositionData
    {
        indexX = x,
        indexY = y,
        value = _restPositions[x,y],
        distance = (linearPosition - _restPositions[x,y]).Length()
    };
    if (temp.distance < fourNearestPoints[0].distance)
    {
        fourNearestPoints[3] = fourNearestPoints[2];
        fourNearestPoints[2] = fourNearestPoints[1];
        fourNearestPoints[1] = fourNearestPoints[0];
        fourNearestPoints[0] = temp;
    }

I would do:

var distance = (linearPosition - _restPositions[x,y]).Length();
if (distance < fourNearestPoints_0.distance) {
    fourNearestPoints_3 = fourNearestPoints_2;
    fourNearestPoints_2 = fourNearestPoints_1;
    fourNearestPoints_1 = fourNearestPoints_0;
    fourNearestPoints_0 = new PositionData(x, y, _restPositions[x,y], distance);
}


I'm also interested in that that distance=... line; there is a lot there that we can't see that might need more work - the - operator, and the Length() method.


Am I right in assuming that Length() involves a square-root? (expensive) You could avoid this by working in square distances instead. You might need to make this explicit with a square-length method that doesn't take the root, and compare square lengths throughout, but you can save lots of CPU cycles. This is available as LengthSquared().

Marc Gravell
I had it as struct initially but changing to sealed class boosted the performance by like 125 percent! Will separating out the GetLength calls into variables help if the calls are only made once?
RCIX
Separating the array into 4 variables didn't help really though.
RCIX
`GetLength(0)` is called for "rows + 1". `GetLength(1)` is called, per row, "cols + 1" times. So if your array is 10x10, this is 11 + (10 * 11), so 121 times.
Marc Gravell
woah that would be a big problem too. I'll split that immediately. As far as the - operator and the .Length() calls, those are just standard XNA Vector2 operations, and you ought to be able to get something on those with Reflector or something. I never was sure what was the fastest....
RCIX
Actually there's a LengthSquared builtin which i switched to at your suggestion, which combined with some of your others boosts it another 100%. I'm going to do some profiling now and see what's the biggest time drainer, be back in a second.
RCIX
100% isn't bad for psychic debugging ;-p
Marc Gravell
Ok it looks like about 90% of the time is spent in the loop, is there any way to do the "look for 4 nearest grid points" part faster?
RCIX
Actually more like... um hang on... 1400% total increase? yeah i'm not good at optimizing code. Thanks for the help so far!
RCIX
I was doing relative to the last jump in speed, so that's why the other numbers may be confusing.
RCIX
I'd already assumed the loop was the problem; I'm about "out" there...
Marc Gravell
Yeow, then this method may not be feasible for use in the application i was hoping it would be. I'll experiment with other ideas though, and see if i can't come up with something close enough.
RCIX
Thanks for helping though!
RCIX
In case you're interested, i vastly sped it up by using a friend's idea: directly calculate the grid cell the position supplied is in, instead of manually looking for it. Unfortunately that invalidates most of your optimization, but thanks for the help still!
RCIX
+2  A: 

For one thing, try using a one-dimensional array instead of a rectangular array for _restPositions - the CLR is optimised for zero-based one-dimensional arrays. Just keep an index into the array, and increment it on each iteration:

int index = 0;
// I'm assuming you can pass in width and height separately
for (int x = 0; x < width; x++)
{
    for (int y = 0; y < height; y++)
    {
        PositionData temp = new PositionData
        {
            indexX = x,
            indexY = y,
            value = _restPositions[index],
            distance = (linearPosition - _restPositions[index]).Length()
        };
        if (temp.distance < fourNearestPoints[0].distance)
        {
            fourNearestPoints[3] = fourNearestPoints[2];
            fourNearestPoints[2] = fourNearestPoints[1];
            fourNearestPoints[1] = fourNearestPoints[0];
            fourNearestPoints[0] = temp;
        }
        index++;
    }
}

If you could make a constructor to PositionData which took appropriate values instead of having separate property setters, that might help too.

You're also indexing into fourNearestPoints several times - any reason not to just use four local variables? It won't do much, but you never know...

Jon Skeet