views: 293
answers: 10
I need help with making this bit of code faster:

UnitBase* Formation::operator[](ushort offset)
{
 UnitBase* unit = 0;
 if (offset < itsNumFightingUnits)
 {
  ushort j = 0;
  for (ushort i = 0; i < itsNumUnits; ++i)
  {
   if (unitSetup[i] == UNIT_ON_FRONT)
   {
    if (j == offset)
     unit = unitFormation[i];
    ++j;
   }
  }
 }
 else
  throw NotFound();
 return unit;
}

So, to give some background, I have this class Formation which contains an array of pointers to UnitBase objects, called unitFormation. Alongside it there is an equally sized array of numbers indicating the status of each corresponding UnitBase object, called unitSetup.

I have overloaded the [] operator so as to return only pointers to those UnitBase objects that have a certain status, so if I ask for itsFormation[5], the function does not necessarily return unitFormation[5], but rather the 5th element of unitFormation that has the status UNIT_ON_FRONT.

I have tried using the code above, but according to my profiler it is taking far too much time, which makes sense, since the algorithm has to count through the elements before it can return the requested pointer.

Do I need to rethink the whole problem completely, or can this be made somehow faster?

Thanks in advance.

+7  A: 

One quick optimization would be to return the unit as soon as you find it, rather than continuing to iterate over all of the rest of the units, e.g.

if (j == offset)
 unit = unitFormation[i];

becomes

if (j == offset)
 return unitFormation[i];

Of course, this only helps in the case that the unit you're looking for is towards the front of the unitFormation sequence, but it's trivial to do and does help sometimes.
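
Put together, the whole operator might then look like this (a sketch based only on the question's code; the trailing return 0 exists purely to satisfy the compiler, since the loop always finds a match once the bounds check passes):

UnitBase* Formation::operator[](ushort offset)
{
    if (offset >= itsNumFightingUnits)
        throw NotFound();

    ushort j = 0;
    for (ushort i = 0; i < itsNumUnits; ++i)
    {
        if (unitSetup[i] == UNIT_ON_FRONT)
        {
            if (j == offset)
                return unitFormation[i]; // stop scanning as soon as the match is found
            ++j;
        }
    }
    return 0; // never reached when the counts are consistent
}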

A more involved, but more effective, way to make this faster would be to build and maintain, for each status, a linked list of the units that have that status. You would do this in parallel to the main array of units, and the contents of the linked lists would be pointers into the main units array, so you are not duplicating the unit data. Then, to find a given offset within a status, you could just traverse to the offset-th node of that list, rather than iterating over every unit.

Making it a doubly-linked list and keeping a tail pointer would allow you to find elements with high offsets just as quickly as low offsets (by starting from the end and going backwards).

However, this would still be slow if there are a lot of units with the same status and you are looking for one whose offset is near the middle.
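
For concreteness, a minimal sketch of the per-status list idea; the names (frontUnits, addToFrontList, removeFromFrontList, nthFrontUnit) are illustrative rather than the poster's code, and in practice the bookkeeping would live inside Formation:

#include <list>
#include <iterator>

class UnitBase; // defined elsewhere in the poster's code

std::list<UnitBase*> frontUnits; // pointers into the existing unit storage, nothing is copied

// Call whenever a unit's status becomes UNIT_ON_FRONT.
void addToFrontList(UnitBase* unit)
{
    frontUnits.push_back(unit);
}

// Call whenever a unit leaves the front.
void removeFromFrontList(UnitBase* unit)
{
    frontUnits.remove(unit); // linear, but only paid on a status change
}

// Walk offset nodes instead of scanning every unit in the formation.
UnitBase* nthFrontUnit(unsigned offset)
{
    std::list<UnitBase*>::iterator it = frontUnits.begin();
    std::advance(it, offset);
    return *it;
}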

Tyler McHenry
I had it like that before (with an immediate return), but the compiler was complaining because I had also removed the return after the throw. Now I've put a `break` when the unit is found, so it should give the same functionality as you said. In fact it does give a bit of an improvement, say 10%-20% or so, but it is still too slow. The problem is that this particular operator is used about 300k times in the test I'm trying out, so with an average of 1200 units that gives a whole lot of lines to go through...
Kristian D'Amato
The parallel linked list idea sounds interesting. I might try it out. Or would swapping 'bad' units for 'good' units at the end of the array work? I think it may, but in that case I'd lose the position on the front of each unit... argh!
Kristian D'Amato
@Kristian: For that kind of speed, and for random access, linked lists are not appropriate. Use a std::vector instead for keeping tabs on your front units. You'll pay the penalty of slower addition/insertion, but lookups will be much much faster.
Owen S.
Owen is right; it's a tradeoff. If units are changing state frequently, linked lists are the best idea, since they have fast insertion/deletion but slow lookup (but still faster lookup than what you have now, as long as not all - or nearly all - units are in the same state). If units change states only rarely, and they are looked up frequently, then maintaining vectors of pointers to units for each state will give you a faster lookup at the expense of much slower state changes.
Tyler McHenry
@Kristian Your compiler was complaining because it's not smart enough to know at compile-time that the `j == offset` condition will always evaluate to true at some point in the loop. You could have just appeased the compiler with a `return NULL` at the end, since you know it will never be reached, even if the compiler doesn't. A `break` works too, as you found.
Tyler McHenry
+4  A: 

What about redesigning your code to maintain a table of "units on front", whatever that means (sounds interesting :-))? If that part is really queried a lot and not modified often, you'll save some time. Instead of inspecting all or part of the complete list of units, you'll get the result instantly.

P.S.: int is normally the most natural type for your CPU, so using ushorts doesn't necessarily make your program faster.

jdehaan
+1. To quote Mike Abrash, optimizing your slow algorithm as is will result in fast, slow code. Rethinking your data structures so that the common operations are fast by design is much better.
Michael
I'm suspecting so... By the way, would int really be faster than ushort?
Kristian D'Amato
Probably - every time you load a short, the CPU has to do a few extra shifts or ands to clear the upper 16 bits. They add up over time.
Michael Dorgan
You can never tell for sure, but unless your profiler tells you otherwise I would strongly advise thinking about algorithms before twiddling the data types. Better to win 50% with algorithms than to tune 2% of performance by using shorts instead of ints, if it even brings that much. You also have the burden of taking care of overflows during arithmetic and so on (you have it anyway, but shorts overflow sooner :-D).
jdehaan
I added a link to satisfy the curious people...
jdehaan
jdehaan, you are right! I tried this bit of code: `for (ushort i = 0; i < 10000; ++i) for (ushort j = 0; j < 10000; ++j) k += j;` Using ushort it takes 80 ms, and with int only 32 ms. Less than half the time! Still, of course, in this case that doesn't help much, but I didn't know this - I was assuming shorts were faster than ints!
Kristian D'Amato
+2  A: 

In addition to the other suggestions, you may want to look at whether any of the calls to this function are made needlessly, and eliminate those call points. For instance, you may find that you are calling it repeatedly when there is no chance the result has changed. The fastest code is that which never runs.

Michael
+1  A: 

Would it be possible to sort (or insert sorted) your data by status UNIT_ON_FRONT? That would make the function trivial.
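
For illustration, a rough sketch of keeping the two parallel arrays partitioned so that all UNIT_ON_FRONT entries come first; the setter name setOnFront is an assumption, it treats the status as a simple on-front/not-on-front flag, and as the comments below point out it sacrifices the original ordering of the units:

#include <algorithm> // std::swap

void Formation::setOnFront(ushort i, bool onFront)
{
    if (onFront && unitSetup[i] != UNIT_ON_FRONT)
    {
        // Swap this unit (and its status) up to the partition boundary; the front region grows.
        std::swap(unitFormation[i], unitFormation[itsNumFightingUnits]);
        std::swap(unitSetup[i],     unitSetup[itsNumFightingUnits]);
        unitSetup[itsNumFightingUnits] = UNIT_ON_FRONT;
        ++itsNumFightingUnits;
    }
    else if (!onFront && unitSetup[i] == UNIT_ON_FRONT)
    {
        // Shrink the front region and swap this unit just past the boundary.
        --itsNumFightingUnits;
        std::swap(unitFormation[i], unitFormation[itsNumFightingUnits]);
        std::swap(unitSetup[i],     unitSetup[itsNumFightingUnits]);
        unitSetup[itsNumFightingUnits] = UNIT_NOT_ON_FRONT;
    }
}

// With that invariant maintained, operator[](offset) can simply bounds-check
// against itsNumFightingUnits and return unitFormation[offset].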

Michael Dorgan
That's sort of what I've been thinking. If a unit has its status changed to UNIT_NOT_ON_FRONT, it is swapped with a unit that is on the front, ending up with all the active units in the beginning of the array. But then I'd also lose the order of the units, which I was using elsewhere...
Kristian D'Amato
If the order is needed, then perhaps sort by the ordering used most often, or maintain 2 sorted lists. That doubles the insertion time, but again the fetch time is negligible.
Michael Dorgan
+1  A: 

How often will the status of a unit change? Perhaps you should keep a list of units which have the proper status, and only update that list when the status changes.

If you need to keep the cost of status changes low, you could keep an array which says how many of the first 256 units have a particular status, how many of the next 256 do, and so on. One could scan through that array 256 times as fast as one could scan through the units themselves, until one was within 256 slots of the Nth "good" unit. Changing a unit's status would only require incrementing or decrementing one array slot.

Other approaches could be used to balance the cost of changing unit status with the cost of finding units, given various usage patterns.
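
A sketch of that block-count idea, written as a free function over the question's arrays (it reuses ushort, UnitBase and UNIT_ON_FRONT from the question; the frontCountPerBlock array and the function name are assumptions):

enum { BLOCK = 256 };

// frontCountPerBlock[b] holds how many of the units in [b*BLOCK, (b+1)*BLOCK)
// currently have status UNIT_ON_FRONT; a status change only increments or
// decrements one of these counters.
UnitBase* nthFrontUnit(UnitBase* const* unitFormation,
                       const ushort*    unitSetup,
                       const ushort*    frontCountPerBlock,
                       ushort           numUnits,
                       ushort           offset)
{
    ushort i = 0;
    // Skip whole blocks that cannot contain the requested front unit.
    while (i < numUnits && frontCountPerBlock[i / BLOCK] <= offset)
    {
        offset -= frontCountPerBlock[i / BLOCK];
        i += BLOCK;
    }
    // Scan at most one block for the offset-th remaining front unit.
    for (; i < numUnits; ++i)
    {
        if (unitSetup[i] == UNIT_ON_FRONT)
        {
            if (offset == 0)
                return unitFormation[i];
            --offset;
        }
    }
    return 0; // not found
}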

supercat
Thanks, I will be spending much more time finding units than changing their status. In the worst case scenario, all units change their status once, but I will be looking for each particular unit an average of 100-200 times.
Kristian D'Amato
A: 

This shouldn't have a big impact, but you could check the assembly to see whether itsNumFightingUnits and itsNumUnits are loaded every loop iteration or if they are put into registers. If they are loaded every time, try adding temporaries at the beginning of the function.
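
For example, a sketch of the question's loop with the member counts copied into locals; this only matters if the compiler cannot already prove that the members are unchanged inside the loop:

UnitBase* Formation::operator[](ushort offset)
{
    const ushort numUnits    = itsNumUnits;         // loaded from memory once
    const ushort numFighting = itsNumFightingUnits; // loaded from memory once

    if (offset >= numFighting)
        throw NotFound();

    ushort j = 0;
    for (ushort i = 0; i < numUnits; ++i) // the bound is now obviously loop-invariant
    {
        if (unitSetup[i] == UNIT_ON_FRONT)
        {
            if (j == offset)
                return unitFormation[i];
            ++j;
        }
    }
    return 0; // never reached when the counts are consistent
}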

softwarebug2.0
A: 

For that last bit of juice, and if the exception is thrown regularly, it might be worth switching to returning an error code. It's uglier code but the lack of stack jumps can be a big help. It's common in game development to turn off exceptions and RTTI.

tenpn
The exception should never be thrown, because the code takes care of that. Thanks for the tip though, I don't think I'll be using exceptions much except where performance is not an issue.
Kristian D'Amato
+1  A: 

One of the problems may be that this function is called too often. Assuming the proportion of UNIT_ON_FRONT units is constant, the complexity is linear. However, if you are calling the operator from a loop, that complexity is going to rise to O(N^2).

If instead, you returned something like a boost::filter_iterator, you could improve the efficiency of those algorithms that need to iterate over UNIT_ON_FRONT.
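
A sketch of how that could look with the question's parallel arrays, filtering over indices with boost::counting_iterator and boost::filter_iterator; the free function and its parameters are purely illustrative, and ushort, UnitBase and UNIT_ON_FRONT come from the question:

#include <boost/iterator/counting_iterator.hpp>
#include <boost/iterator/filter_iterator.hpp>

// Predicate over indices: true when the unit at index i is on the front.
struct IsOnFront
{
    const ushort* setup;
    explicit IsOnFront(const ushort* s) : setup(s) {}
    bool operator()(ushort i) const { return setup[i] == UNIT_ON_FRONT; }
};

void processFrontUnits(UnitBase* const* unitFormation,
                       const ushort*    unitSetup,
                       ushort           numUnits)
{
    typedef boost::counting_iterator<ushort>          Index;
    typedef boost::filter_iterator<IsOnFront, Index>  FrontIndex;

    IsOnFront  pred(unitSetup);
    FrontIndex it (pred, Index(0),        Index(numUnits));
    FrontIndex end(pred, Index(numUnits), Index(numUnits));

    // One pass over just the front units, instead of an O(N) scan per lookup.
    for (; it != end; ++it)
    {
        UnitBase* unit = unitFormation[*it];
        (void)unit; // ... use unit ...
    }
}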

Robert Tuck
Yeah, that's true. I think I'm going to add a status flag that indicates when there's been a change. That should eliminate quite a good number of iterations. Will have a look into using a filter_iterator, thanks. At the moment I'm trying it out with a vector, popping and pushing. We'll see how it goes.
Kristian D'Amato
A: 

You're outsmarting yourself (which everyone does sometimes). You've made a simple problem O(N^2). Just think about what you've got to do before you go overloading operators.

Added in response to comment:

Try backing off to a simpler language, like C, or the C subset of C++. Forget about abstractions, collection classes, and all that hoo-haw. Look at what your program needs to do and think about your algorithm that way. Then, if you can simplify it by using container classes and overloading, without making it do any more work, then go for it. Most performance problems are caused by taking simple problems and making them complicated by trying to use all the fancy ideas.

For example, you're taking the [] operator, which is usually thought of as O(1), and making it O(N). Then I presume you use it in some O(N) loop, so you get O(N^2). What you really want to do is loop over the array elements that satisfy a certain condition. You could just do that. If there are very few of them, and you're doing this at really high frequency, you might want to build a separate list of them. But keep your data structure simple, simple, simple. It's better to "waste" cycles, and only optimize if you really have to.
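
In other words, something like this sketch, using the question's arrays directly:

for (ushort i = 0; i < itsNumUnits; ++i)
{
    if (unitSetup[i] == UNIT_ON_FRONT)
    {
        UnitBase* unit = unitFormation[i];
        // do here whatever the outer loop was doing with each front unit
    }
}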

Mike Dunlavey
+1  A: 

I have redesigned the solution completely, using two vectors, one for units on the front, and one for other units, and changed all algorithms such that a unit with a changed status is immediately moved from one vector to another. Thus I eliminated the counting in the [] operator which was the main bottleneck.

Before using the profiler I was getting computation times of around 5500 to 7000 ms. After looking at the answers here:

1) I changed the loop variables from ushort to int or uint, which reduced the duration by ~10%.
2) I made another modification in a secondary algorithm, reducing the duration by a further 30% or so.
3) I implemented the two vectors as explained above, which brought the computation time from ~3300 ms down to ~700 ms, another 40%!

In all that's a reduction of 85 - 90%! Thanks to SO and the profiler.

Next I'm going to implement a mediator pattern and only call the updating function when required, perhaps oozing out a few more ms. :)

New code that corresponds to the old snippet (the functionality is completely different now):

UnitBase* Formation::operator[](ushort offset)
{
    if (offset < numFightingUnits)
        return unitFormation[offset]->getUnit();
    else
        return NULL;
}

Much shorter and more to the point. Of course, there were many other heavy modifications, the most important being that unitFormation is now a std::vector<UnitFormationElement*> rather than simply a UnitBase**. The UnitFormationElement contains the UnitBase* along with some other vital data that was previously hanging around in the Formation class.
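
For illustration, the status-change bookkeeping that goes with it might look roughly like this; it assumes unitFormation now holds only the units on the front, and the otherUnits vector and the setOnFront name are made up for the sketch:

#include <algorithm> // std::find
#include <vector>

void Formation::setOnFront(UnitFormationElement* elem, bool onFront)
{
    std::vector<UnitFormationElement*>& from = onFront ? otherUnits : unitFormation;
    std::vector<UnitFormationElement*>& to   = onFront ? unitFormation : otherUnits;

    // Move the element the moment its status changes, so operator[] never has to count.
    std::vector<UnitFormationElement*>::iterator it = std::find(from.begin(), from.end(), elem);
    if (it != from.end())
    {
        from.erase(it);
        to.push_back(elem);
        numFightingUnits = (ushort)unitFormation.size(); // keep the count used by operator[] in sync
    }
}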

Kristian D'Amato
You should post the code. One of the interesting things about profiling is that when you make a change that improves things, the profiler will show you the next piece that may need addressed. That can sometimes trigger interesting ideas.
EvilTeach
Sure thing, just edited the above post. Yes, now the major consumer of electrons appears to be ntdll.dll and, funnily enough, the following line: `if ((float) rand() / RAND_MAX <= chanceDefenseBreached`. Is it the cast or the rand() function?
Kristian D'Amato
@Kristian: OK, you're getting the idea, and as @EvilTeach says, keep doing it. You're getting a 10x speedup, and I bet you can get more, and in the process, you're *simplifying your data structure*.
Mike Dunlavey
@Kristian: You can answer whether it is the cast or the rand function by pausing it a number of times, and seeing how often it is in one versus the other. I'm guessing `rand`, but I might be wrong.
Mike Dunlavey
@Mike: How do you mean exactly Mike?
Kristian D'Amato
@Kristian: It's a little-taught but thoroughly effective way to answer performance questions. Basically, you pause the program at random times while it's running and examine its state. You don't have to do it very many times before you get a good idea of where the time goes, percentage-wise. Here's more of a discussion: http://stackoverflow.com/questions/1777556/alternatives-to-gprof/1779343#1779343
Mike Dunlavey