views:

447

answers:

10

I have a method where performance is really important (I know premature optimization is the root of all evil. I know I should and I did profile my code. In this application every tenth of a second I save is a big win.) This method uses different heuristics to generate and return elements. The heuristics are used sequentially: the first heuristic is used until it can no longer return elements, then the second heuristic is used until it can no longer return elements and so on until all heuristics have been used. On each call of the method I use a switch to move to the right heuristic. This is ugly, but work well. Here is some pseudo code

class MyClass
{
private:
   unsigned int m_step;
public:
   MyClass() : m_step(0) {};

   Elem GetElem()
   {
      // This switch statement will be optimized as a jump table by the compiler.
      // Note that there is no break statments between the cases.
      switch (m_step)
      {
      case 0:
         if (UseHeuristic1())
         {
            m_step = 1; // Heuristic one is special it will never provide more than one element.
            return theElem;
         }

         m_step = 1;

      case 1:
         DoSomeOneTimeInitialisationForHeuristic2();
         m_step = 2;

      case 2:
         if (UseHeuristic2())
         {
            return theElem;
         }

         m_step = 3;

      case 3:
         if (UseHeuristic3())
         {
            return theElem;
         }
         m_step = 4; // But the method should not be called again
      }

      return someErrorCode;
   };
}

As I said, this works and it's efficient since at each call, the execution jumps right where it should. If a heuristic can't provide an element, m_step is incremented (so the next time we don't try this heuristic again) and because there is no break statement, the next heuristic is tried. Also note that some steps (like step 1) never return an element, but are one time initialization for the next heuristic.

The reason initializations are not all done upfront is that they might never be needed. It is always possible (and common) for GetElem to not get called again after it returned an element, even if there are still elements it could return.

While this is an efficient implementation, I find it really ugly. The case statement is a hack; using it without break is also hackish; the method gets really long, even if each heuristic is encapsulated in its own method.

How should I refactor this code so it's more readable and elegant while keeping it as efficient as possible?

+4  A: 

Wrap each heuristic in an iterator. Initialize it completely on the first call to hasNext(). Then collect all iterators in a list and use a super-iterator to iterate through all of them:

boolean hasNext () {
    if (list.isEmpty()) return false;

    if (list.get(0).hasNext()) return true;

    while (!list.isEmpty()) {
        list.remove (0);
        if (list.get(0).hasNext()) return true;
    }
    return false;
}
Object next () {
    return list.get (0).next ();
}

Note: In this case, a linked list might be a tiny bit faster than an ArrayList but you should still check this.

[EDIT] Changed "turn each" into "wrap each" to make my intentions more clear.

Aaron Digulla
Also ... I would create several implementations and test their performance in practice. What language is the asker using?
Hamish Grubijan
He's using c++ code.
Brian Young
I undestand you are saying that each Heuristic *must be a diferent object* implementing the same interface (or inheriting the same abstract class) and *put them* into an iterator for choosing the good one. That's a nice solution (polymorphism)
helios
If you go this route, and you decide to make actual C++ iterators, see http://stackoverflow.com/questions/981186/chain-iterator-for-c for a few suggestions.
Jason Orendorff
+2  A: 

It looks like there really isn't much to optimize in this code - probably most of the optimization can be done in the UseHeuristic functions. What's in them?

Larry Watanabe
I think the object is not to optimize this code, but to make it more readable without negatively impacting performance.
Mark Ransom
+2  A: 

To my mind if you do not need to modify this code much, eg to add new heuristics then document it well and don't touch it.

However if new heuristics are added and removed and you think that this is an error prone process then you should consider refactoring it. The obvious choice for this would be to introduce the State design pattern. This will replace your switch statement with polymorphism which might slow things down but you would have to profile both to be sure.

iain
+2  A: 

You can turn the control flow inside-out.

template <class Callback>  // a callback that returns true when it's done
void Walk(Callback fn)
{
    if (UseHeuristic1()) {
        if (fn(theElem))
            return;
    }
    DoSomeOneTimeInitialisationForHeuristic2();
    while (UseHeuristic2()) {
        if (fn(theElem))
            return;
    }
    while (UseHeuristic3()) {
        if (fn(theElem))
            return;
    }
}

This might earn you a few nanoseconds if the switch dispatch and the return statements are throwing the CPU off its stride, and the recipient is inlineable.

Of course, this kind of optimization is futile if the heuristics themselves are nontrivial. And much depends on what the caller looks like.

Jason Orendorff
Note that once you decide to go this route, you can do the same to each heuristic, and then this becomes even more straightforward: `bool Walk(Callback fn) { return WalkHeuristic1(fn) || WalkHeuristic2(fn) || WalkHeuristic3(fn); }`.
Jason Orendorff
+1  A: 

That's micro optimization, but there is no need to set m_elem value when you are not returning from GetElem. See code below.

Larger optimization definitely need simplifying control flow (less jumps, less returns, less tests, less function calls), because as soon as a jump is done processor cache are emptied (well some processors have branch prediction, but it's no silver bullet). You can give a try at solutions proposed by Aaron or Jason, and there is others (for instance you can implement several get_elem functions annd call them through a function pointer, but I'm quite sure it'll be slower).

If the problem allow it, it can also be efficient to compute several elements at once in heuristics and use some cache, or to make it truly parallel with some thread computing elements and this one merely a customer waiting for results... no way to say more without some details on the context.

class MyClass
{
private:
   unsigned int m_step;
public:
   MyClass() : m_step(0) {};

   Elem GetElem()
   {
      // This switch statement will be optimized as a jump table by the compiler.
      // Note that there is no break statments between the cases.
      switch (m_step)
      {
      case 0:
         if (UseHeuristic1())
         {
            m_step = 1; // Heuristic one is special it will never provide more than one element.
            return theElem;
         }

      case 1:
         DoSomeOneTimeInitialisationForHeuristic2();
         m_step = 2;

      case 2:
         if (UseHeuristic2())
         {
            return theElem;
         }

      case 3:
         m_step = 4;

      case 4:
         if (UseHeuristic3())
         {
            return theElem;
         }
         m_step = 5; // But the method should not be called again
      }

      return someErrorCode;
   };
}
kriss
+1  A: 

What you really can do here is replacing conditional with State pattern.

http://en.wikipedia.org/wiki/State%5Fpattern

May be it would be less performant because of the virtual method call, maybe it would be better performant because of less state maintaining code, but the code would be definitely much clearer and maintainable, as always with patterns.

What could improve performance, is elimination of DoSomeOneTimeInitialisationForHeuristic2(); with separate state between. 1 and 2.

George Polevoy
A: 

If it ain't broke don't fix it.

It looks pretty efficient as is. It doesn't look hard to understand either. Adding iterators etc. is probably going to make it harder to understand.

You are probably better off doing

  1. Performance analysis. Is time really spent in this procedure at all, or is most of it in the functions that it calls? I can't see any significant time being spent here.
  2. More unit tests, to prevent someone from breaking it if they have to modify it.
  3. Additional comments in the code.
Larry Watanabe
I don't know why you have been down voted. While I won't accept this answer, it's still informative. Thanks for taking the time to answer.
Mathieu Pagé
+3  A: 

I don't think your code is so bad, but if you're doing this kind of thing a lot, and you want to hide the mechanisms so that the logic is clearer, you could look at Simon Tatham's coroutine macros. They're intended for C (using static variables) rather than C++ (using member variables), but it's trivial to change that.

The result should look something like this:

Elem GetElem()
{
  crBegin;

  if (UseHeuristic1())
  {
     crReturn(theElem);
  }

  DoSomeOneTimeInitialisationForHeuristic2();

  while (UseHeuristic2())
  {
     crReturn(theElem);
  }

  while (UseHeuristic3())
  {
     crReturn(theElem);
  }

  crFinish;
  return someErrorCode;
}
Steve Jessop
+1  A: 

Since each heuristic is represented by a function with an identical signature, you can make a table of function pointers and walk through it.

class MyClass 
{ 
private: 
   typedef bool heuristic_function();
   typedef heuristic_function * heuristic_function_ptr;
   static heuristic_function_ptr heuristic_table[4];
   unsigned int m_step; 
public: 
   MyClass() : m_step(0) {}; 

   Elem GetElem() 
   { 
      while (m_step < sizeof(heuristic_table)/sizeof(heuristic_table[0]))
      {
         if (heuristic_table[m_step]())
         {
            return theElem;
         }
         ++m_step;
      }

      return someErrorCode; 
   }; 
}; 

MyClass::heuristic_function_ptr MyClass::heuristic_table[4] = { UseHeuristic1, DoSomeOneTimeInitialisationForHeuristic2, UseHeuristic2, UseHeuristic3 };
Mark Ransom
+1  A: 

If the element code you are processing can be converted to an integral value, then you can construct a table of function pointers and index based on the element. The table would have one entry for each 'handled' element, and one for each known but unhandled element. For unknown elements, do a quick check before indexing the function pointer table.

Calling the element-processing function is fast.

Here's working sample code:

#include <cstdlib>
#include <iostream>
using namespace std;

typedef void (*ElementHandlerFn)(void);

void ProcessElement0()
{
    cout << "Element 0" << endl;
}

void ProcessElement1()
{
    cout << "Element 1" << endl;
}
void ProcessElement2()
{
    cout << "Element 2" << endl;
}

void ProcessElement3()
{
    cout << "Element 3" << endl;
}

void ProcessElement7()
{
    cout << "Element 7" << endl;
}

void ProcessUnhandledElement()
{
    cout << "> Unhandled Element <" << endl;
}




int main()
{
    // construct a table of function pointers, one for each possible element (even unhandled elements)
    // note: i am assuming that there are 10 possible elements -- 0, 1, 2 ... 9 --
    // and that 5 of them (0, 1, 2, 3, 7) are 'handled'.

    static const size_t MaxElement = 9;
    ElementHandlerFn handlers[] = 
    {
        ProcessElement0,
        ProcessElement1,
        ProcessElement2,
        ProcessElement3,
        ProcessUnhandledElement,
        ProcessUnhandledElement,
        ProcessUnhandledElement,
        ProcessElement7,
        ProcessUnhandledElement,
        ProcessUnhandledElement
    };

    // mock up some elements to simulate input, including 'invalid' elements like 12
    int testElements [] = {0, 1, 2, 3, 7, 4, 9, 12, 3, 3, 2, 7, 8 };
    size_t numTestElements = sizeof(testElements)/sizeof(testElements[0]);

    // process each test element
    for( size_t ix = 0; ix < numTestElements; ++ix )
    {
        // for some robustness...
        if( testElements[ix] > MaxElement )
            cout << "Invalid Input!" << endl;
        // otherwise process normally
        else
            handlers[testElements[ix]]();

    }

    return 0;
}
John Dibling
+1 IMHO this one (and Mark Ransoms answer) are the only real refactoring examples here (procedural -> OOP).
Groo