tags:
views: 131
answers: 4

Here is what I'm doing: my application takes points from the user while dragging and displays a filled polygon in real time.

It basically adds the mouse position on each MouseMove. Each point is a USERPOINT and has bezier handles, because eventually I will support bezier curves; this is why I have to transfer them into a vector.
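For reference, a USERPOINT is essentially the clicked point plus two bezier handles, something along these lines (a simplified sketch; the exact field types may differ):

// Hypothetical sketch of USERPOINT, inferred from the SetCubicBezier() code
// below; the actual definition and field types may differ.
struct POINTF { double x, y; };

struct USERPOINT
{
    POINTF UserPoint;     // the point placed by the user
    POINTF LeftHandle;    // incoming bezier handle
    POINTF RightHandle;   // outgoing bezier handle
};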

So basically: MousePos -> USERPOINT, and each USERPOINT gets added to a std::vector<USERPOINT>. Then, in my UpdateShape() function, I do this:

DrawingPoints is defined like this:

std::vector<std::vector<GLdouble>> DrawingPoints;


Contour[i].DrawingPoints.clear();

for(unsigned int x = 0; x < Contour[i].UserPoints.size() - 1; ++x)
    SetCubicBezier(
        Contour[i].UserPoints[x],
        Contour[i].UserPoints[x + 1],
        i);

SetCubicBezier() currently looks like this:

void OGLSHAPE::SetCubicBezier(USERFPOINT &a, USERFPOINT &b, int &currentcontour)
{
    std::vector<GLdouble> temp(2);

    if(a.RightHandle.x == a.UserPoint.x && a.RightHandle.y == a.UserPoint.y
        && b.LeftHandle.x == b.UserPoint.x && b.LeftHandle.y == b.UserPoint.y)
    {
        // Both handles coincide with their anchors, so this is a straight
        // segment: just push the two endpoints.
        temp[0] = (GLdouble)a.UserPoint.x;
        temp[1] = (GLdouble)a.UserPoint.y;

        Contour[currentcontour].DrawingPoints.push_back(temp);

        temp[0] = (GLdouble)b.UserPoint.x;
        temp[1] = (GLdouble)b.UserPoint.y;

        Contour[currentcontour].DrawingPoints.push_back(temp);
    }
    else
    {
        // do cubic bezier calculation
    }
}
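For reference, the else branch will eventually subdivide the curve with the standard cubic bezier formula, something like this sketch (a drop-in for the else branch using the function's locals; the step count is arbitrary):

// Rough sketch of the cubic bezier subdivision the else branch would do:
// B(t) = (1-t)^3*P0 + 3(1-t)^2*t*P1 + 3(1-t)*t^2*P2 + t^3*P3,
// with P0 = a.UserPoint, P1 = a.RightHandle, P2 = b.LeftHandle, P3 = b.UserPoint.
// STEPS is a placeholder; each segment adds STEPS + 1 points to DrawingPoints.
const int STEPS = 20;
for(int s = 0; s <= STEPS; ++s)
{
    GLdouble t = (GLdouble)s / STEPS;
    GLdouble u = 1.0 - t;

    temp[0] = u*u*u * a.UserPoint.x + 3*u*u*t * a.RightHandle.x
            + 3*u*t*t * b.LeftHandle.x + t*t*t * b.UserPoint.x;
    temp[1] = u*u*u * a.UserPoint.y + 3*u*u*t * a.RightHandle.y
            + 3*u*t*t * b.LeftHandle.y + t*t*t * b.UserPoint.y;

    Contour[currentcontour].DrawingPoints.push_back(temp);
}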

So, because of the cubic bezier step, I need to turn USERPOINTs into GLdouble[2] (since the GLU tessellator takes a plain static array of doubles).
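For context, the flattened points eventually get fed to the tessellator in the usual Red Book style, roughly like this (a sketch, not my exact code; note that gluTessVertex actually wants a pointer to three GLdoubles, and that pointer has to stay valid until gluTessEndPolygon() returns):

// Sketch of standard GLU tessellator usage for one contour of interleaved
// x,y points. TessellateContour() is an illustrative name, not the real code.
#include <windows.h>
#include <GL/glu.h>
#include <vector>

void TessellateContour(const std::vector<GLdouble>& pts) // x0,y0,x1,y1,...
{
    // gluTessVertex needs stable GLdouble[3] storage, so copy into x,y,z triples.
    std::vector<GLdouble> coords(pts.size() / 2 * 3, 0.0);
    for(size_t i = 0, j = 0; i + 1 < pts.size(); i += 2, j += 3)
    {
        coords[j]     = pts[i];
        coords[j + 1] = pts[i + 1];
    }

    GLUtesselator* tess = gluNewTess();
    gluTessCallback(tess, GLU_TESS_BEGIN,  (void (CALLBACK*)()) glBegin);
    gluTessCallback(tess, GLU_TESS_VERTEX, (void (CALLBACK*)()) glVertex3dv);
    gluTessCallback(tess, GLU_TESS_END,    (void (CALLBACK*)()) glEnd);

    gluTessBeginPolygon(tess, NULL);
    gluTessBeginContour(tess);
    for(size_t j = 0; j < coords.size(); j += 3)
        gluTessVertex(tess, &coords[j], &coords[j]);
    gluTessEndContour(tess);
    gluTessEndPolygon(tess);
    gluDeleteTess(tess);
}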

So I did some profiling. At ~100 points, the code:

for(unsigned int x = 0; x < Contour[i].UserPoints.size() - 1; ++x)
    SetCubicBezier(
        Contour[i].UserPoints[x],
        Contour[i].UserPoints[x + 1],
        i);

took 0 ms to execute. Then, at around 120 points, it jumps to 16 ms and never looks back. I'm positive this is due to std::vector. What can I do to make it stay at 0 ms? I don't mind using lots of memory while generating the shape and then removing the excess when the shape is finalized, or something like that.

+13  A: 

0 ms is no time... nothing executes in no time. This should be your first indicator that you should question your timing method before trusting its results.

Namely, timers typically don't have good resolution. Your pre-16 ms results are probably actually 1 ms to 15 ms being incorrectly reported as 0 ms. In any case, if we could tell you how to keep it at 0 ms, we'd be rich and famous.

Instead, find out which parts of the loop take the longest, and optimize those. Don't work towards an arbitrary time measure. I'd recommend getting a good profiler to get accurate results. Then you don't need to guess what's slow (something in the loop), but can actually see what part is slow.
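For instance, on Windows you can wrap the code in the high-resolution performance counter instead of a ~16 ms tick timer; a rough sketch (the measured function is a placeholder for your loop):

// Sketch: timing with QueryPerformanceCounter, which has microsecond-level
// resolution, instead of a ~16 ms tick timer.
#include <windows.h>
#include <iostream>

void UpdateShapeLoop();   // hypothetical stand-in for the SetCubicBezier loop

void TimeOnce()
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&start);
    UpdateShapeLoop();
    QueryPerformanceCounter(&stop);

    double ms = (stop.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
    std::cout << ms << " ms\n";
}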

GMan
It has to do with the std::vector reallocating...
Milo
@Jex: Or your algorithm causing too much allocation. There's more than one way to fix these things. I'm going to make my answer CW since it's admittedly more about timing and less about optimizing.
GMan
"if we could tell you how to keep it at 0ms, we'd be rich and famous" - liked that very much.
anon
@Jex Or you not knowing what you are doing - could that be even vaguely possible? Searching for something other than themselves to blame is not what a good craftsman does.
anon
The Windows timer (used by `GetTickCount`) has a resolution of only 15-16 ms, which would explain the results.
dan04
+1  A: 

You could use vector::reserve() to avoid unnecessary reallocations in DrawingPoints:

Contour[i].DrawingPoints.reserve(Contour[i].UserPoints.size());
for(unsigned int x = 0; x < Contour[i].UserPoints.size() - 1; ++x) {
   ...
}
sth
This did not help :-( it's still slow
Milo
+1  A: 

If you actually timed only the second code snippet (as you stated in your post), then you're probably just reading from the vector, which means the cause cannot be the reallocation cost of the vector. In that case it may be due to CPU cache effects: small data sets can be read at lightning speed from the CPU cache, but as soon as the data set is larger than the cache (or when you alternate between reads from different memory locations), the CPU has to access RAM, which is distinctly slower than cache access.

If the part of the code that you profiled appends data to the vector, then call std::vector::reserve() with an appropriate capacity (the number of expected entries) before filling it.

However, keep two general rules of profiling/benchmarking in mind:

1) Use a time measurement method with high resolution (as others stated, the resolution of your timer IS too low).

2) In any case, run the code snippet more than once (e.g. 100 times), take the total time of all runs, and divide it by the number of runs. This will give you some REAL numbers.
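For example, even a coarse timer gives usable numbers if you repeat the snippet enough times; a sketch (the measured function name is a placeholder):

// Sketch of rule 2: run the snippet many times, take the total, and divide
// by the number of runs. RebuildDrawingPoints() stands in for the profiled code.
#include <windows.h>
#include <iostream>

void RebuildDrawingPoints();   // hypothetical stand-in for the code under test

void Benchmark()
{
    const int runs = 10000;

    DWORD start = GetTickCount();
    for(int i = 0; i < runs; ++i)
        RebuildDrawingPoints();
    DWORD elapsed = GetTickCount() - start;

    std::cout << "average: " << (double)elapsed / runs << " ms per run\n";
}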

frunsi
OK, I'll try 100 times
Milo
100 times is not enough. Try at least 10,000 times, and if that's still imperceptibly fast, try a million times.
dan04
I ran it 50 times and divided by 50: 100 points: 2.36, 150 points: 4.36, 250 points: 9.4
Milo
@Jex: Let's see, 250 x 2 x sizeof(double) = 4000 bytes, that's below a typical cache line size, but may still be too large. A detailed analysis would exceed space and time here ;-). But you may try using a single vector, instead of one vector per contour (this improves locality and makes it easier for the CPU to manage RAM accesses).
frunsi
@frunsi: What platform has a 4,000 byte cache line?
James McNellis
@James: The shiny new platform of the future!? No, my fault, I meant cache size (not cache line size).
frunsi
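A sketch of the single-vector idea from the comments above, assuming an interleaved x,y layout (the member and helper names are illustrative, not the original code):

// Sketch: store a contour's drawing points as one flat, interleaved buffer
// instead of one heap-allocated std::vector<GLdouble> per point. This avoids
// a separate allocation per point and keeps the data contiguous in memory.
#include <windows.h>
#include <GL/gl.h>
#include <vector>

struct CONTOUR
{
    std::vector<GLdouble> DrawingPoints;   // x0, y0, x1, y1, ...

    void AddDrawingPoint(GLdouble x, GLdouble y)
    {
        DrawingPoints.push_back(x);
        DrawingPoints.push_back(y);
    }
};

// Usage inside something like SetCubicBezier():
//   contour.DrawingPoints.reserve(expectedPoints * 2);   // optional: avoid regrowth
//   contour.AddDrawingPoint(a.UserPoint.x, a.UserPoint.y);
//   contour.AddDrawingPoint(b.UserPoint.x, b.UserPoint.y);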
A: 

There's a lot of guessing going on here. Good guesses, I imagine, but guesses nevertheless. And measuring how long functions take doesn't tell you how they spend that time. You can see, if you try different things, that the time will change, and from that you can get some suggestion of what was taking the time, but you can't really be certain.

If you really want to know what's taking the time, you need to catch it when it's taking that time, and find out for certain what it's doing. One way is to single-step it at the instruction level through that code, but I suspect that's out of the question. The next best way is to get stack samples. You can find profilers that are based on stack samples. Personally, I rely on the manual technique, for the reasons given here.

Notice that it's not really about measuring time. It's about finding out why that extra time is being spent, which is a very different question.

Mike Dunlavey