Hello,

I'm writing a game engine and I need a way to get a precise and accurate "deltatime" value, from which to derive the current FPS for debugging and also to limit the framerate (this is important for our project).

Doing a bit of research, I found out that one of the best ways to do this is to use WinAPI's QueryPerformanceCounter function. GetTickCount can be used to guard against forward counter leaps, but by itself it is not very accurate.
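
For reference, here's a minimal sketch of the kind of QPC-based deltatime code I have in mind (the DeltaTimer class is just my illustration; only the WinAPI calls are real):

    #include <windows.h>

    // Sketch only: derive a per-frame delta time (in seconds) from QPC.
    class DeltaTimer
    {
    public:
        DeltaTimer()
        {
            QueryPerformanceFrequency(&m_frequency); // ticks per second
            QueryPerformanceCounter(&m_last);
        }

        // Call once per frame; FPS for the debug display is just 1.0 / dt.
        double NextDelta()
        {
            LARGE_INTEGER now;
            QueryPerformanceCounter(&now);
            double dt = double(now.QuadPart - m_last.QuadPart)
                      / double(m_frequency.QuadPart);
            m_last = now;
            return dt;
        }

    private:
        LARGE_INTEGER m_frequency;
        LARGE_INTEGER m_last;
    };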

Now, the problem with QueryPerformanceCounter is that it apparently may return values that look as if time warped backwards (i.e. a call may return a value earlier in time than a value returned by a previous call). This happens only when a value obtained on one processor core is compared against a value obtained on another core, which leads me to the questions that motivated me to make this post:

  1. May the OS "reallocate" a thread to another core while the thread is already running, or is a thread allocated to a given core and that's that until the thread dies?
  2. If a thread can't be reallocated (and that makes a lot of sense to me, at least), then why is it possible for me to do something like SetThreadAffinityMask(GetCurrentThread(), mask)? Ogre3D does that in its Ogre::Timer class (Windows implementation; see the sketch below), and I'm assuming that's to avoid time going backwards. But for that to be true, I would have to consider the possibility of threads being moved from one core to another arbitrarily by the OS, which seems rather odd to me.
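
The pattern I'm referring to looks roughly like this (reconstructed from memory of Ogre's Windows Timer, so details may differ):

    #include <windows.h>

    // Temporarily pin the calling thread to one core, read the counter,
    // then restore the original affinity (roughly what Ogre::Timer does).
    LONGLONG ReadCounterPinned(DWORD_PTR timerMask) // e.g. 1 = core 0
    {
        HANDLE thread = GetCurrentThread();
        DWORD_PTR oldMask = SetThreadAffinityMask(thread, timerMask);

        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);

        SetThreadAffinityMask(thread, oldMask); // restore
        return now.QuadPart;
    }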

I think that was all I wanted to know for now. Thanks.

+1  A: 

Threads can be, and are (unless they have an affinity set), reallocated to another core while they are running. Windows spreads the load over all the processors to maximize performance.

Jon Benedicto
So, from what you can tell, not setting an affinity mask to just one core and using QueryPerformanceCounter would indeed make the counter appear to warp backwards when the function gets called from different cores. Is that so?
n2liquid
There is the possibility that the time might go backwards when switching cores, yes.
Jon Benedicto
Ok, I think that settles it. I'll have to create a thread to measure time and use mutexes to query it, so the measurement always occurs in only one thread, correct?
n2liquid
Something like that. Don't forget that mutexes involve switching from user mode to kernel mode (critical sections do so under contention), so they have a speed penalty.
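
Something like this, for example (a sketch; all names are made up, and you'd call InitializeCriticalSection(&g_timeLock) at startup):

    #include <windows.h>

    CRITICAL_SECTION g_timeLock;   // InitializeCriticalSection() at startup
    LONGLONG g_latestTicks = 0;

    // Dedicated timing thread: the only thread that ever calls QPC,
    // pinned to core 0 so successive reads come from the same counter.
    DWORD WINAPI TimerThread(LPVOID)
    {
        SetThreadAffinityMask(GetCurrentThread(), 1);

        LARGE_INTEGER now;
        for (;;)
        {
            QueryPerformanceCounter(&now);
            EnterCriticalSection(&g_timeLock);
            g_latestTicks = now.QuadPart;
            LeaveCriticalSection(&g_timeLock);
            Sleep(0); // yield the rest of the time slice
        }
    }

    // Any other thread reads the last published value.
    LONGLONG ReadLatestTicks()
    {
        EnterCriticalSection(&g_timeLock);
        LONGLONG t = g_latestTicks;
        LeaveCriticalSection(&g_timeLock);
        return t;
    }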
Jon Benedicto
A: 

1) The OS may allocate a thread to whichever core has spare processing time. This is why you will often see software using 50% of a quad-core machine, yet when you check the per-core graphs it's using part of all four.

2) See 1 ;)

Goz
A: 

Using SetThreadAffinityMask() is usually a bad idea except in the case where the thread only does timing. If you lock your thread to a single core, you remove all the benefit of having a multicore system in the first place. Your application can no longer scale. Even if you launch multiple instances of your app, they will still be locked to a single core.

John Dibling
Why would someone launch multiple instances of a game?
Crashworks
Who knows. Maybe pieces of the app are broken out into separate apps. More importantly, what does it matter?
John Dibling
@Crashworks Nobody would, but he really has a point. That's why I didn't just go ahead and set up affinities without asking first, too. I also thought that should only be done for timing. Good to hear from somebody who thinks like me.
n2liquid
Because they want to do something with multiple instances of a game? It's not unheard of, say, when testing a networked game locally for whatever reason.
MSN
FWIW, our experience has been that the various components of our engine were so timing-sensitive that we had to set explicit affinities for each thread job. We knew up front which parts of the frame could run concurrently and upon what they depended, and allowing the operating system to move threads from one core to another only caused problems either due to mis-scheduled concurrency, or many wasted microseconds on context switches.
Crashworks
@Crashworks That's really interesting to hear, also. I will consider that possibility when designing my engine, thanks. +1
n2liquid
+1  A: 

Unless a thread has a processor affinity mask, the scheduler will move it from processor to processor in order to give it execution time. Since moving a thread between processors costs performance, the scheduler tries not to move it, but giving the thread a processor to execute on has priority over not moving it. So, usually, threads move.

As for timer APIs: timeGetTime is designed for multimedia timing, so it's a bit more accurate than GetTickCount.
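
If you do go with timeGetTime, it's worth knowing about timeBeginPeriod, which can raise its resolution to 1 ms. A sketch (link against winmm.lib):

    #include <windows.h>
    #include <mmsystem.h>              // timeGetTime, timeBeginPeriod
    #pragma comment(lib, "winmm.lib")

    int main()
    {
        timeBeginPeriod(1);            // request 1 ms timer resolution

        DWORD start = timeGetTime();   // milliseconds since boot
        Sleep(16);
        DWORD elapsedMs = timeGetTime() - start;
        // the unsigned subtract handles the ~49.7-day wraparound correctly

        timeEndPeriod(1);              // always pair with timeBeginPeriod
        return (int)elapsedMs;
    }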

QueryPerformanceCounter() is still your most precise measurement, though. Microsoft has this to say about it:

On a multiprocessor computer, it should not matter which processor is called. However, you can get different results on different processors due to bugs in the basic input/output system (BIOS) or the hardware abstraction layer (HAL). To specify processor affinity for a thread, use the SetThreadAffinityMask function.

So if you are doing the timing tests on a specific computer, you may not have to worry about QPC going backwards. You should do some testing and see if it matters on your machine.

John Knoeller
Good to know. I was really wondering how many methods there were to do such measurements. Are there indeed only those 3 well-known portable methods?
n2liquid
As far as I know, only those 3. There is a realtime clock tick, but it's hard to get to outside of kernel mode, and it only ticks at about 1.1 MHz.
John Knoeller
Regarding doing timing tests on a specific computer: like I said, this is **not** for performance testing. It is an important and integral part of the final product. But thanks for the comment.
n2liquid
+1  A: 

Even if you lock the thread to one processor using SetThreadAffinityMask, QPC can run backwards if you're really unlucky and the hardware sucks. Better to just deal with the case of QPC returning bad values. In Windows 7, QPC has been significantly improved in this regard, but since you're writing a game, you're probably targeting XP, where that won't help you.

Also, don't set the thread affinity: you can deadlock yourself, introduce weird timing and perf bugs, and generally cause yourself grief.

Paul Betts
So what would your suggestion be? Whenever I detect a backward run, ignore the event and use the last deltatime? I could only do that if I were sure it doesn't happen all the time. Do you have any other ideas?
n2liquid
That wouldn't be a bad approach. Or keep a running average of the last 4 frames and use that instead if you get garbage. In general, though, you're not going to see QPC do this to you all the time; it's a fairly rare occurrence. It's the dumb game code that does the unsigned subtract and underflows that is the real problem.
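
Something along these lines, perhaps (a sketch; the names are mine):

    // Clamp backward/garbage QPC deltas and fall back to an average
    // of the last 4 good frames.
    double FilterDelta(LONGLONG rawTicks, double ticksPerSecond)
    {
        static double history[4] = { 1.0/60, 1.0/60, 1.0/60, 1.0/60 };
        static int idx = 0;

        // Signed check: a backward run shows up as a negative tick count,
        // which an unsigned subtract would turn into a huge positive delta.
        double dt = double(rawTicks) / ticksPerSecond;
        if (rawTicks <= 0 || dt > 0.25)
        {
            dt = (history[0] + history[1] + history[2] + history[3]) / 4.0;
        }
        else
        {
            history[idx] = dt;
            idx = (idx + 1) & 3;
        }
        return dt;
    }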
Paul Betts
A: 

We typically have to lock our game into a single thread when running timings because of this; there's no effective way around it that we've found, since you need sub-microsecond resolution when measuring perf.

One thing that makes it a little easier is that our engine is cut up into broad components that always run concurrently (e.g. the game/logic "server", the input/graphics "client", audio, and render are each their own thread), so what we do is lock each of those threads onto its own core and time them independently.

Similarly, because we know that, e.g., the render loop is always going to be on core 0, we use that core for timing framerate.
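
In outline, the pinning looks like this (the thread handles are placeholders for whatever your engine creates):

    #include <windows.h>

    // Pin one engine component's thread to its own core.
    inline void PinToCore(HANDLE thread, int core)
    {
        SetThreadAffinityMask(thread, DWORD_PTR(1) << core);
    }

    // e.g., after creating the threads:
    //   PinToCore(renderThread, 0); // render, also used for frame timing
    //   PinToCore(serverThread, 1); // game/logic "server"
    //   PinToCore(clientThread, 2); // input/graphics "client"
    //   PinToCore(audioThread,  3); // audio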

Crashworks