views:

856

answers:

10

Hi, I'm wanting to convert the output from gethrtime to milliseconds.

The obvious way to do this is to divide by 1000000. However, I'm doing this quite often and wonder if it could become a bottleneck.

Is there an optimized divide operation when dealing with numbers like 1000000?

Note: Any code must be portable. I'm using gcc and this is generally on Sparc hardware

Some quick testing using the code below... hope that is right.

#include <sys/time.h>
#include <iostream>

using namespace std;

const double NANOSECONDS_TO_MILLISECONDS = 1.0 / 1000000.0;

int main()
{
    hrtime_t start;
    hrtime_t tmp;
    hrtime_t fin;

    start = gethrtime();
    tmp = (hrtime_t)(start * NANOSECONDS_TO_MILLISECONDS);
    fin = gethrtime();

    cout << "Method 1"
    cout << "Original val: " << start << endl;
    cout << "Computed: " << tmp << endl;
    cout << "Time:" << fin - start << endl;

    start = gethrtime();
    tmp = (start / 1000000);
    fin = gethrtime();

    cout "Method 2"    
    cout << "Original val: " << start << endl;
    cout << "Computed: " << tmp << endl;
    cout << "Time:" << fin - start << endl;

    return 0;
}

Example outputs:

Original val: 3048161553965997
Computed: 3048161553
Time:82082
Original val: 3048161556359586
Computed: 3048161556
Time:31230

Original val: 3048239663018915
Computed: 3048239663
Time:79381
Original val: 3048239665393873
Computed: 3048239665
Time:31321

Original val: 3048249874282285
Computed: 3048249874
Time:81812
Original val: 3048249876664084
Computed: 3048249876
Time:34830

If this is correct, then the multiple by reciprocal is actually slower in this case. It's probably due to using floating point math instead of fixed point math. I will just stick to integer division then which still takes hardly any time at all.

+2  A: 

Multiply by 1/1,000,000. It should be faster. My Google search was saying to speed up divisions, multiply be the reciprocal. So I would pre-calculate the reciprocal or a list of reciprocals if there is a relatively known set of possible values, and then multiply.

Jacob

TheJacobTaylor
I don't think he is using floating points so he cannot use reciprocals. 1/1,000,000 would be 0.
Unknown
Converting an integer divide operation into a floating point multiply operation would probably not be faster, and introduce inaccuracy of floating point math.
scwagner
Dividing by huge numbers with integer math just feels like asking for pain. I get back to the "why" question above. Anyways, if you cannot convert the value on "display" instead of having to do it in realtime, an approximation might work. Just call it Milliseconds (notice capital M like MegaBytes). Bit shift the integer to the left by 20 to divide by 1,048,576.
TheJacobTaylor
Quick google shows you are right....both return an hrtime_t, which is a 64-bit (long long) signed integer. (from Solaris docs) That even makes the bit shift approximation not so cool.
TheJacobTaylor
You can do reciprocals using integer math. The GCC assembly shows how. The basic trick is not to multiply by 1/N, but by (x/2^y). The "/2^y" part is fast because it's merely a bitshift.
MSalters
+18  A: 

Division is not an expensive operation. I doubt very much if a divide-by-1000000 operation will be anywhere near the main bottleneck in your application. Floating-point processors will be way faster than any sort of "tricks" you can come up with than just doing the single operation.

Robert Cartaino
I think you're right. Let the compiler make such decisions.
Matt H
Outputing the result will me much more expensive as it is very likely to imply a system call (write()).
Ben
Agreed. Premature optimization is the root of all evil. And until you've demonstrated by measurement that one division is the source of your application's performance woes, don't bother with the optimization.
Jonathan Leffler
I could however imagine a situation where this could be a performance bottleneck. If your measuring a certain high frequency input and you'll have to register the time yourself. If you'd be using the QueryPerformanceCounter, you could want to translate that to a correct value as fast as possible.
Davy Landman
+2  A: 
e.James
This actually did not compute the correct result. Also, I would be wanting to avoid function call overhead here... I'll stick with integer division for now.
Matt H
Hmm, that's strange. `4295 >> 32` works out to (approximately) 10^-6. Were you using the difference (which should be small enough to avoid the risk of overflow) or the full amount? Either way, you probably have the right idea just using integer division :)
e.James
A: 

It's possible to transform integer division into a series of simpler operations. A generic method to do that popularized by Terje Mathisen is outlined on page 136 of Optimizing subroutines in assembly language. If you know in advance the width of your data types and what you're dividing by, that will lead you through how to turn that into a serious of simpler operations that in theory might be faster than a more generic division operation that has to handle any divisor. There will still be some platform issues to be concerned about if you're worried about differently sized integers on some of them.

Unless you're actually programming this in assembly language, I'd bet against you actually improving anything in the process over the SPARC divide implementation. Maybe if you're using a positively ancient SPARC V7 processor, from before division was implemented in hardware, you might get some improvement, but even then I'd bet on the built-in division being faster.

Regardless, I suspect you've launched into a bit of premature optimization here though. You should start here by profiling the app you've got before presuming this division has any significant impact on its runtime, and you should similarly profile any change to the division to prove it works as expected. It's quite easy to get code that you think will execute faster but actually doesn't nowadays, given how complicated things like CPU caches have gotten.

Greg Smith
+5  A: 

I'm surprised nobody has gotten this yet…

  • division is the same as multiplication by a fraction
  • multiplying by a fractional power of 2 is fast: just bit-shift
  • integral division involves rounding down
  • rounding down is like multiplying by a slightly smaller fraction (up to a certain point, you need to be aware of your ranges)

So,

const uint64_t numerator = (1LL<<32)/1000000;

...

millionths = ( number * numerator ) >> 32;

Supah fast!

Potatoswatter
I thought for fun I'd try this. Unfortunately gcc spits out the following error: "warning: left shift count >= width of type" therefore the computed value is 0. I'll stick with integer divide since it's not that slow.
Matt H
Sorry, it needs to be 1LL instead of just 1, so the literal isn't a 32-bit int.
Potatoswatter
I think the compiler in Josh Haberman's answer did basically the same thing.
ilya n.
A: 

If you can get around with that, here's my solution.

  • use integers instead of floats (they are faster)
  • divide by 1048576 by shifting bits to the right (that is cheaper than anything on floats)

and convince yourself that milliseconds should be base2 and not base10. ;-)

Marcin
x>>20 is a rough approximation, x>>20 + x>>24 is a lot closer. +x>>25 gets you another digit.
MSalters
+11  A: 

Let your compiler figure it out!

Seriously, if you're really concerned about optimizations at this level (and you shouldn't be unless it shows up in a profile), you should get used to looking at your compiler's assembly language output. You will be amazed what the compiler is doing on your behalf.

All the people who are recommending math tricks either have bad compilers or they are underestimating their compilers. For example, try compiling this function:

unsigned long div1000000(unsigned long n) {
  return n / 1000000UL;
}

Compiled with gcc 4.3.3 on x86 (-O3, -fomit-frame-pointer), I get:

$ objdump -d div.o -M intel

test2.o:     file format elf32-i386


Disassembly of section .text:

00000000 <div1000000>:
   0:   b8 83 de 1b 43        mov    eax,0x431bde83
   5:   f7 64 24 04           mul    DWORD PTR [esp+0x4]
   9:   c1 ea 12              shr    edx,0x12
   c:   89 d0                 mov    eax,edx
   e:   c3                    ret

In other words, the compiler took n / 1000000UL and turned it into (unsigned long long)(n * 0x431bde83) >> (0x12 + 32). Why does that work? Off the top of my head, I have no idea! But the compiler decided that it would be faster than issuing a native divide.

Moral of the story:

  • don't optimize this unless you're sure it's a bottleneck.
  • don't do fancy arithmetic (multiplying by the reciprocal, shifts, etc) unless you already know what your compiler is doing and you think you can beat it.
  • benchmark the result -- only leave in a wart like fancy bitmath if you have demonstrated that you've outdone your compiler.
Josh Haberman
A small nitpick - the operation is more like: ((long long)(n * 0x431bde83)) >> (18 + 32), since the shift is 18 bits (0x12) on the high 32-bits of the 64-bit multiplication result.
Michael Burr
Thanks for the correction, I thought I might have wrote that down too fast. Though I made one small correction to your correction -- the cast would be to "unsigned long long" since the shift is logical, not arithmetic.
Josh Haberman
Why does this work? n / 1000000 = n * (1/1000000) = n * 1125899907 / (2^50) = (n * 1125899907) >> 50.
MSalters
+1  A: 

As Joshua Haberman mentioned, your compiler will probably already convert the division by a constant 1000000 to a multiply by a 'magic number' followed by a shift (if the division is an integer operation). You can get more details of what's going on in Henry Warren's "Hacker's Delight" book and at the companion website:

He even has a page that has a Javascript calculator for the magic numbers:

Michael Burr
A: 

First, the obvious disclaimer: Unless you perform the division a couple of million times per second at least, it won't be a bottleneck, and you should just leave it. Premature optimization and all that.

Second, how accurate do you need the result to be? A handy rule of thumb for converting between binary and decimal is that 2^10 ~= 10^3.

In other words, a million is roughly equal to 2^20. So you could just right shift 20. The compiler won't do this for you automatically, of course, because it changes the result. But if you're willing to live with the slight accuracy, and the division is actually a real performance problem, this would be my suggestion.

jalf
A: 

1/1000000 is 0.000000000000000000 0100 0011 0001 1011 1101 1110 1000 0010 1101 0111 1011 0110 0011 01 binary - which is 0x431BDE82 * 2^-18

Therefore n/1000000 is equivalent to (n*0x431BDE82)>>18

Also n/1000000 is equivalent to (n*0x8637BD04)>>19

Note that this is a "fixed point" calculation and you should be aware that precision could be lost.

teambob