ansaurus

Question

Is there a way to improve the speed or efficiency of this lookup? (C/C++)

Answer 1

+6 A:

Using some kind of string builder rather than repeated concatenation into 'key' would provide a significant speed boost.

Rob Walker 2009-11-09 22:07:59

In the STL this would probably be an `std::ostringstream`.

qid 2009-11-09 22:21:15

I'd avoid streams; they are slow to create and destroy. A char array of 12 characters would be sufficient.

Stephen Nutt 2009-11-10 00:09:04

std::basic_string (std::string is just a typedef of this class) has a function "reserve" which assures there is enough memory for specified number of characters to avoid later allocations and copying. In this case the number of characters is known fixed so you could even use resize and then instead of adding characters with += use indexing directly.

Adam Badura 2009-11-10 07:16:16

Answer 2

+6 A:

You may want to reserve memory in advance for your string key. This may get you a decent performance gain, as well as a gain in memory utilization. Whenever you call the append operator on std::string, it may double the size of the internal buffer if it has to reallocate. This means each string may be taking up significantly more memory than is necessary to store the characters. You can avoid this by reserving memory for the string in advance.

Charles Salvia 2009-11-09 22:08:44

I actually forgot to mention that I've already reserved space for the string. Thanks for the suggestion though.

GenTiradentes 2009-11-09 22:51:38

Answer 3

+1 A:

You don't need to copy input into num, because you pass it by value. You can also compute the length of charset in compiletime, there's no need to compute it in runtime every single time you call the function.

But these are very minor performance issues. I think the the most significant help you can gain is by avoiding the string concatenation in the loop. When you construct the key string pass the string constructor the length of your result string so that there is only one allocation for the string. Then in the loop when you concatenate into the string you will not re-allocate.

You can make things even slightly more efficient if you take the target string as a reference parameter or even as two iterators like the standard algorithms do. But that is arguably a step too far.

By the way, what if the value passed in for input is zero? You won't even enter the loop; shouldn't key then be "0"?

I see the value passed in for input can't be negative, but just so we note: the C remainder operator isn't a modulo operator.

wilhelmtell 2009-11-09 22:25:38

Answer 4

+5 A:

I agree with Rob Walker - you're concentrating on improving performance in the wrong area. The string is the slowest part.

I timed the code (your original is broken, btw) and your original (when fixed) was 44982140 cycles for 100000 lookups and the following code is about 13113670.

const char* charset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
#define CHARSET_LENGTH 62

// maximum size = 11 chars
void integerToKey(char result[13], unsigned long long input)
{
    char* p = result;
    while(input > 0)
    {
     *p++ = charset[input % CHARSET_LENGTH];
     input /= CHARSET_LENGTH;
    }

    // null termination
    *p = '\0';
    // need to reverse the output
    char* o = result;
    while(o + 1 < p)
     swap(*++o, *--p);
}

Aaron 2009-11-09 22:26:45

This helps a lot, thank you.

GenTiradentes 2009-11-09 22:57:02

If performance is really critical, the only change I'd made to this is to eliminate the % operator. This can be very slow depending upon your CPU for a 64 bit number. It would be substantially faster to perform the division and subtract the new value of input from the old value of input to calculate the index in charset. This could speed things up over 30%

Stephen Nutt 2009-11-10 00:14:02

@Stephen Nutt - Did you actually time that? The x86 can do a single instruction divide with remainder of a 64-bit number. I would expect the compiler to just use that.

Aaron 2009-11-10 02:02:46

Answer 5

+2 A:

This is almost a textbook case of how not to do this. Concatenating strings in a loop is a bad idea, both because appending isn't particularly fast, and because you're constantly allocating memory.

Note: your question states that you're converting to base-62, but the code seems to have 63 symbols. Which are you trying to do?

Given a 64-bit integer, you can calculate that you won't need any more than 11 digits in the result, so using a static 12 character buffer will certainly help improve your speed. On the other hand, it's likely that your C++ library has a long-long equivalent to ultoa, which will be pretty optimal.

Edit: Here's something I whipped up. It allows you to specify any desired base as well:

std::string ullToString(unsigned long long v, int base = 64) {
    assert(base < 65);
    assert(base > 1);
    static const char digits[]="0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/";
    const int max_length=65;
    static char buffer[max_length];

    buffer[max_length-1]=0;
    char *d = buffer + max_length-1;
    do {
     d--;
     int remainder = v % base;
     v /= base;
     *d = digits[remainder];
    } while(v>0);

    return d;
}

This only creates one std::string object, and doesn't move memory around unnecessarily. It currently doesn't zero-pad the output, but it's trivial to change it to do that to however many digits of output you want.

Mark Bessey 2009-11-09 22:30:26

ultoa() would be great exept the radix is limited to 2 - 36.

Aaron 2009-11-09 22:38:17

I know I've seen versions that accepted higher bases, as an extension. But it shouldn't be all that hard to write, anyway.

Mark Bessey 2009-11-09 22:59:18

Answer 6

+1 A:

Why not just use a base64 library? Is really important that 63 equals '11' and not a longer string?

size_t base64_encode(char* outbuffer, size_t maxoutbuflen, const char* inbuffer, size_t inbuflen);

std::string integerToKey(unsigned long long input) {
    char buffer[14];
    size_t len = base64_encode(buffer, sizeof buffer, (const char*)&input, sizeof input);
    return std::string(buffer, len);
}

Yes, every string will end with an equal size. If you don't want it to, strip off the equal sign. (Just remember to add it back if you need to decode the number.)

Of course, my real question is why are you turning a fixed width 8byte value and not using it directly as your "key" instead of the variable length string value?

Footnote: I'm well aware of the endian issues with this. He didn't say what the key will be used for and so I assume it isn't being used in network communications between machines of disparate endian-ness.

jmucchiello 2009-11-09 22:48:59

Answer 7

+1 A:

If you could add two more symbols so that it is converting to base-64, your modulus and division operations would turn into a bit mask and shift. Much faster than a division.

Zan Lynx 2009-11-10 14:53:13

Answer 8

+1 A:

If all you need is a short string key, converting to base-64 numbers would speed up things a lot, since div/mod 64 is very cheap (shift/mask).

mfx 2009-11-19 16:51:34

ansaurus

tags:

views:

answers:

Is there a way to improve the speed or efficiency of this lookup? (C/C++)

related questions