I have 3 base representations for positive integer numbers:

  1. Decimal, in unsigned long variable (e.g. unsigned long int NumDec = 200).
  2. Hex, in string variable (e.g. string NumHex = "C8").
  3. Binary, in string variable (e.g. string NumBin = "11001000").

I want to be able to convert between numbers in all 3 representations in the most efficient way. I.e. to implement the following 6 functions:

unsigned long int Binary2Dec(const string & Bin) {}
unsigned long int Hex2Dec(const string & Hex) {}
string Dec2Hex(unsigned long int Dec) {}
string Binary2Hex(const string & Bin) {}
string Dec2Binary(unsigned long int Dec) {}
string Hex2Binary(const string & Hex) {}

What is the most efficient approach for each of them? I can use C and C++, but not boost.

Edit: By "efficiency" I mean time efficiency: Shortest execution time.

+1  A: 

That depends on what you're optimizing for, what do you mean by "efficient"? Is it important that the conversions be fast, use little memory, little programmer time, fewer WTFs from other programmers reading the code, or what?

For readability and ease of implementation, you should at least implement both Binary2Dec() and Hex2Dec() by just calling strtoul() with the appropriate base. That makes them into one-liners, which is very efficient for at least some of the above interpretations of the word.
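For instance, a minimal sketch of the two string-to-integer functions built on strtoul(), assuming well-formed input:

```cpp
#include <cstdlib>
#include <string>

// One-liners built on the standard library's strtoul(),
// which parses a string in any base from 2 to 36.
unsigned long int Binary2Dec(const std::string & Bin) {
    return std::strtoul(Bin.c_str(), nullptr, 2);
}

unsigned long int Hex2Dec(const std::string & Hex) {
    return std::strtoul(Hex.c_str(), nullptr, 16);
}
```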

unwind
By "efficiency" I mean time efficiency: Shortest execution time. Thanks for clarifying that.
Igor Oks
+1  A: 

Sounds very much like a homework problem, but what the heck...

The short answer is for converting from long int to your strings use two lookup tables. Each table should have 256 entries. One maps a byte to a hex string: 0 -> "00", 1 -> "01", etc. The other maps a byte to a bit string: 0 -> "00000000", 1 -> "00000001".

Then for each byte in your long int you just have to look up the correct string, and concatenate them.
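A sketch of that lookup-table approach, with the tables generated at startup rather than written out by hand. Note that the result carries leading zeros for the full width of unsigned long:

```cpp
#include <cstdio>
#include <string>

// Tables mapping each byte value to its 2-char hex / 8-char binary string.
// Built once at startup; conversions afterwards are just array indexing
// and concatenation.
static char HexTable[256][3];
static char BinTable[256][9];

static void InitTables() {
    for (int i = 0; i < 256; ++i) {
        std::snprintf(HexTable[i], 3, "%02X", i);
        for (int b = 0; b < 8; ++b)
            BinTable[i][b] = (i & (0x80 >> b)) ? '1' : '0';
        BinTable[i][8] = '\0';
    }
}

std::string Dec2Hex(unsigned long int Dec) {
    std::string out;
    // Walk the bytes from most significant to least significant.
    for (int shift = (sizeof(Dec) - 1) * 8; shift >= 0; shift -= 8)
        out += HexTable[(Dec >> shift) & 0xFF];
    return out;
}

std::string Dec2Binary(unsigned long int Dec) {
    std::string out;
    for (int shift = (sizeof(Dec) - 1) * 8; shift >= 0; shift -= 8)
        out += BinTable[(Dec >> shift) & 0xFF];
    return out;
}
```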

To convert from strings back to long you can simply convert the hex string and the bit string back to a decimal number by multiplying the numeric value of each character by the appropriate power of 16 or 2, and summing up the results.

EDIT: You can also use the same lookup tables for the backwards conversion by doing a binary search to find the right string. This would take log2(256) = 8 comparisons of your strings. Unfortunately I don't have time to analyze whether comparing strings would be much faster than multiplying and adding integers.

Dima
Regarding the strings-to-long conversion: would it work faster than strtoul()?
Igor Oks
Don't know... Try it.
Dima
+3  A: 

I would suggest just using sprintf and sscanf.

Also, if you're interested in how it's implemented you can take a look at the source code for glibc, the GNU C Library.
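As a sketch, the hex conversions reduce to single calls. Binary has no standard conversion specifier before C23's %b, so it is omitted here:

```cpp
#include <cstdio>
#include <string>

// Hex <-> unsigned long via the standard printf/scanf machinery.
std::string Dec2Hex(unsigned long int Dec) {
    char buf[2 * sizeof(Dec) + 1];         // two hex digits per byte, plus NUL
    std::snprintf(buf, sizeof buf, "%lX", Dec);
    return buf;
}

unsigned long int Hex2Dec(const std::string & Hex) {
    unsigned long int val = 0;
    std::sscanf(Hex.c_str(), "%lx", &val);
    return val;
}
```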

Robert S. Barnes
Wouldn't that be slower than the other solutions?
Igor Oks
Two answers: 1. Test all solutions and see which is faster. 2. Remember that the code in the C Standard Library is typically expertly written and highly optimized; problems like these are the entire reason standard libraries exist, so programmers have access to expertly written solutions to extremely common problems and don't have to constantly reinvent the wheel themselves.
Robert S. Barnes
Remember also that sprintf and sscanf have been tested extensively, and aren't going to have the little bugs you might introduce trying to do the conversion yourself.
David Thornley
Unfortunately, %b is not a standard printf conversion specifier for binary. I would still consider using it, though.
Matthew Flaschen
+1  A: 

Why not just use a macro that also takes the format as an input? In C, at least.

#define TO_STRING( string, format, data) \
    sprintf( string, #format, data)
// Int
TO_STRING(buf, %d, i);
// Hex ( two-char representation )
TO_STRING(buf, %02x, i);
// Binary (note: %b is not a standard conversion specifier)
TO_STRING(buf, %b, i);

Or you can use sprintf directly, or have multiple macros:

#define INT_STRING( buf, data) \
    sprintf( buf, "%d", data)
#define HEX_STRING( buf, data) \
    sprintf( buf, "%x", data)
#define BIN_TO_STRING( buf, data) \
    sprintf( buf, "%b", data)

BIN_TO_STRING( loc_buf, my_bin );
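Since %b is not a standard conversion specifier before C23, a BIN_TO_STRING replacement has to build the string itself. A minimal sketch (the function name is illustrative):

```cpp
#include <string>

// Hand-rolled binary formatting, since %b is not portable:
// walk the bits from most significant to least, skipping leading zeros.
std::string BinString(unsigned long int val) {
    if (val == 0) return "0";
    std::string out;
    for (int bit = 8 * sizeof(val) - 1; bit >= 0; --bit)
        if (!out.empty() || ((val >> bit) & 1))   // skip leading zeros
            out += ((val >> bit) & 1) ? '1' : '0';
    return out;
}
```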
eaanon01
+1  A: 

Why do these routines have to be so time-efficient? That sort of claim always makes me wonder. Are you sure the obvious conversion methods like strtol() are too slow, or that you can do better? System functions are usually pretty efficient. They are sometimes slower to support generality and error-checking, but you need to consider what to do with errors. If a bin argument has characters other than '0' and '1', what then? Abort? Propagate massive errors?

Why are you using "Dec" to represent the internal representation? Dec, Hex, and Bin should be used to refer to the string representations. There's nothing decimal about an unsigned long. Are you dealing with strings showing the number in decimal? If not, you're confusing people here and are going to confuse many more.

The transformation between binary and hex text formats can be done quickly and efficiently, with lookup tables, but anything involving decimal text format will be more complicated.
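A sketch of that binary/hex transformation, assuming well-formed input: each hex digit maps to exactly four bits, so a 16-entry table covers one direction and a small accumulator loop covers the other:

```cpp
#include <string>

// Binary <-> hex without going through an integer.
// Assumes sane input (only 0/1 resp. 0-9/a-f/A-F characters).
static const char * const Nibbles[16] = {
    "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
    "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
};

std::string Hex2Binary(const std::string & Hex) {
    std::string out;
    for (char c : Hex) {
        int v = (c >= '0' && c <= '9') ? c - '0'
              : (c >= 'a' && c <= 'f') ? c - 'a' + 10
              : c - 'A' + 10;
        out += Nibbles[v];
    }
    return out;
}

std::string Binary2Hex(const std::string & Bin) {
    // Pad on the left so the length is a multiple of four bits.
    std::string padded((4 - Bin.size() % 4) % 4, '0');
    padded += Bin;
    std::string out;
    for (std::size_t i = 0; i < padded.size(); i += 4) {
        int v = 0;
        for (std::size_t b = 0; b < 4; ++b)
            v = (v << 1) | (padded[i + b] - '0');
        out += "0123456789ABCDEF"[v];
    }
    return out;
}
```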

David Thornley
+1  A: 

Let's think about half of task for a moment - converting from a string-ized base n to unsigned long, where n is a power of 2 (base 2 for binary and base 16 for hex).

If your input is sane, then this work is nothing more than a compare, a subtract, a shift and an or per digit. If your input is not sane, well, that's where it gets ugly, doesn't it? Doing the conversion superfast is not hard. Doing it well under all circumstances is the challenge.

So let's assume that your input is sane, then the heart of your conversion is this:

unsigned long PowerOfTwoFromString(char *input, int shift)
{
    unsigned long val = 0;
    char upperLimit = 'a' + (1 << shift) - 10;  /* one past the last valid letter digit */
    while (*input) {
        char c = tolower(*input++);
        unsigned long digit = (c >= 'a' && c < upperLimit) ? c - 'a' + 10 : c - '0';
        val = (val << shift) | digit;
    }
    return val;
}

#define UlongFromBinaryString(str) PowerOfTwoFromString(str, 1)
#define UlongFromHexString(str) PowerOfTwoFromString(str, 4)

See how easy that is? And it will fail on non-sane inputs. Most of your work is going to go into making your input sane, not performance.

Now, this code takes advantage of power of two shifting. It's easy to extend to base 4, base 8, base 32, etc. It won't work on non-power of two bases. For those, your math has to change. You get

val = (val * base) + digit

which is conceptually the same for this set of operations. The multiplication by the base is going to be equivalent to the shift. So I'd be as likely to use a fully general routine instead. And sanitize the code while sanitizing the inputs. And at that point, strtoul is probably your best bet. Here's a link to a version of strtoul. Nearly all the work is handling edge conditions - that should clue you in on where your energies should be focused: correct, resilient code. The savings from using bit shifts are going to be minimal compared to the savings of, say, not crashing on bad input.
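The fully general, multiply-based variant of the shift loop might look like this, again assuming sane input (the function name is illustrative):

```cpp
#include <cctype>
#include <string>

// Generalization of the shift-based loop: works for any base up to 36,
// at the cost of a multiply per digit instead of a shift.
// Assumes every character in the input is a valid digit for the base.
unsigned long FromString(const std::string & input, unsigned base) {
    unsigned long val = 0;
    for (char raw : input) {
        char c = std::tolower(static_cast<unsigned char>(raw));
        unsigned long digit = (c >= 'a') ? c - 'a' + 10 : c - '0';
        val = val * base + digit;
    }
    return val;
}
```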

plinth
+3  A: 

As others have pointed out, I would start with sscanf(), printf() and/or strtoul. They are fast enough for most applications, and they are less likely to have bugs. I will say, however, that these functions are more generic than you might expect, as they have to deal with non-ASCII character sets, with numbers represented in any base and so forth. For some domains it is possible to beat the library functions.

So, measure first, and if the performance of these conversions is really an issue, then:

  1. In some applications / domains certain numbers appear very often. For example zero, 100, 200, or 19.95 may be so common that it makes sense to optimize your functions to convert such numbers with a bunch of if() statements, and then fall back to the generic library functions.
  2. Use a table lookup of the most common 100 numbers, and then fall back on a library function. Remember that large tables may not fit in your cache and may require multiple indirections for shared libraries, so measure these things carefully to make sure you are not decreasing performance.
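The first idea can be sketched as a fast path in front of the generic conversion; the special-cased values here are purely illustrative:

```cpp
#include <cstdio>
#include <string>

// Hypothetical fast path: if a handful of values dominate the workload,
// return precomputed strings for them and fall back to snprintf otherwise.
std::string Dec2HexFast(unsigned long int Dec) {
    switch (Dec) {       // values chosen for illustration only
        case 0:   return "0";
        case 100: return "64";
        case 200: return "C8";
    }
    char buf[2 * sizeof(Dec) + 1];
    std::snprintf(buf, sizeof buf, "%lX", Dec);
    return buf;
}
```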

You may also want to look at the boost lexical_cast functions, though in my experience they are relatively slow compared to the good old C functions.

Though many have said it, it is worth repeating over and over: do not optimize these conversions until you have evidence that they are a problem. If you do optimize, measure your new implementation to make sure it is faster, and make sure you have a ton of unit tests for your own version, because you will introduce bugs :-(

coryan