I want to convert a char representing a hexadecimal digit (upper or lower case) to a byte, like

'0' -> 0, '1' -> 1, 'A' -> 10, 'a' -> 10, 'f' -> 15, etc.

I will be calling this method extremely often, so performance is important. Is there a faster way than using a pre-initialized HashMap<Character,Byte> to look up the value?

Answer

It seems like it's a toss-up between using a switch-case and Jon Skeet's direct computing solution; the switch-case solution seems to edge it out ever so slightly, though. Greg's array method wins out. Here are the performance results (in ms) for 200,000,000 runs of the various methods:

Character.getNumericValue: 8360
Character.digit: 8453
HashMap<Character,Byte>: 15109
Greg's Array Method: 6656
Jon Skeet's Direct Method: 7344
Switch: 7281

Thanks guys!

Benchmark method code

Here ya go, JonSkeet, you old competitor. ;-)

import java.util.HashMap;

public class ScratchPad {

    private static final int NUMBER_OF_RUNS = 200000000;

    static byte res;

    static HashMap<Character, Byte> map = new HashMap<Character, Byte>() {{
        put( Character.valueOf( '0' ), Byte.valueOf( (byte )0 ));
        put( Character.valueOf( '1' ), Byte.valueOf( (byte )1 ));
        put( Character.valueOf( '2' ), Byte.valueOf( (byte )2 ));
        put( Character.valueOf( '3' ), Byte.valueOf( (byte )3 ));
        put( Character.valueOf( '4' ), Byte.valueOf( (byte )4 ));
        put( Character.valueOf( '5' ), Byte.valueOf( (byte )5 ));
        put( Character.valueOf( '6' ), Byte.valueOf( (byte )6 ));
        put( Character.valueOf( '7' ), Byte.valueOf( (byte )7 ));
        put( Character.valueOf( '8' ), Byte.valueOf( (byte )8 ));
        put( Character.valueOf( '9' ), Byte.valueOf( (byte )9 ));
        put( Character.valueOf( 'a' ), Byte.valueOf( (byte )10 ));
        put( Character.valueOf( 'b' ), Byte.valueOf( (byte )11 ));
        put( Character.valueOf( 'c' ), Byte.valueOf( (byte )12 ));
        put( Character.valueOf( 'd' ), Byte.valueOf( (byte )13 ));
        put( Character.valueOf( 'e' ), Byte.valueOf( (byte )14 ));
        put( Character.valueOf( 'f' ), Byte.valueOf( (byte )15 ));
        put( Character.valueOf( 'A' ), Byte.valueOf( (byte )10 ));
        put( Character.valueOf( 'B' ), Byte.valueOf( (byte )11 ));
        put( Character.valueOf( 'C' ), Byte.valueOf( (byte )12 ));
        put( Character.valueOf( 'D' ), Byte.valueOf( (byte )13 ));
        put( Character.valueOf( 'E' ), Byte.valueOf( (byte )14 ));
        put( Character.valueOf( 'F' ), Byte.valueOf( (byte )15 ));
    }};
    static int[] charValues = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1, -1, 10, 11, 12, 13,14,15,-1,-1,-1,-1,-1,-1,-1,-1,-1,
                    -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,10, 11, 12, 13,14,15};
    static char[] cs = new char[]{'0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','A','B','C','D','E','F'};

    public static void main(String args[]) throws Exception {
        long time = System.currentTimeMillis();
        for( int i = 0; i < NUMBER_OF_RUNS; i++ ) {
            res = getNumericValue( i );
        }
        System.out.println( "Character.getNumericValue:" );
        System.out.println( System.currentTimeMillis()-time );
        time = System.currentTimeMillis();
        for( int i = 0; i < NUMBER_OF_RUNS; i++ ) {
            res = getDigit( i );
        }
        System.out.println( "Character.digit:" );
        System.out.println( System.currentTimeMillis()-time );
        time = System.currentTimeMillis();
        for( int i = 0; i < NUMBER_OF_RUNS; i++ ) {
            try {
                res = getValueFromArray( i );
            } catch (IllegalArgumentException e) {
            }
        }
        System.out.println( "Array:" );
        System.out.println( System.currentTimeMillis()-time );
        time = System.currentTimeMillis();
        for( int i = 0; i < NUMBER_OF_RUNS; i++ ) {
            res = getValueFromHashMap( i );
        }
        System.out.println( "HashMap<Character,Byte>:" );
        System.out.println( System.currentTimeMillis()-time );
        time = System.currentTimeMillis();
        for( int i = 0; i < NUMBER_OF_RUNS; i++ ) {
            char c = cs[i%cs.length];
            res = getValueFromComputeMethod( c );        
        }
        System.out.println( "JonSkeet's Direct Method:" );
        System.out.println( System.currentTimeMillis()-time );
        time = System.currentTimeMillis();
        for( int i = 0; i < NUMBER_OF_RUNS; i++ ) {
            res = getValueFromSwitch( i );

        }
        System.out.println( "Switch:" );
        System.out.println( System.currentTimeMillis()-time );
    }

    private static byte getValueFromSwitch( int i ) {
        byte res;
        char ch = cs[i%cs.length];
        switch( ch ) {
            case '0':
                res = 0;
                break;
            case '1':
                res = 1;
                break;
            case '2':
                res = 2;
                break;
            case '3':
                res = 3;
                break;
            case '4':
                res = 4;
                break;
            case '5':
                res = 5;
                break;
            case '6':
                res = 6;
                break;
            case '7':
                res = 7;
                break;
            case '8':
                res = 8;
                break;
            case '9':
                res = 9;
                break;
            case 'a':
            case 'A':
                res = 10;
                break;
            case 'b':
            case 'B':    
                res = 11;
                break;
            case 'c':
            case 'C':    
                res = 12;
                break;
            case 'd':
            case 'D':    
                res = 13;
                break;
            case 'e':
            case 'E':    
                res = 14;
                break;
            case 'f':
            case 'F':    
                res = 15;
                break;
            default:
                throw new RuntimeException("unknown hex character: " + ch );
        }
        return res;
    }

    private static byte getValueFromComputeMethod( char c ) {
        byte result = 0;
        if (c >= '0' && c <= '9')
        {
            result =  (byte)(c - '0');
        }
        if (c >= 'a' && c <= 'f')
        {
            result = (byte)(c - 'a' + 10);
        }
        if (c >= 'A' && c <= 'F')
        {
            result =  (byte)(c - 'A' + 10);
        }
        return result;
    }

    private static byte getValueFromHashMap( int i ) {
        return map.get( Character.valueOf( cs[i%cs.length] ) ).byteValue();
    }

    private static byte getValueFromArray( int i ) {
        char c = cs[i%cs.length];
        if (c < '0' || c > 'f') {
            throw new IllegalArgumentException();
        }
        byte result = (byte)charValues[c-'0'];
        if (result < 0) { // check the looked-up value, not the static field
            throw new IllegalArgumentException();
        }
        return result;
    }

    private static byte getDigit( int i ) {
        return (byte)Character.digit( cs[i%cs.length], 16 );
    }

    private static byte getNumericValue( int i ) {
        return (byte)Character.getNumericValue( cs[i%cs.length] );
    }

}
+7  A: 

A hash table would be relatively slow. This is pretty quick:

static int hexValue(char c)
{
    if (c >= '0' && c <= '9')
    {
        return c - '0';
    }
    if (c >= 'a' && c <= 'f')
    {
        return c - 'a' + 10;
    }
    if (c >= 'A' && c <= 'F')
    {
        return c - 'A' + 10;
    }
    throw new IllegalArgumentException();
}

Another option would be to try a switch/case statement. An array might be okay if it's in cache, but a miss could be expensive.

Jon Skeet
What I say three times is true.
Michael Burr
Ick, no idea what happened there. That's what I get for posting from a mobile. Will edit.
Jon Skeet
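A sketch of the switch/case option Jon mentions, written as a switch expression. This assumes Java 14 or later, and the class and method names are made up for illustration; it should behave like the switch in the benchmark above, just more compactly.

```java
// Switch-expression version of the switch/case approach.
// Assumes Java 14+; HexSwitch/hexValue are illustrative names only.
class HexSwitch {
    static int hexValue(char c) {
        return switch (c) {
            case '0' -> 0;
            case '1' -> 1;
            case '2' -> 2;
            case '3' -> 3;
            case '4' -> 4;
            case '5' -> 5;
            case '6' -> 6;
            case '7' -> 7;
            case '8' -> 8;
            case '9' -> 9;
            case 'a', 'A' -> 10; // upper and lower case share a label
            case 'b', 'B' -> 11;
            case 'c', 'C' -> 12;
            case 'd', 'D' -> 13;
            case 'e', 'E' -> 14;
            case 'f', 'F' -> 15;
            default -> throw new IllegalArgumentException("not a hex digit: " + c);
        };
    }
}
```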
A: 

Character.getNumericValue(char) is another way:

char c = 'a';
System.out.println(c + "->" + Character.getNumericValue(c));

This prints 'a->10', as you want. Someone else would have to comment on the efficiency of a static method call vs. a HashMap lookup, or you could check it out for yourself. It seems cleaner and more readable to me, though.

Keeg
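One thing worth checking before using getNumericValue for hex: it is not limited to radix 16. Every Latin letter A-Z/a-z gets a value from 10 to 35, so 'g' gives 16 instead of an error, whereas Character.digit(c, 16) returns -1 for it. A small comparison (class and method names are illustrative):

```java
// Compares Character.getNumericValue with Character.digit(c, 16).
class NumericValueDemo {
    // Numeric value in the widest sense: letters map to 10..35.
    static int numericValue(char c) {
        return Character.getNumericValue(c);
    }

    // Value as a radix-16 digit, or -1 if c is not a hex digit.
    static int hexDigit(char c) {
        return Character.digit(c, 16);
    }

    public static void main(String[] args) {
        for (char c : new char[] {'a', 'F', 'g', 'z'}) {
            System.out.println(c + " -> getNumericValue=" + numericValue(c)
                    + ", digit(c, 16)=" + hexDigit(c));
        }
    }
}
```

So if invalid input is possible, getNumericValue needs an extra range check, while digit(c, 16) already reports it.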
+8  A: 

A preinitialised array would be faster than a HashMap. Something like this:

static final int[] CharValues = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, ... -1, 10, 11, 12, ...}; // 'f'-'0'+1 entries, indexed by c-'0'

if (c < '0' || c > 'f') {
    throw new IllegalArgumentException();
}
int n = CharValues[c-'0'];
if (n < 0) {
    throw new IllegalArgumentException();
}
// n contains the digit value

You should benchmark this method against other methods (such as Jon Skeet's direct method) to determine which will be the fastest for your application.

Greg Hewgill
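A runnable sketch of the offset-table approach above, assuming the table is filled in a static initializer rather than typed by hand (HexTable and CHAR_VALUES are hypothetical names). Non-hex slots between '9' and 'A' and between 'F' and 'a' stay -1, and both of the checks from the answer are kept.

```java
import java.util.Arrays;

// Offset table indexed by c - '0'; -1 marks characters that are not hex digits.
class HexTable {
    private static final int[] CHAR_VALUES = new int['f' - '0' + 1];

    static {
        Arrays.fill(CHAR_VALUES, -1);
        for (char c = '0'; c <= '9'; c++) CHAR_VALUES[c - '0'] = c - '0';
        for (char c = 'A'; c <= 'F'; c++) CHAR_VALUES[c - '0'] = c - 'A' + 10;
        for (char c = 'a'; c <= 'f'; c++) CHAR_VALUES[c - '0'] = c - 'a' + 10;
    }

    static int hexValue(char c) {
        // Range check keeps c - '0' inside the table.
        if (c < '0' || c > 'f') {
            throw new IllegalArgumentException("not a hex digit: " + c);
        }
        int n = CHAR_VALUES[c - '0'];
        // In range but in one of the gaps (':'..'@' or 'G'..'`').
        if (n < 0) {
            throw new IllegalArgumentException("not a hex digit: " + c);
        }
        return n;
    }
}
```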
The first range check should be removed: it only slows down the conversion. The array will check its bounds itself; the exception will be different, but we're trading everything for speed, aren't we?
Vladimir Dyuzhev
That's a good idea, and if you're concerned about the exception type you could catch the array exception and throw a different one.
Greg Hewgill
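A sketch of the variant discussed in these comments, assuming the same offset table (the names are hypothetical). The explicit range check is dropped: a character below '0' gives a negative index and one above 'f' gives an index past the end, so the JVM's own bounds check throws ArrayIndexOutOfBoundsException, which is translated to IllegalArgumentException as Greg suggests. Whether this is measurably faster would need benchmarking.

```java
import java.util.Arrays;

// Offset table without an explicit range check; bounds errors are re-thrown.
class HexTableNoRangeCheck {
    private static final int[] CHAR_VALUES = new int['f' - '0' + 1];

    static {
        Arrays.fill(CHAR_VALUES, -1);
        for (char c = '0'; c <= '9'; c++) CHAR_VALUES[c - '0'] = c - '0';
        for (char c = 'A'; c <= 'F'; c++) CHAR_VALUES[c - '0'] = c - 'A' + 10;
        for (char c = 'a'; c <= 'f'; c++) CHAR_VALUES[c - '0'] = c - 'a' + 10;
    }

    static int hexValue(char c) {
        int n;
        try {
            n = CHAR_VALUES[c - '0']; // out of range for c < '0' or c > 'f'
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IllegalArgumentException("not a hex digit: " + c, e);
        }
        if (n < 0) { // inside the table but not a hex digit
            throw new IllegalArgumentException("not a hex digit: " + c);
        }
        return n;
    }

    // Convenience for checking rejection: true if hexValue throws.
    static boolean rejects(char c) {
        try {
            hexValue(c);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```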
A: 

simple, but slow:

int i = Integer.parseInt(String.valueOf(c), 16);

faster:

int i = Character.digit(c, 16);

I wouldn't use any special code for "performance issues". If you use this really often, the JIT will compile it and execution will become fast. Keep your code clean. You could try writing a performance test comparing the execution time of the code above with any special implementation; I bet you won't get significant improvements.

Arne Burmeister
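A sketch wrapping the two library calls from this answer so that both reject invalid input with an exception (LibraryHex and its methods are illustrative names). One caveat: Character.digit also accepts non-ASCII Unicode digits and full-width Latin letters, so it is more permissive than a strict ASCII hex check.

```java
// The two library approaches from the answer, side by side.
class LibraryHex {
    // Simple but slower: allocates a one-character String and parses it.
    // Throws NumberFormatException for non-hex input.
    static int viaParseInt(char c) {
        return Integer.parseInt(String.valueOf(c), 16);
    }

    // Faster: no allocation. Character.digit returns -1 for non-digits.
    static int viaDigit(char c) {
        int n = Character.digit(c, 16);
        if (n < 0) {
            throw new IllegalArgumentException("not a hex digit: " + c);
        }
        return n;
    }
}
```

On Java 17 and later, java.util.HexFormat.fromHexDigit(int) does the same job but accepts only ASCII 0-9, A-F and a-f, throwing NumberFormatException for anything else.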
A: 

Using an array should be fastest.

An array could be of size 16, 16^2, 16^3, 16^4, etc.

Converting the number in groups of more than one digit would give a performance increase.

There will be a sweet spot where it is most worthwhile, possibly 4 digits (64k table).

pro
Why do you say it "should" be fastest? It all depends on the balance of memory access vs computation. That will partly depend on caching, the memory architecture of your particular machine etc. There's a lot to consider :)
Jon Skeet
Yes, but computation involves accessing memory for the instructions. In the case of one character it will be close, as benchmarked, but even in this case an array lookup is only one read. In the general case, doing blocks of 4 in an array lookup should be much faster. Your points are noted, however!
pro
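A sketch of the "larger groups" idea for two characters at a time, assuming 7-bit ASCII input (PairTable, decodePair and decode are hypothetical names). Note that the table is indexed by character codes rather than digit values, so for two ASCII characters it needs 128 * 128 = 16,384 entries rather than 16^2; most entries just mark invalid pairs. Whether the larger table pays off against cache misses is exactly the trade-off Jon raises.

```java
import java.util.Arrays;

// Decodes two hex characters with a single table read.
class PairTable {
    // Index = (hi << 7) | lo for ASCII hi, lo; value 0..255, or -1 if invalid.
    private static final short[] PAIR = new short[128 * 128];

    static {
        Arrays.fill(PAIR, (short) -1);
        String digits = "0123456789abcdefABCDEF";
        for (int i = 0; i < digits.length(); i++) {
            for (int j = 0; j < digits.length(); j++) {
                char hi = digits.charAt(i), lo = digits.charAt(j);
                int value = Character.digit(hi, 16) * 16 + Character.digit(lo, 16);
                PAIR[(hi << 7) | lo] = (short) value;
            }
        }
    }

    // Decodes s.charAt(pos) and s.charAt(pos + 1) into a value 0..255.
    static int decodePair(String s, int pos) {
        char hi = s.charAt(pos), lo = s.charAt(pos + 1);
        if ((hi | lo) >= 128) { // at least one character is outside ASCII
            throw new IllegalArgumentException("non-ASCII input");
        }
        int v = PAIR[(hi << 7) | lo];
        if (v < 0) {
            throw new IllegalArgumentException("not a hex pair: " + hi + lo);
        }
        return v;
    }

    // Decodes an even-length hex string into bytes, one lookup per byte.
    static byte[] decode(String s) {
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("odd length");
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) decodePair(s, 2 * i);
        }
        return out;
    }
}
```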
A: 

I don't think you can beat a direct array lookup.

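public class Precalc {
    static final int[] precalc = new int['f'+1];
    static {
        for (char c='0'; c<='9'; c++) precalc[c] = c-'0';
        for (char c='A'; c<='F'; c++) precalc[c] = c-'A'+10; // letters start at 10
        for (char c='a'; c<='f'; c++) precalc[c] = c-'a'+10;
        // other entries stay 0, so invalid characters up to 'f' are not detected
    }

    public static void main(String[] args) {
        System.out.println(precalc['f']); // prints 15
    }
}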
Staale
A: 

Here's my tweaked version of Greg's code. On my box it's marginally faster - but probably within the noise. It avoids the lower bound check, and doesn't need to do any subtraction. Creating a 64K array and avoiding either bound check appeared to slow things down - but again, with timing like this it's virtually impossible to be sure what's real and what's noise.

public class HexParser
{
    // 'f' + 1 entries so that VALUES['f'] is a valid index
    private static final byte[] VALUES = new byte['f' + 1];

    // Easier to get right for bozos like me (Jon) than
    // a hard-coded array :)
    static
    {
        for (int i=0; i < VALUES.length; i++)
        {
            VALUES[i] = (byte) -1;
        }
        for (int i='0'; i <= '9'; i++)
        {
            VALUES[i] = (byte) (i-'0');
        }
        for (int i='A'; i <= 'F'; i++)
        {
            VALUES[i] = (byte) (i-'A'+10);
        }
        for (int i='a'; i <= 'f'; i++)
        {
            VALUES[i] = (byte) (i-'a'+10);
        }
    }

    public static byte parseHexChar(char c)
    {
        if (c > 'f')
        {
            throw new IllegalArgumentException();
        }
        byte ret = VALUES[c];
        if (ret == -1)
        {
            throw new IllegalArgumentException();
        }
        return ret;
    }
}
Jon Skeet
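For comparison, a sketch of the 64K-table variant Jon mentions trying (one entry for every possible char value), which he found appeared to be slower. HexParser64K is a hypothetical name. Because any char is a valid index, no explicit range check is needed before the lookup; the cost is a 64 KiB byte array that is much less likely to stay in cache.

```java
import java.util.Arrays;

// One table slot per char value (0..65535); -1 marks non-hex characters.
class HexParser64K {
    private static final byte[] VALUES = new byte[Character.MAX_VALUE + 1];

    static {
        Arrays.fill(VALUES, (byte) -1);
        for (int i = '0'; i <= '9'; i++) VALUES[i] = (byte) (i - '0');
        for (int i = 'A'; i <= 'F'; i++) VALUES[i] = (byte) (i - 'A' + 10);
        for (int i = 'a'; i <= 'f'; i++) VALUES[i] = (byte) (i - 'a' + 10);
    }

    static byte parseHexChar(char c) {
        byte ret = VALUES[c]; // every char value is in range, no explicit check
        if (ret == -1) {
            throw new IllegalArgumentException("not a hex digit: " + c);
        }
        return ret;
    }
}
```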
A: 

static final int[] CharValues = {
    16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, // 0x00-0x1F
    16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16,  0, 1, 2, 3, 4, 5, 6, 7,  8, 9,16,16,16,16,16,16, // 0x20-0x3F: '0'-'9' at 0x30
    16,10,11,12,13,14,15,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, // 0x40-0x5F: 'A'-'F' at 0x41
    16,10,11,12,13,14,15,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, // 0x60-0x7F: 'a'-'f' at 0x61
    16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, // 0x80-0x9F
    16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, // 0xA0-0xBF
    16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, // 0xC0-0xDF
    16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, 16,16,16,16,16,16,16,16, // 0xE0-0xFF
};

int n = CharValues[c]; // c > 255 throws ArrayIndexOutOfBoundsException

if (n == 16) throw new IllegalArgumentException();

// n contains the digit value

pro
I believe your array is wrong - the '0' should start at location 0x30, you have it starting at 0x2D. 'A' should start at 0x41, you have it starting at 0x40, and you don't support lower case characters ('a' starts at 0x61)
Adam Davis
agreed - I think I've got it right now... but would appreciate you checking as it's making me dizzy looking at it
pro
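Since counting 16s by eye is error-prone (as this thread shows), here is a sketch that generates the 256-entry table in a loop and checks any hand-written table against a direct computation. Table256, expected, build and firstMismatch are hypothetical names; 16 is the "invalid" marker, as in the answer.

```java
// Builds and verifies a 256-entry hex table; 16 marks "not a hex digit".
class Table256 {
    static final int INVALID = 16;

    // Direct computation used as the reference.
    static int expected(int i) {
        if (i >= '0' && i <= '9') return i - '0';
        if (i >= 'A' && i <= 'F') return i - 'A' + 10;
        if (i >= 'a' && i <= 'f') return i - 'a' + 10;
        return INVALID;
    }

    static int[] build() {
        int[] table = new int[256];
        for (int i = 0; i < table.length; i++) table[i] = expected(i);
        return table;
    }

    // Returns the first index where `table` disagrees with expected(), or -1.
    static int firstMismatch(int[] table) {
        if (table.length != 256) {
            throw new IllegalArgumentException("expected 256 entries, got " + table.length);
        }
        for (int i = 0; i < 256; i++) {
            if (table[i] != expected(i)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] table = build();
        System.out.println(table['0'] + " " + table['A'] + " " + table['f']);
        System.out.println(firstMismatch(table));
    }
}
```

Pasting a hand-written literal into firstMismatch would point straight at the first wrong slot, e.g. an off-by-one start for '0'.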
+1  A: 

I don't recall seeing this method before, but Mikko Rantanen pointed this equation out in a comment on the question "Code golf - hex to (raw) binary conversion":

(c | 32) % 39 - 9

I don't know what it would benchmark as (perhaps someone can add it to the benchmark above and run it, but I'm guessing the % kills the performance), but it's a neat, simple one-liner for converting a single hexadecimal character c to its value. It handles 0-9, A-F and a-f.

Adam Davis
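A sketch of the one-liner as a Java method (ModTrick is an illustrative name), with the arithmetic spelled out. It assumes the input is already known to be a hex digit: there is no validation, so other characters produce meaningless values (for example 'g' gives 16).

```java
// (c | 32) % 39 - 9 as a method.
// c | 32 sets the ASCII lower-case bit: 'A'..'F' become 'a'..'f' (97..102),
// while '0'..'9' (48..57) already have that bit set and are unchanged.
// % 39 then maps 48..57 to 9..18 and 97..102 to 19..24; minus 9 gives 0..15.
class ModTrick {
    static int hexValue(char c) {
        return (c | 32) % 39 - 9; // no validation of c
    }
}
```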
A: 

It's worth noting that you are really timing the % operation in most of your tests. This operation takes about the same amount of time as some of the other options.

private static byte lookUpTest(int i) {
    return (byte) cs[i%cs.length];
}
Peter Lawrey
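One way to keep % out of the timed loop, sketched under the assumption that the test input can be precomputed: fill an array whose length is a power of two and index it with a bit mask (i & (SIZE - 1)), which gives the same wrap-around as % for non-negative i. MaskedInput and its members are hypothetical names, and decode is just Jon's direct computation used as a stand-in for whichever method is being measured.

```java
// Benchmark input indexed with a mask instead of %.
class MaskedInput {
    static final int SIZE = 1 << 10; // must be a power of two for the mask
    static final char[] INPUT = new char[SIZE];

    static {
        String digits = "0123456789abcdefABCDEF";
        for (int i = 0; i < SIZE; i++) INPUT[i] = digits.charAt(i % digits.length());
    }

    // Stand-in for the method under test (direct computation).
    static int decode(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException("not a hex digit: " + c);
    }

    // Sums the decoded values so the work cannot be optimised away.
    static long run(int runs) {
        long sum = 0;
        for (int i = 0; i < runs; i++) {
            sum += decode(INPUT[i & (SIZE - 1)]); // mask instead of %
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long sum = run(200_000_000);
        System.out.println("sum=" + sum + " time=" + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

The % still appears in the static initializer, but that runs once and is outside the measured loop.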