ansaurus

Question

Answer 1

+2 A:

For one, instead of *16 do a bitshift << 4

Also don't use the stl string, instead just create a buffer on the heap and then delete it. It will be more efficient than the object destruction that is needed from the string.

Brian R. Bondy 2008-09-16 03:17:24

The compiler should do that for you. Bitshifting instead of multiplying makes the code far less human-readable.

yjerem 2008-09-16 03:18:26

It's not like his code is void of other bit shifts...And there is no guarantee that his compiler will do this. And his specific question was about making it more efficient.

Brian R. Bondy 2008-09-16 03:20:15

There's no guarantee that the multiply is faster than shift either.

nos 2009-08-07 12:51:53

Answer 2

+3 A:

Operate on 32 bits at a time (4 chars), then deal with the tail if needed. When I did this exercise with urlencoding a full table lookup for each char was slightly faster than logic constructs, so you may want to test this in context as well to take caching issues into account.

/Allan

Allan Wind 2008-09-16 03:18:30

Answer 3

+1 A:

not going to make a lot of difference... pChar-(ofs16) can be done with [*pCHar & 0x0F]

Keith Nicholas 2008-09-16 03:20:11

Answer 4

+1 A:

This is my version, which, unlike the OP's version, doesn't assume that std::basic_string has its data in contiguous region:

#include <string>

using std::string;

static char const* digits("0123456789ABCDEF");

string
tohex(string const& data)
{
    string result(data.size() * 2, 0);
    string::iterator ptr(result.begin());
    for (string::const_iterator cur(data.begin()), end(data.end()); cur != end; ++cur) {
        unsigned char c(*cur);
        *ptr++ = digits[c >> 4];
        *ptr++ = digits[c & 15];
    }
    return result;
}

Chris Jester-Young 2008-09-16 03:38:37

Answer 5

A:

Make sure your compiler optimization is turned on to the highest working level.

You know, flags like '-O1' to '-03' in gcc.

2008-09-16 03:41:54

Answer 6

+4 A:

At the cost of more memory you can create a full 256-entry table of the hex codes:

static const char _hex2asciiU_value[256][2] =
    { {'0','0'}, {'0','1'}, /* ..., */ {'F','E'},{'F','F'} };

Then direct index into the table, no bit fiddling required.

const char *pHexVal = pHex[*pChar];
pszHex[0] = pHexVal[0];
pszHex[1] = pHexVal[1];

Robert 2008-09-16 03:42:49

Answer 7

+1 A:

Antti Sykäri 2008-09-16 04:04:23

Answer 8

A:

I have found that using an index into an array, rather than a pointer, can speed things up a tick. It all depends on how your compiler chooses to optimize. The key is that the processor has instructions to do complex things like [i*2+1] in a single instruction.

Mark Ransom 2008-09-16 04:57:21

seems dubious! An index into an array *is* just pointer arithmetic!

Mitch Wheat 2008-09-29 03:54:02

Answer 9

A:

The function as it is shown when I'm writing this produces incorrect output even when _hex2asciiU_value is fully specified. The following code works, and on my 2.33GHz Macbook Pro runs in about 1.9 seconds for 200,000,000 million characters.

#include <iostream>

using namespace std;

static const size_t _h2alen = 256;
static char _hex2asciiU_value[_h2alen][3];

string char_to_hex( const unsigned char* _pArray, unsigned int _len )
{
    string str;
    str.resize(_len*2);
    char* pszHex = &str[0];
    const unsigned char* pEnd = _pArray + _len;
    const char* pHex = _hex2asciiU_value[0];
    for( const unsigned char* pChar = _pArray; pChar != pEnd; pChar++, pszHex += 2 ) {
       pszHex[0] = _hex2asciiU_value[*pChar][0];
       pszHex[1] = _hex2asciiU_value[*pChar][1];
    }
    return str;
}


int main() {
  for(int i=0; i<_h2alen; i++) {
    snprintf(_hex2asciiU_value[i], 3,"%02X", i);
  }
  size_t len = 200000000;
  char* a = new char[len];
  string t1;
  string t2;
  clock_t start;
  srand(time(NULL));
  for(int i=0; i<len; i++) a[i] = rand()&0xFF;
  start = clock();
  t1=char_to_hex((const unsigned char*)a, len);
  cout << "char_to_hex conversion took ---> " << (clock() - start)/(double)CLOCKS_PER_SEC << " seconds\n";
}

hoyhoy 2008-09-16 08:05:41

>200,000,000 <s>million</s> characters

Josh 2008-09-17 05:52:12

Answer 10

A:

If you're rather obsessive about speed here, you can do the following:

Each character is one byte, representing two hex values. Thus, each character is really two four-bit values.

So, you can do the following:

Unpack the four-bit values to 8-bit values using a multiplication or similar instruction.
Use pshufb, the SSSE3 instruction (Core2-only though). It takes an array of 16 8-bit input values and shuffles them based on the 16 8-bit indices in a second vector. Since you have only 16 possible characters, this fits perfectly; the input array is a vector of 0 through F characters, and the index array is your unpacked array of 4-bit values.

Thus, in a single instruction, you will have performed 16 table lookups in fewer clocks than it normally takes to do just one (pshufb is 1 clock latency on Penryn).

So, in computational steps:

A B C D E F G H I J K L M N O P (64-bit vector of input values, "Vector A") -> 0A 0B 0C 0D 0E 0F 0G 0H 0I 0J 0K 0L 0M 0N 0O 0P (128-bit vector of indices, "Vector B"). The easiest way is probably two 64-bit multiplies.
pshub [0123456789ABCDEF], Vector B

Dark Shikari 2008-09-16 08:11:49

Answer 11

A:

I'm not sure doing it more bytes at a time will be better... you'll probably just get tons of cache misses and slow it down significantly.

What you might try is to unroll the loop though, take larger steps and do more characters each time through the loop, to remove some of the loop overhead.

slicedlime 2008-09-16 08:25:35

More bytes at a time ought to work great, up to the system's word size

Josh 2008-09-17 05:53:33

Answer 12

A:

Consistently getting ~4ms on my Athlon 64 4200+ (~7ms with original code)

for( const unsigned char* pChar = _pArray; pChar != pEnd; pChar++) {
    const char* pchars = _hex2asciiU_value[*pChar];
    *pszHex++ = *pchars++;
    *pszHex++ = *pchars;
}

2008-09-16 13:16:36

Answer 13

+2 A:

hoyhoy 2008-09-17 00:18:28

Answer 14

+3 A:

This assembly function (based off my previous post here, but I had to modify the concept a bit to get it to actually work) processes 3.3 billion input characters per second (6.6 billion output characters) on one core of a Core 2 Conroe 3Ghz. Penryn is probably faster.

%include "x86inc.asm"

SECTION_RODATA
pb_f0: times 16 db 0xf0
pb_0f: times 16 db 0x0f
pb_hex: db 48,49,50,51,52,53,54,55,56,57,65,66,67,68,69,70

SECTION .text

; int convert_string_to_hex( char *input, char *output, int len )

cglobal _convert_string_to_hex,3,3
    movdqa xmm6, [pb_f0 GLOBAL]
    movdqa xmm7, [pb_0f GLOBAL]
.loop:
    movdqa xmm5, [pb_hex GLOBAL]
    movdqa xmm4, [pb_hex GLOBAL]
    movq   xmm0, [r0+r2-8]
    movq   xmm2, [r0+r2-16]
    movq   xmm1, xmm0
    movq   xmm3, xmm2
    pand   xmm0, xmm6 ;high bits
    pand   xmm2, xmm6
    psrlq  xmm0, 4
    psrlq  xmm2, 4
    pand   xmm1, xmm7 ;low bits
    pand   xmm3, xmm7
    punpcklbw xmm0, xmm1
    punpcklbw xmm2, xmm3
    pshufb xmm4, xmm0
    pshufb xmm5, xmm2
    movdqa [r1+r2*2-16], xmm4
    movdqa [r1+r2*2-32], xmm5
    sub r2, 16
    jg .loop
    REP_RET

Note it uses x264 assembly syntax, which makes it more portable (to 32-bit vs 64-bit, etc). To convert this into the syntax of your choice is trivial: r0, r1, r2 are the three arguments to the functions in registers. Its a bit like pseudocode. Or you can just get common/x86/x86inc.asm from the x264 tree and include that to run it natively.

P.S. Stack Overflow, am I wrong for wasting time on such a trivial thing? Or is this awesome?

Dark Shikari 2008-09-17 01:20:38

ansaurus

tags:

views:

answers:

char[] to hex string exercise

related questions