tags:

views:

1142

answers:

10

In this thread some one commented that the following code should only be used in 'toy' projects. Unfortunately he hasn't come back to say why it's not of production quality so I was hoping some one in the community may be able to either assure me the code is ok (because I quite like it) or identify what is wrong.

template< class T1, class T2>
void hexascii( T1& out, const T2& in )
{
    out.resize( in.size() * 2 );
    const char hexDigits[] = {'0', '1', '2', '3', '4', '5', '6', '7','8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
    T1::iterator outit = out.begin();
    for( T2::const_iterator it = in.begin(); it != in.end(); ++it )
    {
        *outit++ = hexDigits[*it >> 4];
        *outit++ = hexDigits[*it & 0xF];
    }
}

template<class T1, class T2>
void asciihex( T1& out, const T2& in )
{
    size_t size = in.size;
    assert( !(size % 2) );

    out.resize( size / 2 );
    T1::iterator outit = out.begin();
    for( T2::const_iterator it = in.begin(); it != in.end(); it += 2, ++outit )
    {
    *outit = ((( (*it > '9' ? *it - 0x07 : *it)  - 0x30) << 4) & 0x00f0) + 
                (((*(it+1) > '9' ? *(it+1) - 0x07 : *(it+1)) - 0x30) & 0x000f);
    }
}

Edit: Thanks for your help guys, you've made some big improvements. I've written functions in the two suggested styles from your answers. Some rough testing suggests the second method is marginally faster than the first, but IMO this is outweighed by the improved readability of the first.

template<class T1>
void asciihex2( T1& out, const std::string& in )
{
    dassert( sizeof(T1::value_type)==1 );
    size_t size = in.size();
assert( !(size % 2) );
    out.resize( size / 2 );
    T1::iterator outit = out.begin();
    for( size_t i = 0; i < in.size(); i += 2 )
    {
        int tmp;
        sscanf( in.c_str() + i, "%02X", &tmp );
        *outit++ = tmp;
    }
}

template<class T1>
void asciihex3( T1& out, const std::string& in )
{
    dassert( sizeof(T1::value_type)==1 );
    size_t size = in.size();
assert( !(size % 2) );
    out.resize( size / 2 );
    T1::iterator outit = out.begin();
const char hexDigits[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
                       0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
         0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F};
for( std::string::const_iterator it = in.begin(); it != in.end(); it += 2, ++outit )
    {
    *outit = (hexDigits[(*it - 0x30) & 0x1f] << 4) + 
        hexDigits[((*(it+1) - 0x30) & 0x1f)];
    }
}

Some of the assumptions surronding this code: 1: They are not intended as a generic, but are used in an anonymous name space to translate data for a specific class. 2: The templating is required as two separate container types are being used (one being std::vector, the other a similar byte array type container from a third party library. 3: The purpose is to be able to convert binary data of indeterminate length into strings and back again (0x1234abcd <-> "1234abcd") 4: assert traps errors in both debug and release modes 5: by the time these functions are called the size of the string will already have been checked, assert is used to terminate processing if something serious has gone wrong 6: It needs some commenting

Any other ideas appreciated.

A: 

Seems reasonable to me on first glance.

kenny
+8  A: 

It seems like a lot of templated code to achieve very little, given you have direct hex conversion in the standard C scanf and printf functions. why bother?

Shane MacLaughlin
I was thinking the same thing.
Daniel
sscanf/printf are not type-safe. The stream << and >> operators are.
xtofl
True, but you could wrap sscanf and sprintf with very simple functions to achieve the type safety. The above templated code is way over the top for something this simple, and the use of templates incorrectly suggests the code is type friendly.
Shane MacLaughlin
I think this answer comes from misunderstanding the purpose of the functions (the mistake being in the question, not your reading of it - I've added a bit to the question). I can't see any 'nice' way of using printf for the purpose of changing a byte array into printable ascii and back.
Patrick
In fact I can't see how to turn 0x30 into 0x00 using sprintf at all. Do you agree or am I being dumb?
Patrick
check out the printf %x qualifier, in the following link, http://www.cplusplus.com/reference/clibrary/cstdio/printf.html, e.g. sprintf(str,"%02x",i). Similarly for sscanf http://www.cplusplus.com/reference/clibrary/cstdio/scanf.html, as in sscanf("%x",
Shane MacLaughlin
Which turns 0x00 into 0x30 but how to do the reverse?
Patrick
Shane MacLaughlin
Printf is great but there are times when one might need functions like this in an environment where sscanf/printf are not desired (or perhaps not available): embedded, low-level debugging, etc.
jwfearn
@jwfearn True, but there are still better ways that the example above. Particuarly in embedded where resources are an issue, I would tend to go for a streamed output rather than resizing a buffer.
Shane MacLaughlin
@smacl: what if s = "12abdc34d5ba990345326234656". When I code this with scanf it just looks very wrong.
Patrick
@smacl: don't resize a buffer in embeded due to resources? I'm confused. A stream provides a neat API for a buffer but the stream still holds the data in a buffer.
Patrick
@smacl: ignore the scanf looks wrong comment. I just needed to think about it for a bit
Patrick
+6  A: 

My main comment about it is that it's very difficult to read.

Especially:

*outit = ((( (*it > '9' ? *it - 0x07 : *it)  - 0x30) << 4) & 0x00f0) + 
            (((*(it+1) > '9' ? *(it+1) - 0x07 : *(it+1)) - 0x30) & 0x000f)

It would take my brain a little while to grok that, and annoy me if I inherited the code.

Stephen Cox
Likewise, confusing for sure.
Shane MacLaughlin
"A little while"? Either genius or understatement.
Anthony
Ok, a long while. ;-)
Stephen Cox
That line defintely needs a comment but I can't see anyway of simplifying it once this approach to conversion is chosen. Splitting it over multiple lines...
Patrick
Agreed, it's not clear how to improve it. Possible moving some of the checks into submethods. Still, it makes my eyes bug out.
Stephen Cox
I'd certainly replace "(*it > '9' ? *it - 0x07 : *it) - 0x30)" with an inline function 'getHexValue', and call that twice. I'm also unsure of the value of the masks: if the input string is garbage, who cares what value is output? If the input string is valid, the masks have no effect.
Steve Jessop
The result is "*outit = (getHexValue(it) << 4) + getHexValue(it+1);", which I'd settle for.
Steve Jessop
+3  A: 

I don't really object against it. It's generic (within limits), it uses consts, references where needed, etc... It lacks a bit of documentation, and the asciihex *outit assignment is not quite clear at first sight.

resize initializes the output's elements unnecessary (use reserve instead).

Maybe the genericity is somewhat too flexible: you can feed the algorithms with any datatype you like, while you should only give it hex numbers (not e.g. a vector of doubles)

And indeed, it may be a bit overkill, given the presence of good library functions.

xtofl
The code will not work if you use reserve. If out.size() is 0 before you call reserve, it will still be 0 after you call reserve, so the loops wouldn't execute.See http://www.gotw.ca/gotw/074.htm
Joel
Indeed, you would need additional adaptations. My point was that objects got constructed unnecessarily.
xtofl
+4  A: 

What is it supposed to do? There is no well-known accepted meaning of hexascii or asciihex, so the names should change.

[edit] Converting from binary to hex notation should often not be called ascii..., as ascii is a 7-bit format.

Stephan Eggermont
Good call, I only just noticed that now. At a glance, I assumed it was a function for converting between various types of integers and hex. Re-reading it appears to be for converting between blocks of binary (in bytes) and hex.
Shane MacLaughlin
Edited the question, change a binary array into printable format and back again.
Patrick
+2  A: 

Some problems that I see:

This will work great if it is only used for an input container that stores 8 bit types - e.g. char or unsigned char. For example, the following code will fail if used with a 32 bit type whose value after the right shift is greater than 15 - recommend that you always use a mask to ensure that lookup index is always within range.

*outit++ = hexDigits[*it >> 4];

What is the expected behavior if you pass in a container containing unsigned longs - for this to be a generic class it should probably be able to handle the conversion of 32 bit numbers to hext strings also.

This only works when the input is a container - what if I just want to convert a single byte? A suggestion here is to refactor the code into a core function that can covert a single byte (hex=>ascii and ascii=>hex) and then provide additional functions to use this core function for coverting containers of bytes etc.

In asciihex(), bad things will happen if the size of the input container is not divisible by 2. The use of:

it != in.end(); it += 2

is dangerous since if the container size is not divisible by 2 then the increment by two will advance the iterator past the end of the container and the comparison against end() will never work. This is somewhat protected against via the assert call but assert can be compiled out (e.g. it is often compiled out in release builds) so it would be much better to make this an if statement.

Stephen Doyle
The assert on the second line of asciihex() checks that the input size is divisible by 2, so +=2 is safe in this case - I agree though that it at least looks dangerous, and I think I'd code it a bit differently myself.
jrb
@jrbushell, assert will only be included in debug versions. It has no effect on release code, and is an aid to testing rather than run-time checking
Shane MacLaughlin
assert doesn't only check in debug code in our build environment(though you can configure it to do so)
Patrick
+3  A: 

What's wrong with

*outit = hexDigits[*it]

Why can't these two functions share a common list of hexDigits and eliminate the complex (and slow) calculation of an ASCII character?

S.Lott
How do you link 0x30 to 0x00? You could use a map I suppose but seems like overkill.
Patrick
The map is a trivial array lookup. There will only be a few bytes (256 in the crazy worst case; 128 is more realistic). And it will perform instantly using an add and a multiply and nothing more.
S.Lott
+3  A: 
  • Code has assert statements instead of proper handling of an error condition (and if your assert is turned off, the code may blow up)

  • for loop has dangerous double-increase of iterator (it+=2). Especially in case your assert did not fire. What happens when your iterator is already at the end and you ++ it?

  • Code is templated, but what you're doing is simply converting characters to numbers or the other way round. It's cargo cult programming. You hope that the blessings of template programming will come upon you by using templates. You even tagged this as a template question although the template aspect is completely irrelevant in your functions.

  • the *outit= line is too complicated.

  • code reinvents the wheel. In a big way.

Thorsten79
Thanks Thorsten, I was hoping you'd see this. On the cargo cult point, the code allows me to process 2 containers from different libraries without having to write 2 functions which differ only in their parameters, one of the purposes of templates. I didn't consider all the other containers :(.
Patrick
I value your thoughts about being interoperable between libraries. But why are you fiddling with character constants and obscure ASCII values? It's code "halfway between the gutter and the stars". It tries to be elegant but fails to deliver on that promise. You could easily use C++ strings here.
Thorsten79
A: 

The reason I would consider it toy code is there is no error checking.

I could pass it two vector and it would happily try and do something and make a complete mess generating random gibberish.

Martin York
+1  A: 

Problems I spot:

hexascii does not check if sizeof(T2::value_type)==1

hexascii dereferences it twice, asciihex even more. There's no reason for this, as you can store the result. This means you can't use an istream_iterator.

asciihex needs a random iterator as input, because (it+1) and (it+=2) are used. The algorithm could work on a forward iterator if you use only (++it).

(*it > '9' ? *it - 0x07 : *it) - 0x30 can be simplified to *it - (*it > '9' ? 0x37 : 0x30) so there is only one unconditional subtraction left. Still, an array lookup would be more efficient. Subtract 0x30. '0' will become 0;'A' will become 0x11 and 'a' will become 0x31. Mask with 0x1f to make it case-insensitive, and you can do the resulting lookup in a char[0x20] without overflow risks. Non-hex chars will just give you weird values.

MSalters