views:

179

answers:

5

Hi,

I have two char's and I want to "stitch" them bitwise together.
For example:

char c1 = 11; // 0000 1011
char c2 = 5;  // 0000 0101
short int si = stitch(c1, c2); // 0000 1011 0000 0101

So, what I first tried was with bitwise operators:

short int stitch(char c1, char c2)
{
    return (c1 << 8) | c2;
}

But this doesn't work: I'm getting a short equals to c2... (1) Why?
(But: c1 and c2 are negative numbers in my real app... maybe this is a part of the problem?)

So, my second solution was to use a union:

union stUnion
{
    struct
    {
         char c1;
         char c2;
    }
    short int si;
}

short int stitch(char c1, char c2)
{
    stUnion u;
    u.c1 = c1;
    u.c2 = c2;
    return u.si;
}

This works like I want... I think

(2) What is the best/fastest way?

Thanks!

+3  A: 

$5.8/1 states- "The operands shall be of integral or enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or reater than or equal to the length in bits of the promoted left operand."

So try to typecast c1 to unsigned int and then bitwise OR with C2. Also return the output as unsigned int. chars are promoted to int but we want to be 'unsigned int'

Chubsdad
The assumption is that char is 8 bits. If this is not true, then my response does not hold true It is also important to know that 'char' is a plain char type. It could either be signed or unsigned as C++ supports 3 char types.As you mentioned that you are feeding in negative values, it mostly means that 'char' is 'signed char' in your environment.
Chubsdad
+1  A: 

The reason is that c2 is first promoted to int before doing the bitwise OR, which causes sign extension to take place (assuming that char is signed and can hold negative values):

char x1 = -2; // 1111 1110
char x2 = -3; // 1111 1101

short int si = stitch(c1, c2); // 1111 1111 1111 1101

The representation of x2 promoted to int is (at least) 1 byte full of 1, so it overwrites the zero bit of x1 that you previously shifted up. You can cast to unsigned char first. With two complement representation, that will not change the bitpattern in the lowest byte. While not strictly necessary, you may cast c1 to unsigned char too, for consistency (if short is 2 bytes long, it won't matter that c1 was sign extended beyond those 2 bytes)

short int stitch(char c1, char c2) {
    return ((unsigned char)c1 << 8) | (unsigned char)c2;
}
Johannes Schaub - litb
Please s/8/CHAR_BIT/
Luther Blissett
@Luther please propose that to the questioner. If I would do that in my answer, I would change the behavior of `stitch`. Using `CHAR_BIT` will make him dependent on the target machine, which may not be desired (for protocols that use octets etc).
Johannes Schaub - litb
Of course, this means that it's probably a good idea to use the `uintX_t` types as @Potatoswatter recommends. The availability of those will also guarantee him a two's complement representation, according to the spec.
Johannes Schaub - litb
+5  A: 

The union method is implementation-defined at best (in practice, it will pretty reliably work but the format of si depends on the endianness of the platform) and undefined behavior at worst (meaning crash the computer, do not pass Go and collect $200). By the letter of the law, you are not allowed to read any member of a union except the one that was last written to.

The problem with the bitwise way is, as you suspect, the negative numbers. A negative number is represented by a chain of leading 1's. So -5 for example is

1111 1011

If you cast this to int or even unsigned int, it becomes

1111 1111 1111 … 1111 1011

and all those 1's will drown out the left-shifted data when OR is applied.

To solve the problem, cast the chars to unsigned char and then to int (to prevent overflow, or even the appearance of the possibility of overflow) before shifting:

short int stitch(char c1, char c2)
{
    return ( (int) (unsigned char) c1 << 8) | (unsigned char) c2;
}

or, if you are free to change the types of the arguments and you can include <cstdint>,

uint16_t stitch( uint8_t c1, uint8_t c2)
{
    return ( (int) c1 << 8 ) | c2;
}
Potatoswatter
+1 for uint16_t, uint8_t
Luther Blissett
A: 

You should not use an union for that. You should never use union fields simultaneously. If an union has member A and member B then you must consider that A and B are unrelated. That's because compiler is free to add padding anywhere it wants (except at the front of struct). Another problem is byte order (little/big endian).

//EDIT There is exception to above "union rule", you can use these members simultaneously which are at the fron and have the same layout. i.e.

union {
    struct {
        char c;
        int i;
        short s;
    } A;
    struct {
        char c;
        int i;
        char c1;
        char c2;
    } B;
};

A.c and A.i can be used simultaneously with B.c and B.i

adf88
where does the standard mention about the padding at the "front of union" (struct->union) rule?
Chubsdad
There is. I'll find.
adf88
The compiler is not free to add padding before the `short` in the union, because the address of the union is identical with the address of each of its members. Byte order, of course, is a legitimate concern.
Potatoswatter
@Potatoswatter: Yup, that makes sense
Chubsdad
Read $9.2.14 - $9.2.17
adf88
A: 

The shift/or method, once fixed, is cleaner as it does not depend on byte ordering.

Apart from that, the union method is likely slower on many modern CPUs due to a store-to-load-forwarding (STLF) problem. You are writing a value to memory and then reading it back as different data types. Many CPUs cannot send the data to the load quickly if this occurs. The load needs to wait until the store is entirely done (retired), writing its data to the L1 cache.

On very old CPUs without a barrel shifter (shifting by 8 needs 8 operations) and with simple in-order execution, such as the 68000, the union method may be faster.

jilles