views:

420

answers:

6

I have some C code that I'd like to port to java. I haven't done much C coding, but I was able to follow along up until this one function. If anyone could help me understand what is going on, it would be greatly appreciated.

int reverse_integer(int input) {
    int output = 0, i;

    for ( i=0, i<sizeof(int); i++ ) {
         output = ( input & 0x000000FF ) | output; 
         input >>= 8;
         if ( i < 3 ) {
             output <<= 8;
         }
    }

    return output;
}

The function is used as such:

char * position = //some data

/*the included comment on this next line states its the size of a string*/
int i = reverse_integer( *(int*)position )
+5  A: 

It is reversing the byte order (the endianness) of an integer.

There also appears to be a bug where the programmer uses 3, assuming it will be sizeof(int) - 1.

Shmoopty
+1, it does, but is it not the weird way to swap bytes around? ;-)
Michael Krelin - hacker
Actually, the 3 is guarding against creating a result value wider than four bytes. It places the fourth and every additional high byte into the least significant byte of the result integer. It should probably just bail if `sizeof(int)` is greater than four, given its mostly irresponsible treatment of that situation.
seh
seh, and do that at compile time. but it's weird to do that with bit-shifting, anyway.
Michael Krelin - hacker
@seh: Agreed! That's what stopped me from saying with 100% certainty that it's a bug. But it sure smells like one.
Shmoopty
@hacker: By performing the reversal with bit-shifting, it can reasonably be done entirely in CPU registers.
Shmoopty
@Shmoopty, @hacker, I'm sorry for that byte thing I posted there. I was so disgusted I deleted it xD. I was thinking a char is one byte, then my brain wires got all messed up.
webdreamer
Shmoopty, makes sense... But looks disgusting ;-)
Michael Krelin - hacker
+1  A: 

The function is reversing byte-order, as needed when converting between big endian and little endian data. Most network protocols required 32-bit integers be stored in Big Endian order, but Intel processors store numbers in Little Endian order, so you need to swap byte orders when reading or writing data to the network. (This applies to low level stuff, not for protocols like HTTP where numbers are transmitted as text.)

I believe the function would actually compile as regular Java, with the exception of sizeof(int), which you could replace with 4 since the JVM defines int to be 32 bits wide (in C, there are no guarantees).

It looks like position is a block of binary data, not a string. The type of position is char *, which means a pointer to a character (1 byte).

The expression *position would dereference that pointer, getting the 1 byte it is pointing to. However, the author of the code wanted a full int's worth of bytes from the data block. (4 bytes if compiled for a 32-bit architecture; 8 bytes for 64-bit machine.)

So, to get the full int, the pointer is cast from a byte pointer to an int pointer: (int *)position. Then, because we want the value at that address, we stick an asterisk in front to dereference it: *(int *)position.

benzado
+1  A: 

[too large for a comment] If you put the output <<= 8 at the beginning of the loop you can avoid one if:

#include <limits.h>
int reverse_integer(int input) {
    int output = 0, i;

    for (i = 0; i < sizeof input; i++) {
         output <<= CHAR_BIT;
         output = (input & ((1 << CHAR_BIT) - 1)) | output; 
         input >>= CHAR_BIT;
    }
    return output;
}

This function reverses the bytes in an int. An input of 0x12345678, in an implementation where CHAR_BIT is 8 and sizeof (int) is 4 (the most usual nowadays), returns 0x78563412.

pmg
+2  A: 

There is one very serious problem with this function: it is solving a standard problem that has available solutions. In short, it is re-inventing the wheel.

Well, I'm making an assumption here. I'm assuming that the reason for reversing the integer is to convert from little-endian to big-endian or vice versa. The usual reason for this is that you are on a little-endian computer (any Intel or AMD x86 chip) and you need to send ore receive data from a network in "network order", i.e. big-endian.

If I am correct in my assumption, in C you can call one of:

ntohl()
hlton()

More info on these functions here:

http://www.codeguru.com/forum/showthread.php?t=298741

If you are already on a big-endian computer, and you want to reverse the integer for some other reason, then these calls won't help you (because "network order" is big-endian, so if you are already on a big-endian computer, hlton() will not change anything).

I did a Google search for "Java ntohl" and found these links:

http://www.velocityreviews.com/forums/t139571-ntohl-ntohs-etc.html

http://www.coderanch.com/t/366549/Java-General/java/Java-equivilent-c-functions-htonl

So, I think you may not need to port this at all; you can perhaps just grab a solution from one of these two links.

steveha
Yes, but `ntohl()` and `htonl()` are POSIX functions, not defined by the Standard C -- ie, not all implementations (of Standard C) have them.
pmg
Fair enough. But even if you are on some non-GCC system that doesn't have them, Google can likely find you a free library you can just use. Likewise, in Java, someone must have solved this problem already; and in fact one of the links I provided has debugged code at the end of it and the other one has a reference to `java.nio.ByteBuffer`. This is one of the very standard problems, like sorting. And if you use already-written, already-debugged code, you don't need to debug it.
steveha
Although, if you use some already-written code, you *should* test it and make sure it does what it promises (and that you understand how to use it correctly).
steveha
A: 

I suggest the following code to byte swap ints:

U16
Swap16
(
    U16 x
)
{
    return (0xFF00 & x) >> 8 | (0x00FF & x) << 8;
}


U32
Swap32
(
    U32 x
)
{
#if defined(__i386__)
    __asm__("bswap   %0" : "+r" (x));
    return x;
#else
    return (0xFF000000 & x) >> 24 |
           (0x00FF0000 & x) >> 8 |
           (0x0000FF00 & x) << 8 |
           (0x000000FF & x) << 24;
#endif
}


U64
Swap64
(
    U64 x
)
{
#if defined(__i386__)
    __asm__("bswap   %%eax\n\t"
            "bswap   %%edx\n\t"
            "xchgl   %%eax, %%edx" : "+A" (x));
    return x;
#elif defined(__x86_64__)
    __asm__("bswap   %0" : "+r" (x));
    return x;
#else
    return (0xFF00000000000000LL &
            x) >> 56 | (0x00FF000000000000LL & x) >> 40
           | (0x0000FF0000000000LL &
              x) >> 24 | (0x000000FF00000000LL & x) >> 8
           | (0x00000000FF000000LL &
              x) << 8 | (0x0000000000FF0000LL & x) << 24
           | (0x000000000000FF00LL &
              x) << 40 | (0x00000000000000FFLL & x) << 56;
#endif
}

Where U16, U32 and U64 as integer types of that size.

The asm is for gcc.

+5  A: 
Craig Putnam
+1 for mentioning Integer.reverseBytes(int)
Gunslinger47