I want to read sizeof(int) bytes from a char* array.

a) In what scenarios do we need to worry about checking endianness?

b) How would you read the first 4 bytes, both with and without taking endianness into account?

EDIT: The sizeof(int) bytes that I have read need to be compared with an integer value.

What is the best approach to this problem?

+1  A: 

You shouldn't need to worry about endianness unless you are reading bytes that were created on a different machine, e.g. bytes arriving over a network stream.

Given that, can't you just use a for loop?

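#include <stddef.h>

void ReadBytes(const char *stream) {
    for (size_t i = 0; i < sizeof(int); i++) {
        char foo = stream[i];  /* the i-th byte; process it here */
        (void)foo;             /* silences the unused-variable warning */
    }
}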

Are you asking for something more complicated than that?

Steve Rowe
My data is actually created from a different source
kal
A: 

You need to worry about endianness only if the data you're reading is composed of numbers that are larger than one byte.
If you're reading sizeof(int) bytes and expect to interpret them as an int, then endianness makes a difference. Essentially, endianness is the order in which a machine assembles a sequence of more than one byte into a numerical value.

shoosh
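To see which way the machine you are running on assembles bytes, one common trick is to store a known value in a multi-byte integer and inspect its first byte in memory. The following is a minimal sketch; the helper name host_is_little_endian is hypothetical, and it only distinguishes little-endian from "not little-endian" (historical mixed-endian layouts are ignored).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns true if the running machine stores the
   least significant byte of a multi-byte integer first (little-endian).
   The value 1 is stored in a uint32_t and its first byte in memory is
   inspected: it is 1 on a little-endian host and 0 on a big-endian one. */
static bool host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);  /* copy the lowest-addressed byte */
    return first == 1;
}
```

On x86 and most ARM configurations this returns true; on a big-endian SPARC it returns false.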
+1  A: 

It depends on how you want to read them. I get the feeling you want to cast 4 bytes into an integer; doing so with network-streamed data will usually end up as something like this:

int foo = *(int*)(stream+offset_in_stream);
Daniel
That may result in an unaligned access.
gimpf
@gimpf: I'm curious: on which systems will this actually lead to an error?
Christoph
E.g. on an 80486 or any later x86 CPU with the alignment-check (AC) flag set.
bothie
When would the alignment flag be set?
Rob Kennedy
On my professor's CPU it would cause a bus error. On Sun processors (SPARCs, I believe) this can fail too. Basically, it can fail on any processor that does not support unaligned reads/writes.
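One way to avoid the unaligned-access problem raised in these comments is to copy the bytes into a properly aligned int with memcpy instead of dereferencing a cast pointer, since memcpy places no alignment requirement on its arguments. A minimal sketch, with a hypothetical helper name read_int_at; the bytes are still interpreted in host byte order, so any byte swapping has to happen afterwards.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies sizeof(int) bytes starting at buf + offset
   into a local int. Unlike *(int *)(buf + offset), this is safe even when
   buf + offset is not suitably aligned for an int. The bytes are
   interpreted in host byte order. */
static int read_int_at(const char *buf, size_t offset)
{
    int value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

Compilers typically turn a fixed-size memcpy like this into a single load on platforms where that is allowed, so it is usually no slower than the cast.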
Johannes Schaub - litb
+3  A: 

Do you mean something like this?

#include <string.h>

const char *a;  /* points at your byte array */
int i;
memcpy(&i, a, sizeof(i));

You only have to worry about endianness if the data comes from a different platform, such as an external device.

Dani van der Meer
A: 

Just use a for loop that moves over the array in sizeof(int) chunks.
Use the function ntohl (declared in <arpa/inet.h>, at least on Linux) to convert from network byte order (which is defined as big-endian) to host byte order. The library function performs the correct network-to-host conversion for whatever processor you're running on.

Chris Connett
Of course, this applies only if you're actually reading something from the network...
gimpf
OK, he stated in the _comment_ that the data comes from a different machine. It might have been transferred on a CD, but more probably he did mean some kind of network.
gimpf
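The loop-plus-ntohl approach described in this answer could be sketched as follows. Note that ntohl operates on 32-bit values (uint32_t), so this sketch steps through the buffer in 4-byte chunks rather than sizeof(int) chunks; each chunk is copied with memcpy first so the buffer needs no particular alignment. The function name decode_be32_array is hypothetical, and <arpa/inet.h> is POSIX (on Windows, ntohl comes from winsock2.h instead).

```c
#include <arpa/inet.h>  /* ntohl (POSIX) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decodes `count` big-endian ("network order")
   32-bit values from buf into out. buf must hold at least count * 4
   bytes. Each chunk is memcpy'd into a uint32_t before conversion. */
static void decode_be32_array(const unsigned char *buf, uint32_t *out,
                              size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t raw;
        memcpy(&raw, buf + i * sizeof raw, sizeof raw);
        out[i] = ntohl(raw);  /* network (big-endian) -> host order */
    }
}
```

For example, the bytes 0x01 0x02 0x03 0x04 decode to 0x01020304 regardless of the host's own byte order.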
+4  A: 

a) You only need to worry about "endianness" (i.e., byte-swapping) if the data was created on a big-endian machine and is being processed on a little-endian machine, or vice versa. There are many ways this can occur, but here are a couple of examples.

  1. You receive data on a Windows machine via a socket. Windows employs a little-endian architecture while network data is "supposed" to be in big-endian format.
  2. You process a data file that was created on a system with a different "endianness."

In either of these cases, you'll need to byte-swap all numbers that are bigger than 1 byte, e.g., shorts, ints, longs, doubles, etc. However, if you are always dealing with data from the same platform, endian issues are of no concern.

b) Based on your question, it sounds like you have a char pointer and want to extract the first 4 bytes as an int and then deal with any endian issues. To do the extraction, use this:

int n = *(reinterpret_cast<int *>(myArray)); // where myArray is your data

Obviously, this assumes myArray is not a null pointer; otherwise, this will crash since it dereferences the pointer, so employ a good defensive programming scheme.

To swap the bytes on Windows, you can use the ntohs()/ntohl() and/or htons()/htonl() functions defined in winsock2.h. Or you can write some simple routines to do this in C++, for example:

inline unsigned short swap_16bit(unsigned short us)
{
    return (unsigned short)(((us & 0xFF00) >> 8) |
                            ((us & 0x00FF) << 8));
}

inline unsigned long swap_32bit(unsigned long ul)
{
    return (unsigned long)(((ul & 0xFF000000) >> 24) |
                           ((ul & 0x00FF0000) >>  8) |
                           ((ul & 0x0000FF00) <<  8) |
                           ((ul & 0x000000FF) << 24));
}
Matt Davis
You should mention that the first code snippet has the same problem as Daniel's: it can access unaligned data that isn't suitable for an int*.
Johannes Schaub - litb
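The swap routines from Matt Davis's answer can also be written with the fixed-width types from <stdint.h>, which pins the widths to exactly 16 and 32 bits even on platforms where unsigned long is 64 bits. This is a sketch of that variant, not a change in the technique; the names swap_u16 and swap_u32 are hypothetical.

```c
#include <stdint.h>

/* Reverse the byte order of a 16-bit value: 0x1234 -> 0x3412. */
static inline uint16_t swap_u16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));  /* cast drops bits above 16 */
}

/* Reverse the byte order of a 32-bit value: 0x01020304 -> 0x04030201. */
static inline uint32_t swap_u32(uint32_t v)
{
    return ((v & 0xFF000000u) >> 24) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x000000FFu) << 24);
}
```

Swapping twice returns the original value, so the same routine serves for both host-to-file and file-to-host conversion.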
+2  A: 

The easy way to solve this is to make sure whatever generates the bytes does so in a consistent endianness. Typically the "network byte order" used by various TCP/IP stuff is best: the library routines htonl and ntohl work very well with this, and they are usually fairly well optimized.

However, if network byte order is not being used, you may need to do things in other ways. You need to know two things: the size of an integer, and the byte order. Once you know that, you know how many bytes to extract and in which order to put them together into an int.

Some example code that assumes sizeof(int) is the right number of bytes:

#include <limits.h>

int bytes_to_int_big_endian(const char *bytes)
{
    int i;
    int result;

    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        result = (result << CHAR_BIT) + bytes[i];
    return result;
}

int bytes_to_int_little_endian(const char *bytes)
{
    int i;
    int result;

    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        result += bytes[i] << (i * CHAR_BIT);
    return result;
}


#ifdef TEST

#include <stdio.h>

int main(void)
{
    const int correct = 0x01020304;
    const char little[] = "\x04\x03\x02\x01";
    const char big[] = "\x01\x02\x03\x04";

    printf("correct: %0x\n", correct);
    printf("from big-endian: %0x\n", bytes_to_int_big_endian(big));
    printf("from-little-endian: %0x\n", bytes_to_int_little_endian(little));
    return 0;
}

#endif
Lars Wirzenius
Now replace "int" with "unsigned" and your answer is correct ;)
bothie
I would replace the + and += with | and |=, respectively. It's confusing to use arithmetic operators here, IMHO.
Johannes Schaub - litb
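Applying both comments to Lars Wirzenius's code gives something like the sketch below: the result is unsigned (so shifting bits into the top position is well defined), the input is read as unsigned char (so bytes of 0x80 and above are not sign-extended on platforms where plain char is signed), and bytes are combined with | instead of +. The function names are adapted from the answer and are otherwise hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Assemble sizeof(unsigned) bytes, most significant byte first. */
unsigned bytes_to_uint_big_endian(const unsigned char *bytes)
{
    unsigned result = 0;
    for (size_t i = 0; i < sizeof(unsigned); ++i)
        result = (result << CHAR_BIT) | bytes[i];
    return result;
}

/* Assemble sizeof(unsigned) bytes, least significant byte first. */
unsigned bytes_to_uint_little_endian(const unsigned char *bytes)
{
    unsigned result = 0;
    for (size_t i = 0; i < sizeof(unsigned); ++i)
        result |= (unsigned)bytes[i] << (i * CHAR_BIT);
    return result;
}
```

Because both functions build the value with shifts rather than by reinterpreting memory, they give the same result on any host, whatever its own byte order.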
+1  A: 

Why read when you can just compare?

bool AreEqual(int i, char *data)
{
   return memcmp(&i, data, sizeof(int)) == 0;
}

If you are worried about endianness, then you need to convert all of the integers to some invariant form; htonl and ntohl are good examples.

Dmitriy Matveev
This will always return false. I think you mean memcmp(), not memcpy().
Matt Davis
Thank you, fixed.
Dmitriy Matveev
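Comparing with memcmp(&i, data, sizeof(int)) only works when the data was written in the host's own byte order. If the data is known to be in a fixed order, one option is to encode the expected value into bytes in that order first and then compare. A minimal sketch, assuming the data is a 32-bit big-endian (network order) value; the helper name equals_be32 is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns true if the first 4 bytes of data encode
   `expected` in big-endian (network) byte order. The expected value is
   encoded with shifts, so the comparison does not depend on the host's
   endianness. */
static bool equals_be32(const unsigned char *data, uint32_t expected)
{
    unsigned char enc[4];
    enc[0] = (unsigned char)(expected >> 24);  /* most significant byte */
    enc[1] = (unsigned char)(expected >> 16);
    enc[2] = (unsigned char)(expected >> 8);
    enc[3] = (unsigned char)expected;          /* least significant byte */
    return memcmp(enc, data, sizeof enc) == 0;
}
```

For example, the bytes 0x00 0x00 0x01 0x2C match the value 300 on any host.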
+1  A: 

How about

int int_from_bytes(const char * bytes, _Bool reverse)
{
    if(!reverse)
        return *(int *)(void *)bytes;

    char tmp[sizeof(int)];

    for(size_t i = sizeof(tmp); i--; ++bytes)
        tmp[i] = *bytes;

    return *(int *)(void *)tmp;
}

You'd use it like this:

int i = int_from_bytes(bytes, SYSTEM_ENDIANNESS != ARRAY_ENDIANNESS);


If you're on a system where casting void * to int * may result in alignment conflicts, you can use

int int_from_bytes(const char * bytes, _Bool reverse)
{
    int tmp;

    if(reverse)
    {
        for(size_t i = sizeof(tmp); i--; ++bytes)
            ((char *)&tmp)[i] = *bytes;
    }
    else memcpy(&tmp, bytes, sizeof(tmp));

    return tmp;
}
Christoph
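The usage line above relies on SYSTEM_ENDIANNESS and ARRAY_ENDIANNESS, which the answer does not define. One possible way to supply them at compile time is sketched below, assuming GCC or Clang, which predefine __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__; other compilers need a different source. Declaring the array to be big-endian is a hypothetical choice for illustration. The function body is Christoph's alignment-safe variant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: GCC or Clang, which predefine __BYTE_ORDER__. */
#if !defined(__BYTE_ORDER__)
#error "__BYTE_ORDER__ not defined; supply SYSTEM_ENDIANNESS another way"
#endif
#define SYSTEM_ENDIANNESS __BYTE_ORDER__

/* Hypothetical choice: the incoming array is big-endian (network order). */
#define ARRAY_ENDIANNESS __ORDER_BIG_ENDIAN__

/* Alignment-safe variant from the answer: reverses the bytes when asked,
   otherwise copies them unchanged with memcpy. */
static int int_from_bytes(const char *bytes, bool reverse)
{
    int tmp;
    if (reverse) {
        for (size_t i = sizeof(tmp); i--; ++bytes)
            ((char *)&tmp)[i] = *bytes;
    } else {
        memcpy(&tmp, bytes, sizeof(tmp));
    }
    return tmp;
}
```

With these definitions, int_from_bytes(bytes, SYSTEM_ENDIANNESS != ARRAY_ENDIANNESS) reverses the bytes on little-endian hosts and copies them unchanged on big-endian ones.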