ansaurus

Question

Reading "integer" size bytes from a char* array.

Answer 1

+1 A:

You shouldn't need to worry about endianness unless you are reading the bytes from a source created on a different machine, e.g. a network stream.

Given that, can't you just use a for loop?

#include <stddef.h>

void ReadBytes(char * stream) {
    /* Visit each of the first sizeof(int) bytes in turn. */
    for (size_t i = 0; i < sizeof(int); i++) {
        char foo = stream[i];
    }
}

Are you asking for something more complicated than that?

Steve Rowe 2009-02-13 06:38:52

My data is actually created from a different source

kal 2009-02-13 06:55:37

Answer 2

A:

You need to worry about endianness only if the data you're reading is composed of numbers that are larger than one byte.
If you're reading sizeof(int) bytes and expect to interpret them as an int, then endianness makes a difference. Essentially, endianness is the order in which a machine arranges the bytes of a multi-byte numerical value.
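
As a minimal sketch of this point, the same four bytes produce different numbers depending on which byte order is assumed. The helper names from_big_endian and from_little_endian are made up for illustration, and a 32-bit value is assumed:

```c
#include <stdint.h>

/* Hypothetical helper: treat b[0] as the most significant byte. */
uint32_t from_big_endian(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Hypothetical helper: treat b[0] as the least significant byte. */
uint32_t from_little_endian(const unsigned char *b)
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

For the bytes 01 02 03 04, the first function yields 0x01020304 and the second yields 0x04030201. Because both work with shifts on values rather than on memory layout, they behave the same on any host.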

shoosh 2009-02-13 06:43:47

Answer 3

+1 A:

It depends on how you want to read them. I get the feeling you want to cast 4 bytes into an integer; doing so over network-streamed data will usually end up looking something like this:

int foo = *(int*)(stream+offset_in_stream);
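
Note that the cast assumes stream + offset_in_stream is suitably aligned for an int. A minimal sketch of an alternative that avoids that assumption by copying the bytes with memcpy (the function name read_int_at is hypothetical; the result keeps the byte order of the data, so any endianness conversion still has to happen afterwards):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy sizeof(int) bytes starting at stream + offset
 * into an int. memcpy places no alignment requirement on its source, so
 * this works at any offset. No byte-order conversion is done here. */
int read_int_at(const char *stream, size_t offset)
{
    int value;
    memcpy(&value, stream + offset, sizeof value);
    return value;
}
```

The caller remains responsible for making sure at least sizeof(int) bytes are available at that offset.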

Daniel 2009-02-13 06:45:33

That may result in an unaligned access.

gimpf 2009-02-13 06:56:53

@gimpf: I'm curious: on which systems will this actually lead to an error?

Christoph 2009-02-13 11:15:45

E.g. on the 80486 and any later x86 CPU with the alignment-check (AC) flag set.

bothie 2009-02-13 15:48:29

When would the alignment flag be set?

Rob Kennedy 2009-02-14 00:42:33

On my professor's CPU it would cause a bus error. On Sun processors (SPARCs, I believe) this can fail too. Basically, it fails on any processor that does not support unaligned reads/writes.

Johannes Schaub - litb 2009-02-14 07:08:00

Answer 4

+3 A:

Do you mean something like this?

char* a;   /* must point to at least sizeof(int) bytes of data */
int i;
memcpy(&i, a, sizeof(i));

You only have to worry about endianness if the source of the data is from a different platform, like a device.

Dani van der Meer 2009-02-13 06:47:26

Answer 5

A:

Just use a for loop that moves over the array in sizeof(int) chunks.
Use the function ntohl (found in the header <arpa/inet.h>, at least on Linux) to convert from bytes in the network order (network order is defined as big-endian) to local byte-order. That library function is implemented to perform the correct network-to-host conversion for whatever processor you're running on.
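
A minimal sketch of such a loop, under a few assumptions: the function name decode_network_ints is hypothetical, and since ntohl operates on 32-bit values, the chunks are taken as uint32_t (which matches int on most current platforms, though that is not guaranteed):

```c
#include <arpa/inet.h>  /* ntohl; POSIX. On Windows, <winsock2.h> provides it. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode `count` 32-bit integers stored in network
 * (big-endian) order in `bytes`, writing them to `out` in host order. */
void decode_network_ints(const char *bytes, size_t count, uint32_t *out)
{
    for (size_t i = 0; i < count; ++i) {
        uint32_t raw;
        /* memcpy avoids alignment problems with the char buffer. */
        memcpy(&raw, bytes + i * sizeof raw, sizeof raw);
        out[i] = ntohl(raw);
    }
}
```

On a big-endian host ntohl leaves the value unchanged, so the same code is correct on either kind of machine.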

Chris Connett 2009-02-13 06:48:50

Of course, this applies only if you're actually reading something from the network...

gimpf 2009-02-13 06:57:54

Ok, he stated in the _comment_ that he is reading it from a different machine. Well, maybe done by burning/reading a CD, but more probably he indeed meant some kind of network.

gimpf 2009-02-13 06:59:52

Answer 6

+4 A:

a) You only need to worry about "endianness" (i.e., byte-swapping) if the data was created on a big-endian machine and is being processed on a little-endian machine, or vice versa. There are many ways this can occur, but here are a couple of examples.

You receive data on a Windows machine via a socket. Windows employs a little-endian architecture while network data is "supposed" to be in big-endian format.
You process a data file that was created on a system with a different "endianness."

In either of these cases, you'll need to byte-swap all numbers that are bigger than 1 byte, e.g., shorts, ints, longs, doubles, etc. However, if you are always dealing with data from the same platform, endian issues are of no concern.

b) Based on your question, it sounds like you have a char pointer and want to extract the first 4 bytes as an int and then deal with any endian issues. To do the extraction, use this:

int n = *(reinterpret_cast<int *>(myArray)); // where myArray is your data

Obviously, this assumes myArray is not a null pointer; otherwise, this will crash since it dereferences the pointer, so employ a good defensive programming scheme.

To swap the bytes on Windows, you can use the ntohs()/ntohl() and/or htons()/htonl() functions defined in winsock2.h. Or you can write some simple routines to do this in C++, for example:

inline unsigned short swap_16bit(unsigned short us)
{
    return (unsigned short)(((us & 0xFF00) >> 8) |
                            ((us & 0x00FF) << 8));
}

inline unsigned long swap_32bit(unsigned long ul)
{
    return (unsigned long)(((ul & 0xFF000000) >> 24) |
                           ((ul & 0x00FF0000) >>  8) |
                           ((ul & 0x0000FF00) <<  8) |
                           ((ul & 0x000000FF) << 24));
}
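
One caveat: unsigned long is 64 bits wide on some platforms (64-bit Linux, for example), in which case the routine above swaps only the low 32 bits. A sketch of the same idea using the fixed-width types from <stdint.h>, so the widths are explicit (the names swap_u16 and swap_u32 are hypothetical):

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
uint16_t swap_u16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap_u32(uint32_t v)
{
    return ((v & 0xFF000000u) >> 24) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x000000FFu) << 24);
}
```

Applying either function twice returns the original value, which makes for a quick sanity check.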

Matt Davis 2009-02-13 07:10:12

You should mention that the first code snippet has the same problem as Daniel's: it can access unaligned data that's not suitable for int*.

Johannes Schaub - litb 2009-02-13 07:46:07

Answer 7

+2 A:

The easy way to solve this is to make sure whatever generates the bytes does so in a consistent endianness. Typically the "network byte order" used by various TCP/IP stuff is best: the library routines htonl and ntohl work very well with this, and they are usually fairly well optimized.

However, if network byte order is not being used, you may need to do things in other ways. You need to know two things: the size of an integer, and the byte order. Once you know that, you know how many bytes to extract and in which order to put them together into an int.

Some example code that assumes sizeof(int) is the right number of bytes:

#include <limits.h>
#include <stddef.h>

int bytes_to_int_big_endian(const char *bytes)
{
    size_t i;
    unsigned int result;

    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        /* Cast through unsigned char so bytes >= 0x80 are not sign-extended. */
        result = (result << CHAR_BIT) + (unsigned char)bytes[i];
    return (int)result;
}

int bytes_to_int_little_endian(const char *bytes)
{
    size_t i;
    unsigned int result;

    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        result += (unsigned int)(unsigned char)bytes[i] << (i * CHAR_BIT);
    return (int)result;
}


#ifdef TEST

#include <stdio.h>

int main(void)
{
    const int correct = 0x01020304;
    const char little[] = "\x04\x03\x02\x01";
    const char big[] = "\x01\x02\x03\x04";

    printf("correct: %0x\n", correct);
    printf("from big-endian: %0x\n", bytes_to_int_big_endian(big));
    printf("from-little-endian: %0x\n", bytes_to_int_little_endian(little));
    return 0;
}

#endif

Lars Wirzenius 2009-02-13 07:10:59

Now replace "int" with "unsigned" and your answer is correct ;)

bothie 2009-02-13 15:47:10

I would replace the + and += with | and |= respectively. It's confusing to use arithmetic operators here, IMHO.

Johannes Schaub - litb 2009-02-14 07:08:54

Answer 8

+1 A:

Why read when you can just compare?

bool AreEqual(int i, char *data)
{
   return memcmp(&i, data, sizeof(int)) == 0;
}

If you are worried about endianness, then you need to convert all of the integers to some invariant form. htonl and ntohl are good examples.

Dmitriy Matveev 2009-02-13 07:20:44

This will always return false. I think you mean memcmp(), not memcpy().

Matt Davis 2009-02-13 07:26:09

Thank you, fixed.

Dmitriy Matveev 2009-02-13 08:35:19

Answer 9

+1 A:

How about

int int_from_bytes(const char * bytes, _Bool reverse)
{
    if(!reverse)
        return *(int *)(void *)bytes;

    char tmp[sizeof(int)];

    for(size_t i = sizeof(tmp); i--; ++bytes)
        tmp[i] = *bytes;

    return *(int *)(void *)tmp;
}

You'd use it like this:

int i = int_from_bytes(bytes, SYSTEM_ENDIANNESS != ARRAY_ENDIANNESS);
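
SYSTEM_ENDIANNESS and ARRAY_ENDIANNESS are not standard names; they stand for whatever constants your project defines. A minimal sketch of one way to determine the host's byte order at run time (the enum and function names here are assumptions for illustration):

```c
#include <stdint.h>
#include <string.h>

enum byte_order { ORDER_LITTLE_ENDIAN, ORDER_BIG_ENDIAN };

/* Inspect how a known 16-bit value is laid out in memory.
 * Assumes the host is either little- or big-endian. */
enum byte_order system_byte_order(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* lowest-addressed byte of probe */
    return first == 0x02 ? ORDER_LITTLE_ENDIAN : ORDER_BIG_ENDIAN;
}
```

For an array known to hold big-endian data, the reverse argument would then be system_byte_order() != ORDER_BIG_ENDIAN.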

If you're on a system where casting void * to int * may result in alignment conflicts, you can use

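#include <stddef.h>   /* size_t */
#include <string.h>   /* memcpy */

int int_from_bytes(const char * bytes, _Bool reverse)
{
    int tmp;

    if(reverse)
    {
        /* Fill tmp from its last byte to its first, reversing the order. */
        for(size_t i = sizeof(tmp); i--; ++bytes)
            ((char *)&tmp)[i] = *bytes;
    }
    else memcpy(&tmp, bytes, sizeof(tmp));

    return tmp;
}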

Christoph 2009-02-13 11:08:31
