ansaurus

Question

Fastest Method to Split a 32 Bit number into Bytes in C++

Answer 1

+6 A:

This is probably as optimized as you'll get. Bit-twiddling operations are some of the fastest available on the processor.

It may be faster to >> 16, >> 24 instead of >>= 8 >>= 8 - you cut down an assignment.

Also I don't think you need the & - since you're casting to a BYTE (which should be a 8-bit char) it'll get truncated down appropriately anyway. (Is it? correct me if I'm wrong)

All in all, though, these are really minor changes. Profile it to see if it actually makes a difference :P

v3 2009-04-12 03:11:51

SoapBox 2009-04-12 03:49:38

This deserves a +1 as it's the only endian-independent approach, which is in general the best way to do things.

j_random_hacker 2009-04-12 14:12:26

Answer 2

A:

Do you have to do it one byte at a time? Is there a way you could just memcpy() the whole 32 bits into the stream in one fell swoop? If you have the address of the buffer you're writing to the stream, can you just copy into that?

Furious Coder 2009-04-12 03:16:44

Unfortunately yes. I put up the vector as sample code for demonstration ... but what I'm putting it through wants byte-by-byte representation.

Billy ONeal 2009-04-12 03:18:09

Answer 3

+5 A:

Just use a union:

assert(sizeof (DWORD) == sizeof (BYTE[4]));   // Sanity check

union either {
    DWORD dw;
    struct {
         BYTE b[4];
    } bytes;
};

either invalidLength;
invalidLength.dw = (DWORD) invalidClsids.length();
compressedBytes.push_back(either.bytes.b[0]);
compressedBytes.push_back(either.bytes.b[1]);
compressedBytes.push_back(either.bytes.b[2]);
compressedBytes.push_back(either.bytes.b[3]);

NOTE: Unlike the bit-shifting approach in the original question, this code produces endian-dependent output. This matters only if output from a program running on one computer will be read on a computer with different endianness -- but as there seems to be no measurable speed increase from using this method, you might as well use the more portable bit-shifting approach, just in case.

j_random_hacker 2009-04-12 03:21:06

The impression I've gotten is that its wrong to use unions as casts. Unions are (rarely) best used as ways to save on memory by not storing two large items in memory when you only ever use one at a time.

Ed 2009-04-12 03:23:29

Also, be sure you understand the endianess issues if this code might ever be ported...

Michael Burr 2009-04-12 03:26:58

Endianness doesn't really matter because even if it's written backwards, it will be read backwards as well.

Billy ONeal 2009-04-12 03:33:22

@Ed: Actually you have it round the wrong way: the C++ standard specifically allows the "reinterpretation" of the contents of a union (3.10.15) -- what *does* lead to undefined behaviour is casting a pointer to X to a pointer to Y (where X and Y are unrelated types) and then dereferencing it.

j_random_hacker 2009-04-12 03:34:12

@Ed: Casting an X* to a Y* where X and Y are unrelated types breaks C++'s aliasing assumptions, namely that a Y object can only be pointed to by pointers of static type Y* (or pointers to a base classes of Y). This assumption is important for generating optimised code.

j_random_hacker 2009-04-12 03:38:24

@Michael: Very good point about endianness. The code is non-portable in that respect -- to produce the same result as the OP's code, it needs to be run on a little-endian machine.

j_random_hacker 2009-04-12 03:42:13

@BillyONeal - but if it gets written on one system and read on another with different byte ordering it'll be broken. I'd at least throw a compile-time assert to test endianess so if someone decides a port is a good idea a couple years from now, the compiler will tell them about the problem.

Michael Burr 2009-04-12 03:45:01

@BillyONeal: Sort-of. The endianness-dependence *matters* only if the output is read on a machine with different endianness, which may be unlikely in this case (CLSIDs somewhat implies Windows, which runs only on little-endian CPUs these days).

j_random_hacker 2009-04-12 03:45:18

@j_random_hacker: This code is for windows boxes... little endian only anyway. I know one wants to strive for portable code, but the program in question is a windows utility which will make little sense outside of windows because it's reporting on windows' internals.... Thanks again!

Billy ONeal 2009-04-12 04:25:20

This isn't necessarily as fast as you might expect. Unions mean the compiler has to take aliasing into account, which may prevent a number of optimizations. Like Pax said, measure, instead of guessing.

jalf 2009-04-12 12:55:10

@jalf: I totally agree with "measure everything." But in this case I think that vector reallocation time is likely to dwarf any differences -- the focus should be on figuring out the number of BYTES and reserve()ing that ahead of time.

j_random_hacker 2009-04-12 14:13:38

@j_random_hacker For clarification, "wrong" was in terms of best practices and not in terms of the standard.

Ed 2009-04-13 13:28:00

@Ed: In your opinion, what is the best-practice way of doing this, and why is considered better than using unions? (I'm a bit disinclined to choose a non-standard approach over a standard-endorsed approach, but if you could tell me the reason why I might change my mind...)

j_random_hacker 2009-04-14 13:12:48

Answer 4

+1 A:

A real quick way is to just treat the a DWORD* (single element array) as a BYTE* (4 element array). The code is also a lot more readable.

Warning: I haven't compiled this

Warning: This makes your code dependent on byte ordering

std::vector<BYTE> compressedBytes;
DWORD invalidLength = (DWORD) invalidClsids.length();
BYTE* lengthParts = &invalidLength;
static const int kLenghtPartsLength = sizeof(DWORD) / sizeof(BYTE);
for(int i = 0; i < kLenghtPartsLength; ++i)
    compressedBytes.push_back(lengthParts[i]);

Ed 2009-04-12 03:21:27

Note: this is has the same result as the "unions" answer given by i_random_hacker.

SoapBox 2009-04-12 03:52:42

+1. Provided that BYTE is some sort of (possibly signed or unsigned) char type, the C++ standard guarantees undefined behaviour won't occur here -- though just as with unions, it is still *implementation-defined* behaviour (which, as you point out, effectively means byte-order dependence here).

j_random_hacker 2009-04-12 14:04:51

@SoapBox the result in assembly might be the same but as I mentioned in the j_random_hacker solution, I have a bias against using unions as casts.

Ed 2009-04-13 13:29:06

Answer 5

+2 A:

You should measure rather than guess at any potential improvement but my first thought is that it may be faster to do a union as follows:

typedef union {
    DWORD d;
    struct {
        BYTE b0;
        BYTE b1;
        BYTE b2;
        BYTE b3;
    } b;
} DWB;

std::vector<BYTE> compBytes;
DWB invLen;
invLen.d = (DWORD) invalidClsids.length();
compBytes.push_back(invalidLength.b.b3);
compBytes.push_back(invalidLength.b.b2);
compBytes.push_back(invalidLength.b.b1);
compBytes.push_back(invalidLength.b.b0);

That may be the right order for the pushbacks but check just in case - it depends on the endian-ness of the CPU.

paxdiablo 2009-04-12 03:27:30

This code has the potential to be better than the other union answer given, because you can reverse the other of b0-b3 using ifdefs and this code will then produce the same output on little and big endian machines.

SoapBox 2009-04-12 03:53:48

+1. Totally agree with "measure everything." In this case, my guess is that vector reallocations will utterly dwarf any difference between bit-shifts and unions -- so if you could figure out how many bytes compBytes will hold ahead of time and reserve() that much, that's where you might win big.

j_random_hacker 2009-04-12 13:59:17

@Soapbox: Cool idea! OTOH, formally speaking, a standard-conforming compiler may introduce arbitrary amounts of padding in between each BYTE in DWB.b, while it is not permitted to do so for an array of 4 BYTEs. But that's a minor quibble; in practice, every compiler has a mechanism to pack structs.

j_random_hacker 2009-04-12 14:10:04

Answer 6

+1 A:

compressedBytes.push_back(either.bytes.b[0]);
compressedBytes.push_back(either.bytes.b[1]);
compressedBytes.push_back(either.bytes.b[2]);
compressedBytes.push_back(either.bytes.b[3]);

There is an even smarter and faster way! Let's see what this code is doing and how we can improve it.

This code is serializing the integer, one byte at a time. For each byte it's calling push_back, which is checking the free space in the internal vector buffer. If we have no room for another byte, memory reallocation will happen (hint, slow!). Granted, the reallocation will not happen frequently (reallocations typically happen by doubling the existing buffer). Then, the new byte is copied and the internal size is increased by one.

vector<> has a requirement by the standard which dictates that the internal buffer be contiguous. vector<> also happen to have an operator& () and operator[] ().

So, here is the best code you can come up with:

std::string invalidClsids("This is a test string");
std::vector<BYTE> compressedBytes;
DWORD invalidLength = (DWORD) invalidClsids.length();
compressedBytes.resize(sizeof(DWORD)); // You probably want to make this much larger, to avoid resizing later.
// compressedBytes is as large as the length we want to serialize.
BYTE* p = &compressedBytes[0]; // This is valid code and designed by the standard for such cases. p points to a buffer that is at least as large as a DWORD.
*((DWORD*)p) = invalidLength;  // Copy all bytes in one go!

The above cast can be done in one go with the &compressedBytes[0] statement, but it won't be faster. This is more readable.

NOTE! Serializing this way (or even with the UNION method) is endian-dependent. That is, on an Intel/AMD processor the least significant byte will come first, while one a big-endian machine (PowerPC, Motorola...) the most significant byte will come first. If you want to be neutral, you must use a math method (shifts).

Ash 2009-04-12 05:51:19

+1. Yes, believe it or not this isn't any more "non-standard" than loading the bytes from a union one at a time, and almost certainly faster (but again: measure everything).

j_random_hacker 2009-04-12 14:07:37

ansaurus

tags:

views:

answers:

Fastest Method to Split a 32 Bit number into Bytes in C++

related questions