views: 132
answers: 4

I have a structure called log that has 13 chars in it. After doing a sizeof(log), I see that the size is not 13 but 16. I can use __attribute__((packed)) to get it down to the actual 13 bytes, but I wonder if this will affect the performance of the program. It is a structure that is used quite frequently.

I would like to be able to read the size of the structure (13, not 16). I could use a macro, but if this structure is ever changed, i.e. fields added or removed, I would like the new size to be picked up without editing a macro, because I think that is error-prone. Any suggestions?
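
To illustrate, here is a cut-down example (the fields are made up, not my actual struct, but the sizes behave the same way):

```c
#include <stdio.h>

/* Hypothetical fields: 13 bytes of members, which the compiler
   rounds up to 16 to keep the struct's alignment. */
struct log_default {
    long timestamp;   /* 8 bytes on a typical 64-bit target */
    int  id;          /* 4 bytes */
    char level;       /* 1 byte, followed by 3 bytes of tail padding */
};

struct __attribute__((packed)) log_packed {
    long timestamp;
    int  id;
    char level;
};

int main(void)
{
    printf("default: %zu, packed: %zu\n",
           sizeof(struct log_default), sizeof(struct log_packed));
    /* Typically prints "default: 16, packed: 13" with GCC on x86-64. */
    return 0;
}
```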

+6  A: 

Yes, it will affect the performance of the program. Adding the padding means the compiler can use integer load instructions to read things from memory. Without the padding, the compiler must load things separately and do bit shifting to get the entire value. (Even on x86, where the hardware handles this, the extra work still has to be done.)

Consider this: Why would compilers insert random, unused space if it were not for performance reasons?
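
Roughly speaking (the struct is hypothetical, and real compilers emit machine instructions rather than C), the difference looks like this:

```c
#include <stdint.h>

/* With the natural, padded layout, a 4-byte field sits at a 4-byte-aligned
   offset, so reading it is a single 32-bit load. */
uint32_t read_id_aligned(const uint32_t *id_field)
{
    return *id_field;
}

/* With a packed layout on hardware that has no unaligned load instruction,
   the compiler effectively has to do something like this instead: fetch
   the bytes one at a time and shift/OR them back together. */
uint32_t read_id_packed(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);  /* assumes little-endian byte order */
}
```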

Billy ONeal
Most hardware handles most unaligned loads without a penalty. The exception to the rule is when the access straddles some kind of boundary: cache line, page, etc. Mentioning instructions is misleading. In particular, if the working set does not fit into the cache (not an unusual situation), the benefit of fewer DRAM transactions for a "compressed" array will probably outweigh the extra cache accesses. Doubly so for structures written to disk.
Potatoswatter
@Potatoswatter: "most"? Maybe if "most machines are x86" your statement has some chance of being true, but last I checked most machines are embedded systems, cell phones, etc.. On most hardware, unaligned access means the compiler must generate code that performs the loads/stores byte-by-byte, possibly with bitshifting and bitwise or to assemble values, to work with larger types. This is a huge penalty.
R..
@R: Pre-ARMv6 ARM doesn't support misalignment, according to Wikipedia. Aside from that, SPARC, and DSPs, most architectures do support it. Anyway, even tedious byte flipping done at CPU speed might not be slower than extra disk/flash/DRAM transfer time.
Potatoswatter
@Potatoswatter, There are more DSPs and CPU cores with sizable penalties for unaligned access out there than you want to believe. Note that SSE2 on x86 requires alignment as well. This is really one of those areas where it is *much* better to leave the default behavior alone, unless you have a very good reason. Even then, test and benchmark to be sure.
RBerteig
@Potatoswatter: Why would the compiler insert alignment padding if it was **not** for performance reasons?
Billy ONeal
@Billy: See my answer.
Potatoswatter
@Potatoswatter: Err... your answer revolves around performance reasons (it's a good answer, I +1'd it), but I fail to see how it answers my question.
Billy ONeal
@Billy: Then I don't understand your question. There is a performance tradeoff and I never suggested otherwise.
Potatoswatter
@Potatoswatter: Ah, I thought you were responding to my comment asking why a compiler would do that if it were **not** for performance reasons.
Billy ONeal
+3  A: 

Yes, it can affect the performance. In this case, if you allocate an array of such structures with the ((packed)) attribute, most of them must end up unaligned (whereas if you use the default packing, they can all be aligned on 16-byte boundaries). Copying such structures around can be faster if they are aligned.
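
A quick sketch of why (using a hypothetical 13-byte packed record, since the real fields aren't shown):

```c
#include <stdio.h>

struct __attribute__((packed)) log_packed {
    long timestamp;   /* 8-byte field that wants 8-byte alignment */
    int  id;
    char level;
};

int main(void)
{
    /* Array elements start 13 bytes apart, so the timestamp field is
       properly aligned in only 1 element out of every 8. */
    for (int i = 0; i < 8; i++)
        printf("element %d: offset %2d, offset %% 8 = %d\n",
               i, i * (int)sizeof(struct log_packed),
               (i * (int)sizeof(struct log_packed)) % 8);
    return 0;
}
```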

caf
+2  A: 

Yes, it can affect performance. How much depends on what the structure is and how you use it.

An unaligned variable can possibly straddle two cache lines. For example, if you have 64-byte cache lines and you read a 4-byte variable from an array of 13-byte structures, there is a 3 in 64 (about 4.7%) chance that it will be spread across two lines. The penalty of an extra cache access is pretty small. If everything your program did was pound on that one variable, 4.7% would be the upper bound of the performance hit. If logging represents 20% of the program's workload, and reading/writing that structure is 50% of logging, then you're already down to a small fraction of a percent.

On the other hand, presuming that the log needs to be saved, shrinking each record by 3 bytes is saving you 19%, which translates to a lot of memory or disk space. Main memory and especially the disk are slow, so you will probably be better off packing the log to reduce its size.


As for reading the size of the structure without worrying about the structure changing, use sizeof. However you prefer to define numeric constants, be it const int, enum, or #define, just build the value from sizeof.
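
For example (struct log here is just a stand-in for the real one):

```c
#include <stddef.h>

/* Stand-in for the actual structure. */
struct __attribute__((packed)) log {
    long timestamp;
    int  id;
    char level;
};

/* Tracks the structure automatically: add or remove fields and the
   constant stays correct, with nothing to update by hand. */
enum { LOG_RECORD_SIZE = sizeof(struct log) };

/* The same idea works with a macro or a const object: */
#define LOG_RECORD_SIZE_M   (sizeof(struct log))
static const size_t log_record_size = sizeof(struct log);
```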

Potatoswatter
Creating saves by fwrite-ing structures is probably not a good idea in the first place; moving to a different compiler or platform would make previous saves worthless.
Billy ONeal
@Billy: The argument also applies to slow DRAM ("main memory") not written to disk. Anyway, proper serialization simply requires converting to a standard endianness.
Potatoswatter
@Potatoswatter: And ensuring the sizes of the types in your struct cannot change.
Billy ONeal
+2  A: 

Don't use __attribute__((packed)). If your data structure lives in memory, allow it to occupy its natural size as determined by the compiler. If it's for reading/writing to/from disk, write serialization and deserialization functions; do not simply store CPU-native binary structures on disk. "Packed" structures really have no legitimate uses (or very few; see the comments on this answer for possible disagreeing viewpoints).
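
A minimal sketch of what such functions look like (the record layout is hypothetical, and the byte order is fixed to little-endian so the file format is independent of the host):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory record; keep it at its natural, padded size. */
struct log_record {
    uint64_t timestamp;
    uint32_t id;
    uint8_t  level;
};

static void put_u32le(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

static void put_u64le(uint8_t *out, uint64_t v)
{
    put_u32le(out, (uint32_t)v);
    put_u32le(out + 4, (uint32_t)(v >> 32));
}

/* Serialize field by field into exactly 13 bytes, rather than fwrite()ing
   the struct; padding, endianness and native type sizes no longer matter. */
int write_log_record(FILE *f, const struct log_record *r)
{
    uint8_t buf[13];
    put_u64le(buf, r->timestamp);
    put_u32le(buf + 8, r->id);
    buf[12] = r->level;
    return fwrite(buf, sizeof buf, 1, f) == 1 ? 0 : -1;
}
```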

R..
There are other situations where you have to deal with bit-by-bit arranged data structures. For example, most SPI or I2C devices take bytes of data with a very specific structure. Given the choice between 20 or so bit shifting and masking operations, or a documented, packed data structure and well defined type punning, I'd take the latter.
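
e.g. something along these lines (a made-up 3-byte command frame, not any particular device, and the multi-byte field follows host byte order here):

```c
#include <stdint.h>
#include <string.h>

/* Made-up command frame for a hypothetical SPI device. */
struct __attribute__((packed)) spi_cmd {
    uint8_t  opcode;
    uint16_t address;   /* would land at offset 2 instead of 1 without packed */
};

/* Build the frame through the struct and copy the raw bytes out,
   instead of composing them with shifts and masks by hand. */
static void build_cmd(uint8_t out[3], uint8_t opcode, uint16_t address)
{
    struct spi_cmd cmd = { opcode, address };
    memcpy(out, &cmd, sizeof cmd);   /* sizeof cmd == 3 */
}
```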
detly
I would suggest that mapping structures onto hardware registers is a legitimate use on embedded systems for example.
John Burton
I would group these usages with writes to disk, as "serialization". It's questionable whether the compiler with `__attribute__((packed))` would generate better code than you could do by hand with macros, and the latter would be portable (to other C implementations on the same hardware), but I'll grant that this is one place it might make sense to use such a compiler extension.
R..
@R - agreed that it sacrifices portability, but depending on the number of bytes and structure, I find it more readable.
detly