Let's say that you were to have a structure similar to the following:

struct Person {
  int  gender;         // between 0-1
  int  age;            // between 0-200
  int  birthmonth;     // between 0-11
  int  birthday;       // between 1-31
  int  birthdayofweek; // between 0-6
};

In terms of performance, which would be the best data type to store each of the fields? (e.g. bitfield, int, char, etc.)

It will be used on an x86 processor and stored entirely in RAM. A fairly large number will need to be stored (50,000+), so processor caches and such will need to be taken into account.

Edit: OK, let me rephrase the question. If memory usage is not important, and the entire dataset will not fit into the cache no matter which datatypes are used, is it generally better to use smaller datatypes to fit more of the data into the CPU cache, or is it better to use larger datatypes to allow the CPU to perform faster operations? I am asking this for reference only, so code readability and such should not be considered.

+13  A: 
  1. Don't worry about it; use what is semantically correct, and most readable
  2. There is no answer that is correct all the time. It depends on the platform and compiler. If you really care, then you have to test.

In general, I would stick with ints... except for gender, which should probably be an enum.

Chris Arguin
I am fairly sure you know this, but just for the record: `enum` is also `int`. I agree with your recommendation about `enum` though :-).
Alok
In C++ an enum has its *own* type traits, and implicitly casts to int in an int context.
greyfade
wouldn't you rather use *unsigned* ints? It's not like age/month/.. can be negative.
stijn
@greyfade: thanks for the correction!
Alok
@stijn: Does it matter? It's not like age/month/... are likely to go over 2^31 either.
jalf
True; to me it just seems illogical that with signed you have the possibility of going negative, while with unsigned you do not. Still, you're more likely to get really old than to get, erm, reborn? Anyway, in the end it doesn't really matter, that's true.
stijn
@stijn: It does matter: finding the difference between two ages or days is simpler with signed. Using unsigned to prevent nonsense is pointless: it's already nonsense.
Roger Pate
+1  A: 

Int is the fastest.

If you're using it in an array, however, you'll waste more memory, so you may want to stick with a byte in that case.

orokusaki
+3  A: 

It depends. Are you running out of memory? Then memory efficiency becomes paramount. Is it taking too long? Then CPU time, or at least perceived user response time, becomes paramount.

GregS
+3  A: 

It depends. Many processors can directly access words, but require additional instructions to access octets or bits. Depending on the compiler and processor, int might be the same as the addressable word. But is speed really going to matter? Readability and maintainability are likely to be more important.

M. S. B.
You're talking about memory alignment, right? AFAIK, on Intel IA-32 (x86) processors, access to memory locations that aren't `int`-aligned (i.e. addresses whose lowest two bits are not `00`) is penalized. Thus, once you introduce a field smaller than `int`, your fields' alignment can end up not being ideal for fast access. `char` fields, for example, should then be "packed" together, so that the next `int` field ends up on an aligned location (measured from the `struct`'s start). (I believe a C++ compiler isn't allowed to shuffle `struct` fields around to automatically align them.)
stakx
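
A small sketch of the padding effect described above; the struct names are illustrative, and the sizes assume a typical x86 ABI where int is 4 bytes and 4-byte-aligned:

#include <cstdio>

struct Mixed  { char a; int b; char c; };  // 1 + 3 pad + 4 + 1 + 3 pad = 12 bytes
struct Packed { int b; char a; char c; };  // 4 + 1 + 1 + 2 pad        = 8 bytes

int main() {
    std::printf("%zu %zu\n", sizeof(Mixed), sizeof(Packed));  // typically prints "12 8"
}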
+1  A: 

What Chris said. If this is a hypothetical program you're designing, trying to pick int versus uint8 at this stage isn't going to help you one bit in the long run. Focus your effort elsewhere.

If, when it comes down to it, you have a giant complex system you've made several rounds of optimizations on, and you're curious what the effect is, switching int to uint8 is probably (or should be, anyway) pretty easy. At that stage you can make a statistically valid comparison in a real-world use case - not before.

Tom Ritter
+5  A: 

int_fast#_t from <stdint.h> or <boost/cstdint.hpp>.

That said, you'll give up some simplicity and consistency compared to just using an int: these types may be character types, for example, which are integer types in C/C++, and that might lead to surprising overload resolutions.
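
For instance, on implementations where int_fast8_t is a character type, stream output picks the character overload; a minimal sketch (the output noted in the comments is implementation-dependent):

#include <stdint.h>
#include <iostream>

int main() {
    int_fast8_t n = 65;
    std::cout << n << '\n';   // may print 'A' if int_fast8_t is a character type
    std::cout << +n << '\n';  // unary + promotes to int, so this prints 65 everywhere
}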

You'll see much more significant performance benefits by concentrating on other areas, like algorithmic complexity and access patterns.

It will be used on an x86 processor and stored entirely in RAM. A fairly large number will need to be stored (50,000+), so processor caches and such will need to be taken into account.

You still have to worry about cache (once you're at that level of optimization), even if the whole data set won't fit in it. For example: do you access every item in sequence? Unpredictably? Or just one field of every item in sequence? Compare struct { int a, b; } data[N]; to int data_a[N], data_b[N];. (Imagine you need all the 'a' values at once but can ignore the 'b' values: which layout is more cache-friendly? See the sketch below.) Again, this doesn't sound like the main area on which you should focus.
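
A sketch of that comparison; the names and the task (summing only the 'a' values) are illustrative:

#include <cstdio>

enum { N = 50000 };

// Array of structures: scanning 'a' also drags every unused 'b' through the cache.
struct Pair { int a, b; };
static Pair aos[N];

// Structure of arrays: the same scan touches half as much memory.
static int soa_a[N];
static int soa_b[N];  // declared only to mirror the layout

static long sum_aos() { long s = 0; for (int i = 0; i < N; ++i) s += aos[i].a; return s; }
static long sum_soa() { long s = 0; for (int i = 0; i < N; ++i) s += soa_a[i]; return s; }

int main() {
    std::printf("%ld %ld\n", sum_aos(), sum_soa());
}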

Roger Pate
+1  A: 
enum Gender { Female, Male };

struct Date {...};

class Person
{
    Gender gender;
    Date date_of_birth;
    //no age field - can be computed from date_of_birth
};
UncleBens
make age a function ;)
Mark
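
A minimal sketch of that suggestion, assuming the birth date is stored as a raw std::time_t (a simplification: dividing by a 365-day year ignores leap years):

#include <ctime>

class Person {
public:
    explicit Person(std::time_t born) : date_of_birth(born) {}

    // Age in whole years, computed on demand rather than stored.
    int age(std::time_t now) const {
        return static_cast<int>((now - date_of_birth) / (60L * 60 * 24 * 365));
    }

private:
    std::time_t date_of_birth;
};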
+4  A: 

Number of bits needed per field:

  • gender: 1 (2 if you want to include intersexuality :))
  • age: 8 (0-255)
  • birthmonth: 4 (16 values)
  • birthday: 5 (32 values)
  • birthdayofweek: 3 (8 values)

Total: 21 bits.

Knowing that it is running on x86, we have the int datatype with 32 bits, so all fields fit into a single int. Build your own routines which accept an int:

read(gender, int* pValue);
write(gender, int* pValue);

by using shift and bit mask operators to store and retrieve the information. For the fields you can use typesafe enums.

That is very fast and has an extremely low memory footprint.
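
A sketch of what such routines could look like; the layout and names below are one possibility, not anything mandated:

#include <stdint.h>

// One possible layout within a single 32-bit word (21 bits used):
//   bit  0      gender
//   bits 1-8    age
//   bits 9-12   birthmonth
//   bits 13-17  birthday
//   bits 18-20  birthdayofweek
enum Field { GENDER, AGE, BIRTHMONTH, BIRTHDAY, BIRTHDAYOFWEEK };

static const int      SHIFT[] = { 0, 1, 9, 13, 18 };
static const uint32_t MASK[]  = { 0x1, 0xFF, 0xF, 0x1F, 0x7 };

// Extract one field from the packed word.
int read_field(Field f, uint32_t packed) {
    return (int)((packed >> SHIFT[f]) & MASK[f]);
}

// Clear the field's bits, then OR in the new value.
void write_field(Field f, uint32_t* packed, int value) {
    *packed = (*packed & ~(MASK[f] << SHIFT[f]))
            | (((uint32_t)value & MASK[f]) << SHIFT[f]);
}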

Thorsten S.
A: 

There are many reasons why either version could be faster.

Packing the structure by making each member into a minimal number of bits means less memory traffic and cache usage, which will cause a speedup. It also means more work to extract and access/modify individual fields, which will cost you performance. It will probably mean marginally more code as well, which will take up more of the instruction cache and so cost a bit of performance.

It's really impossible to say for sure which factor will be dominant. In most cases, it will be most efficient to use ints and bytes as needed. Don't bother with bitfields.

But in some cases, that won't be true. So measure, benchmark, profile. Find out how each implementation performs in your code.
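
For reference, the two layouts being weighed look roughly like this (field widths taken from the ranges in the question; exact sizes are compiler-dependent):

struct PersonPlain {              // typically 20 bytes: plain loads and stores
    int gender, age, birthmonth, birthday, birthdayofweek;
};

struct PersonPacked {             // typically 4 bytes: shift/mask work on every access
    unsigned gender         : 1;
    unsigned age            : 8;
    unsigned birthmonth     : 4;
    unsigned birthday       : 5;
    unsigned birthdayofweek : 3;
};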

jalf
+1  A: 

In General

Each type has advantages and disadvantages, and specifically there are scenarios where each one will have the highest performance.

  • Addressable types (byte, char, short, int, and on x86-64 "long int") can all be loaded from memory in a single operation and so they have the least CPU overhead on a per-operation basis.

  • But, bit fields or flags packed into one or more bits might result in an overall faster program because:

    • they use the cache more efficiently, and this is a huge win, easily paying for the few extra CPU ops needed to unpack each item
    • they require fewer I/O operations to read in from disk, and this additional huge win easily pays for more CPU ops, even though once again the CPU ops must be paid per item

Processor speeds have been advancing faster than disk and network speeds for decades, and now individual CPU ops are rarely a concern, particularly in your C/C++ case. You are already using the fastest code generator in the arsenal.

The in-RAM/not-in-cache scenario you mentioned

As it happens, there is still a cache factor to consider. Because the CPU is so fast, it is likely that execution time will be dominated by DRAM access on cache loads. If this is true, there is still an advantage to packing the data, but it is diminished somewhat for a linear scan through the table. Modern DRAM is far more efficiently read in order, so you can fill an entire cache block in not much more time than is required to randomly read a single address. If execution time is dominated by an in-order traversal of the data structure, this works in your favor and would tend to flatten the performance difference between using addressable units and packed data structures.

Worry about important things

Finally, it's probably obvious, but I will say it anyway: the choice of data structure (maps such as hashes and trees) and of algorithm typically has much more influence than machine-op tuning, which gives only an essentially linear optimization.

Worrying about memory bloat does matter, and it matters a lot if there is any possibility that your app won't fit in memory. Virtual storage turned out to be really important for protection and OS-kernel-level memory management, but one thing it never managed to do was allow programs to grow bigger than available RAM without bogging everything down.

DigitalRoss
A: 

In most cases you should stick to the default word size supported by the CPU. In many situations the CPU will pad, or require the compiler to pad, values that are smaller than the default word size (so that structures can be word-aligned), so any memory usage gain from a smaller data type can be rendered moot.

The main exception is where you have a large array; for example, when loading a file, an array of byte is preferable to an array of int.

In general model the problem cleanly and do the simple obvious thing first, then profile to see if you have memory issues. Looking at the case supplied, it is worth noting a couple of points:

  • The field age is a de-normalisation of the model: it can be derived from the birth date. Likewise, birthmonth, birthday and birthdayofweek could be represented as a single long birthtimestamp. So, just by looking at the model, you can trim 8 bytes easily.
  • 4 (bytes per int) * 5 (number of fields) * 50,000 (number of instances) = ~1MB. My laptop has a 4MB L2 cache; you're unlikely to have memory issues.
Michael Barker
A: 

This depends on exactly one thing:

What is the ratio of blindly copying the data to actually reading the data?

If you treat the data mostly as cookies, and seldom have to access an actual record, use bit fields, which consume less space and are more I/O-friendly.

If you access it a lot, and seldom copy it (as part of container reallocation) or serialize it to disk, use larger integers that are CPU-friendly.

Pavel Radzivilovsky
A: 

The processor's word size is the smallest unit the computer can read from or write to memory. If you read or write a bit, char, short or int, the CPU/northbridge overhead is the same in all cases, given reasonable system/compiler behavior. There is obviously an SRAM cache benefit to smaller field sizes. The one gotcha on x86 is to ensure proper alignment at word-sized multiples. Other architectures, such as SPARC, are much less forgiving and will actually crash if a memory access is unaligned.

I tend to shy away from data types smaller than the computer's word size in performance-sensitive multi-threaded apps, since changing one field within a word shared with other fields can create unintended dependencies unless access to all fields in that word is externally synchronized.
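
The sharing hazard is easiest to see with bit fields, which occupy the same storage word by definition; a minimal sketch:

struct Flags {
    unsigned a : 4;   // a and b live in the same word, so writing either
    unsigned b : 4;   // one is a read-modify-write of the whole word
};

// If one thread executes f.a = 1; while another executes f.b = 2; with no
// external synchronization, an update can be lost: each thread writes back
// the other field's bits from a stale copy of the word.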

Einstein