views:

218

answers:

3

I'm dealing with a problem which needs to work with a lot of data. Currently its values are represented as an unsigned int. I know that real values do not exceed a limit of 1000.

Questions

  1. I can use unsigned short to store it. An upside to this is that it'll use less storage space to store the value. Will performance suffer?

  2. If I decided to store data as short but all the calling functions use int, it's recognized that I need to convert between these datatypes when storing or extracting values. Will performance suffer? Will the loss in performance be dramatic?

  3. If I decided to not use short but just 10 bits packed into an array of unsigned int. What will happen in this case comparing with previous ones?

+10  A: 

This all depends on architecture. Bit-fields are generally slower, but if you are able to to significantly cut down memory usage with them, you can even gain in performance due to better CPU caching and similar things. Likewise with short (though it is not dramatic in any case).

The best way is to make your source code able to switch representation easily (at compilation time, of course). Then you will be able to test and profile different implementations in your specific circumstances just by, say, changing one #define.

Also, don't forget about premature optimization rule. Make it work first. If it turns out to be slow/not fast enough, only then try to speed up.

doublep
+1 for warning against premature optimization
Cam
Yes, I know about premature optimization :) The main reason for switching to lower types is using less memory. So if the loss can be not very big it's worth a try.I'll be using it on x64 and x86 architecture. Compilers are MS VC 9 and ICC 11.
flashnik
Potentially reducing working set to 50% or less isn't what Knuth spoke of. All depends on the amount of values, though - bit arithmetics often enough increase the code size more than the data size decreases.
peterchen
In my case typical data is about 1GB, so I hope code won't be large enough to make gain in working set meaningless:)
flashnik
+2  A: 

I can use unsigned short to store it.

Yes you can use unsigned short (assuming (sizeof(unsigned short) * CHAR_BITS) >= 10)

An upside to this is that it'll use less storage space to store the value.

Less than what? Less than int? Depends what is the sizeof(int) on your system?

Will performance suffer?

Depends. The type int is supposed to be the most efficient integer type for your system so potentially using short may affect your performance. Whether it does will depend on the system. Time it and find out.

If I decided to store data as short but all the calling functions use int, it's recognized that I need to convert between these datatypes when storing or extracting values.

Yes. But the compiler will do the conversion automatically. One thing you need to watch though is conversion between signed and unsigned types. If the value does not fit the exact result may be implementation defined.

Will performance suffer?

Maybe. if sizeof(unsigned int) == sizeof(unsigned short) then probably not. Time it and see.

Will the loss in performance be dramatic?

Time it and see.

If I decided to not use short but just 10 bits packed into an array of unsigned int. What will happen in this case comparing with previous ones?

Time it and see.

Martin York
Just a reminder: A short has to have at least 16 bits.
GMan
http://stackoverflow.com/questions/271076/what-is-the-difference-between-an-int-and-a-long-in-c/271132#271132
Martin York
+2  A: 

A good compromise for you is probably packing three values into a 32 bit int (with two bits unused). Untangling 10 bits from a bit array is a lot more expensive, and doesn't save much space. You can either use bit fields, or do it by hand yourself:

(i&0x3FF) // Get i[0]
(i>>10)&0x3FF // Get i[1]
(i>>20)&0x3FF // Get i[2]
i = (i&0x3FFFFC00) | (j&0x3FF)  // Set i[0] to j
i = (i&0x3FF003FF) | ((j&0x3FF)<<10)  // Set i[1] to j
i = (i&0xFFFFF)    | ((j&0x3FF)<<20)  // Set i[2] to j

You can see here how much extra expense it is: a bit operation and 2/3 of a shift (on average) for get, and three bit operations and 2/3 of a shift (on average) to set. Probably not too bad, especially if you're mostly getting the values not setting them.

Rex Kerr
@Rex, thank you for your answer. It is a very good point.
flashnik