In General
Each type has advantages and disadvantages, and specifically there are scenarios where each one will have the highest performance.
Addressable types (byte, char, short, int, and on x86-64 "long int") can all be loaded from memory in a single operation and so they have the least CPU overhead on a per-operation basis.
But, bit fields or flags packed into one or more bits might result in an overall faster program because:
- they use the cache more efficiently, and this is a huge win, easily paying for the few extra CPU ops needed to unpack each item
- they require fewer I/O operations to read in from disk, and this additional huge win easily pays for more CPU ops, even though once again the CPU ops must be paid per item
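To make the "few extra CPU ops" concrete, here is a minimal sketch in C of flags packed eight to a byte. The names `PACKED_BYTES`, `packed_set`, `packed_clear`, and `packed_get` are made up for illustration; they are not from any library. Each access costs a shift and a mask on top of the load or store, while the table shrinks to one eighth the size of a one-flag-per-`char` array.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to hold n one-bit flags (hypothetical helper macro). */
#define PACKED_BYTES(n) (((n) + 7u) / 8u)

/* Set flag i: pick the byte with i >> 3, the bit within it with i & 7. */
static void packed_set(uint8_t *bits, size_t i) {
    bits[i >> 3] |= (uint8_t)(1u << (i & 7));
}

/* Clear flag i. */
static void packed_clear(uint8_t *bits, size_t i) {
    bits[i >> 3] &= (uint8_t)~(1u << (i & 7));
}

/* Read flag i as 0 or 1: one load, one shift, one mask. */
static int packed_get(const uint8_t *bits, size_t i) {
    return (bits[i >> 3] >> (i & 7)) & 1;
}
```

For a million flags, `PACKED_BYTES(1000000)` is 125,000 bytes, versus 1,000,000 bytes with one `char` per flag. Whether the smaller footprint actually wins depends on your access pattern, so measure before committing.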
Processor speeds have been advancing faster than disk and network speeds for decades, and now individual CPU ops are rarely a concern, particularly in your C/C++ case. You are already using the fastest code generator in the arsenal.
The in-RAM/not-in-cache scenario you mentioned
There is still a cache factor to consider. Because the CPU is so fast, execution time is likely to be dominated by DRAM access on cache loads. If so, there is still an advantage to packing the data, but it is diminished somewhat for a linear scan through the table. Modern DRAM is far more efficient when read in order, so you can fill an entire cache block in not much more time than is required to randomly read a single address. If execution time is dominated by an in-order traversal of the data structure, this works in your favor and would tend to flatten the performance difference between using addressable units and packed data structures.
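If you want to check this on your own hardware, a linear scan is easy to write both ways. The sketch below is only an illustration: `count_unpacked`, `popcount64`, and `count_packed` are made-up names, and the portable bit-counting loop stands in for whatever population-count intrinsic your compiler offers. Both scans walk memory in order; the packed one simply touches one eighth as many bytes for the same number of flags.

```c
#include <stddef.h>
#include <stdint.h>

/* One flag per byte: a scan over n flags reads n bytes. */
static size_t count_unpacked(const uint8_t *flags, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (flags[i] != 0);
    return count;
}

/* Portable population count: clears the lowest set bit each iteration. */
static unsigned popcount64(uint64_t w) {
    unsigned c = 0;
    while (w) {
        w &= w - 1;
        c++;
    }
    return c;
}

/* 64 flags per word: a scan over n_words words reads n_words * 8 bytes. */
static size_t count_packed(const uint64_t *words, size_t n_words) {
    size_t count = 0;
    for (size_t i = 0; i < n_words; i++)
        count += popcount64(words[i]);
    return count;
}
```

Time both over a table much larger than your last-level cache. How far apart the results land will depend on the machine, which is exactly the point above.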
Worry about important things
Finally, it's probably obvious but I will say it anyway: the choice of data structure (maps such as hash tables and trees) and the choice of algorithm typically have much more influence than machine-op tuning, which gives only an essentially linear optimization.
Worrying about memory bloat does matter, and it matters a lot if there is any possibility that your app won't fit in memory. Virtual storage turned out to be really important for protection and OS-kernel-level memory management, but one thing it never managed to do was allow programs to grow bigger than available RAM without bogging everything down.