views: 203
answers: 8

int, char and bool usually have different sizes; int > char > bool, I suppose.

  • But does the RAM even support this?
  • How is it built up?
  • Can it take advantage of bool being only 1 byte and store it in a small "register"?
+1  A: 

Presumably you mean cache? Just curious why you're worried about the sizes of data structures; are you programming for embedded? That's usually the only time memory footprint is worth worrying about.

If you have several bit fields that you want to maintain concurrently, you can use a byte as a bitfield and remember that values like

0x0001 
0x0010 
0x0100 
0x1000 

are each independent of the others and can be checked for separately. People do this all the time to try to save a bit of space. Is that sort of what you're trying to figure out?

So, for instance, if each bool takes up one byte of space, then only one bit per byte is actually being used. If you pack 8 such flags together, they will consume only one byte of space.
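A minimal sketch of that idea (my own illustration; the flag names are made up, and each mask is a power of two so it occupies exactly one bit):

#include <cstdint>
#include <iostream>

// Hypothetical flags, one bit each, all packed into a single byte.
const std::uint8_t FLAG_A = 0x01; // bit 0
const std::uint8_t FLAG_B = 0x02; // bit 1
const std::uint8_t FLAG_C = 0x04; // bit 2
const std::uint8_t FLAG_D = 0x08; // bit 3

int main() {
    std::uint8_t flags = 0;       // all eight "bools" start out false
    flags |= FLAG_A | FLAG_C;     // set two of them
    flags &= ~FLAG_C;             // clear one again

    if (flags & FLAG_A) {         // each flag can be tested independently
        std::cout << "A is set\n";
    }
    std::cout << "storage used: " << sizeof flags << " byte\n";
}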

But don't forget that each variable in memory also carries some bookkeeping overhead (more evident in .NET than in "lower" level languages, but there's always something tracking the variables in use). In C#, for example, a single byte can end up needing around 3 bytes of RAM.

But RAM is transferred in blocks, which as I understand it are much larger than a single byte. Transfers are measured at least in words, and the usual size is 32, 64, or 128 bits at a time; those numbers are platform dependent.

drachenstern
Would hex numbers in powers of 2 achieve the same thing? i.e.: `0x0001`, `0x0002`, `0x0004`, `0x0008`, `0x0010`, etc.
Xavier Ho
Since hex numbers are shorthand for binary, yes. It's all about the bit position. Also, 0x0003 would be shorthand for both the first and second position being true/set (as it were).
drachenstern
Right. Another question is, wouldn't two hex digits make one byte? Your example uses half a byte for one field. Not that this matters much. `=]`
Xavier Ho
Oh, that's because it was a contrived example meant to show the basic concept. Truthfully, a byte's size is platform dependent. We "assume" that it's 8 bits, but there's no firm rule for all platforms. I just wanted to show how they lined up.
drachenstern
Huh? Crazy... I s'ppose I'll keep an eye out for such rarity.
Xavier Ho
+1  A: 

If by 'support' you mean does the RAM in a machine have a native storage unit matching each size, the answer is 'it depends on the machine and the compiler'.

Modern machines typically have minimum addressable storage sizes that are multiples of 8 bits (8/16/32/64 bits). Compilers can use any of those sizes to store and manipulate data. A compiler may optimize storage and register usage, but it does not have to.

Benjamin Franz
A: 

What does that have to do with RAM?

A bool can be true or false, which is usually represented as 0 or 1 (1 bit). A char can have different sizes, depending on the charset used. ASCII uses 7 bits. Unicode uses up to 32 bits. Integers are whole numbers, often supporting the range of -2^31 to 2^31-1 (32 bits), but they also come in other sizes.

Lucero
In memory a bool usually takes up an entire byte, as it's much easier for the processor and memory manager to deal with a byte than with a single bit.
Matt Greer
Nowadays booleans often "use" 32 or even 64 bits, but that's just padding for efficiency. At the same time, you should not forget about "flags", which are nothing but a bunch of booleans as well. Still, the boolean value itself needs only one bit; the memory layout is up to the platform and compiler.
Lucero
A: 

You can use C++ bit fields if you like, but you will be one of the few on this planet who do (technically, bit fields are well-defined in C too, but they were never used much).
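For reference, a minimal sketch of what such a bit field declaration looks like (a contrived struct, purely to show the syntax; the layout is implementation-defined):

#include <iostream>

struct Flags {
    unsigned int is_visible : 1;  // 1 bit
    unsigned int is_dirty   : 1;  // 1 bit
    unsigned int color      : 3;  // 3 bits, values 0..7
};

int main() {
    Flags f{};                    // zero-initialize all fields
    f.is_dirty = 1;
    f.color = 5;
    std::cout << sizeof(Flags) << " byte(s)\n";  // typically 4, not 1, due to padding
}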

How RAM is accessed is hidden from you by the C++ compiler for good reasons. There are cases where you want to optimize this, but they are extremely rare. In today's world of massive RAM amounts in client PCs it's just not worth that kind of micro-optimization.

Generally, you should trust your (optimizing) compiler to do the right thing. The source code you deliver to the compiler only vaguely resembles the machine code it produces. It's a myth that micro-optimizations help much if your compiler is good. To optimize better than the compiler, you have to know exactly where it needs help in its optimization process. You can even make matters worse if the compiler decides your code is too complicated to optimize.

If you want some technical background:

At the machine language level it depends on the processor. For example, the Motorola 680x0 line of processors allows you to do

move.l
move.w
move.b

to read and write different "units" of RAM (long/word/byte). The processor views its RAM differently depending on which instruction it is executing. Some embedded processors may even use 4 bits as their smallest unit.
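A rough C++ analogy (my own illustration, using std::memcpy to read differently sized units out of the same buffer):

#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    unsigned char ram[8] = {0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88};

    std::uint8_t  b;  // like move.b: one byte
    std::uint16_t w;  // like move.w: one 16-bit word
    std::uint32_t l;  // like move.l: one 32-bit long

    std::memcpy(&b, ram, sizeof b);
    std::memcpy(&w, ram, sizeof w);
    std::memcpy(&l, ram, sizeof l);

    // The printed values depend on the platform's endianness.
    std::cout << std::hex << +b << ' ' << w << ' ' << l << '\n';
}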

Thorsten79
+1  A: 

RAM does not really care about data type sizes. It just stores data in bytes. The CPU controls the basic data types, knowing how many bytes they are. When creating an int, for example, the CPU uses 4 or 8 bytes (on a 32- or 64-bit architecture, respectively).

One bit cannot be addressed, but you can make a custom structure where you store 8 booleans in one byte. In C++, you can utilize this using bit fields.

Johan
Also, STL's `bitset` is useful for handling space-efficient arrays of boolean values.
mxp
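A quick sketch of the std::bitset approach mentioned in the comment above (the size of 16 is arbitrary, just for illustration):

#include <bitset>
#include <iostream>

int main() {
    std::bitset<16> flags;        // 16 boolean flags, packed into bits

    flags.set(3);                 // mark flag 3 as true
    flags[7] = true;              // index syntax works too
    flags.reset(3);               // back to false

    std::cout << flags.count() << " flag(s) set, "
              << sizeof flags << " bytes of storage\n";
}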
+5  A: 

Computer memory is organized into "words", sequences of bytes of a given size (often a power of 2). Memory is usually read and written in these units, which often match the size of the registers and the CPU's native support for arithmetic operations. This is typically the source of the "bit rating" of a machine (e.g., a 32-bit CPU, a 64-bit CPU, the old 8-bit video game consoles).

Of course, you often need a different size from the native word size. Machine instructions and smart coding allow you to break these words into smaller units by applying various bit-level logical operators, or to build larger units by combining multiple words.

For instance, if you have a 32-bit word, you could AND it against a pattern like 0xff0000ff to get the first and last byte in that word, or against 0x0000ffff to get just the low 16 bits.
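A small sketch of that masking, assuming a 32-bit unsigned word (the constants are the ones from the paragraph above):

#include <cstdint>
#include <iostream>

int main() {
    std::uint32_t word = 0x12345678;

    std::uint32_t ends = word & 0xff0000ffu;     // keep the first and last byte: 0x12000078
    std::uint32_t low  = word & 0x0000ffffu;     // keep the low 16 bits:         0x00005678
    std::uint32_t high = (word >> 16) & 0xffffu; // shift down for the high 16 bits: 0x1234

    std::cout << std::hex << ends << ' ' << low << ' ' << high << '\n';
}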

In the case of bools, it is common to use memory as a bitmap. You can essentially place X "bools" in an X-bit word and access a specific bit by ANDing or ORing against a "mask" that refers to that bool. E.g., 1 for the first bit, 2 for the second bit, 4 for the third bit, etc.

On most machines, it is inadvisable to split a smaller data type across two words (this is what "alignment" is about).

When you work with a higher level language like C or C++, you usually don't have to worry about all this memory organization stuff. If you allocate an int, a short, and a double, the compiler will generate the appropriate machine code. You only do this directly when you want to smartly organize things in dynamically allocated memory, for example when manually implementing a bitmap.

When working with larger units than the native word size, the compiler will again handle most things for you. For instance, on a 32-bit machine you can easily handle 32-bit int operations, but to run the same code on an 8-bit machine or a 16-bit machine the compiler would generate code to do the smaller operations and combine them to get the results. This is partially why it is generally considered advisable to run a 64-bit OS on a 64-bit machine, since otherwise you might be performing multiple instructions and read/writes to simulate 64-bit on a 32-bit OS rather than a single instruction or memory access.

Uri
+5  A: 

On a normal, modern computer all memory is byte addressable. That is, each byte-sized storage location in RAM has a unique number assigned to it. If you want to store a one-byte value such as a bool (although `bool`s are not required to be one byte in C++, they just usually are), it takes a single storage location, say location 42.

If you want to store something larger than one byte, say an int, then it will take multiple consecutive storage locations. For example, if your int type is 16 bits (2 bytes) long, then half of it will be stored in location 42 and the other half in location 43. This generalizes to larger types. Say you have a 64-bit (8-byte) long long int type. A value of this type might be stored across locations 42, 43, 44, 45, 46, 47, 48, and 49.
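A little sketch to make that concrete; it prints the sizes of a couple of types and the consecutive byte addresses an int occupies (the exact sizes and addresses will of course vary by platform):

#include <cstddef>
#include <iostream>

int main() {
    bool b = true;
    int  i = 12345;

    std::cout << "sizeof(bool) = " << sizeof b << '\n';
    std::cout << "sizeof(int)  = " << sizeof i << '\n';

    // An int occupies sizeof(int) consecutive byte addresses.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&i);
    for (std::size_t k = 0; k < sizeof i; ++k) {
        std::cout << "byte " << k << " lives at " << static_cast<const void*>(p + k) << '\n';
    }
}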

There are some more advanced considerations called "alignment" that some processors require to be respected. For example, a processor might have a rule that a two-byte value must begin on an even address, or that a four-byte value must begin on an address that is divisible by 4. Your compiler will take care of the details of this for you.
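For illustration, a sketch of the padding that such alignment rules introduce (the exact numbers are platform dependent):

#include <iostream>

struct Mixed {
    char c;   // 1 byte
    int  i;   // typically 4 bytes, usually required to start at a 4-byte boundary
};

int main() {
    std::cout << "alignof(int)  = " << alignof(int) << '\n';
    std::cout << "sizeof(Mixed) = " << sizeof(Mixed)
              << " (often 8: one byte for c, three bytes of padding, four for i)\n";
}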

The compiler also knows how long each type is, so when it generates the machine code for your program, it knows at which address the storage for each variable begins and how many consecutive bytes that variable occupies.

"Registers" on the other hand, are something that exist in the processor, not in RAM, and are usually a fixed size. One use of processor registers is to store a value retrieved from RAM. For example, if your processor has 32 bit (4 byte) registers, then a bool value loaded from RAM will still consume an entire 4-byte register, even though it consumed only one byte when it was in RAM.

Tyler McHenry
This is actually not true, but the CPU makes it look like that. On a normal modern computer, RAM is organized in DIMM modules. You can't address a single byte of a DIMM, only 8 bytes/64 bits.
MSalters
Well, it depends on what level you're speaking at. If you're talking about what sorts of addresses the processor accepts (which is the lowest level a programmer would normally be concerned with), then it is true that memory is byte-addressable, even though at a lower hardware level that may not be the case.
Tyler McHenry
A: 

Most commodity hardware has byte-addressable memory. Looking a bit deeper, we see that CPU registers have a given bit width (32 or 64 bits for everyday stuff). Then caches and buses operate on blocks of these (something like 64 or 128 bytes). You can try taking advantage of this, but you'd need a pretty detailed understanding of the hardware and you'd be binding yourself to a particular platform. On the other hand, you don't have to take advantage of it, since your compiler already does.

Nikolai N Fetissov