views: 795
answers: 10
I want to define my own datatype that can hold a single one of six possible values, in order to learn more about memory management in C++. In numbers, I want to be able to hold 0 through 5. In binary, three bits would suffice (101 = 5), although two of the eight patterns (6 and 7) won't be used. The datatype should also consume as little memory as possible.

I'm not sure how to accomplish this. First, I tried an enum with defined values for all the fields. As far as I know, the values there are in hex, so one "hex digit" should allow me to store 0 through 15. But comparing it to a char (with sizeof) showed that it's 4 times the size of a char, and a char holds 0 through 255 if I'm not mistaken.

#include <iostream>

enum Foo
{
    a = 0x0, 
    b = 0x1,
    c = 0x2,
    d = 0x3,
    e = 0x4,
    f = 0x5,
};

int main()
{
    Foo myfoo = a;
    char mychar = 'a';

    std::cout << sizeof(myfoo); // prints 4
    std::cout << sizeof(mychar); // prints 1

    return 0;
}

I've clearly misunderstood something, but fail to see what, so I turn to SO. :)

Also, when writing this post I realised that I clearly lack some parts of the vocabulary. I've made this post a community wiki; please edit it so I can learn the correct words for everything.

A: 

You can use an unsigned char. Perhaps typedef it as BYTE. It will occupy only one byte.

Naveen
+6  A: 

A char is the smallest possible type.

If you happen to know that you need several such 3-bit values in a single place, you can use a structure with bitfield syntax:

struct foo {
  unsigned int val1:3;
  unsigned int val2:3;
};

and hence get two of them within one byte. In theory you could pack 10 such fields into a 32-bit "int" value.

Alnitak
What does the : do and what is it called? Seems I should do some reading on that. Does it split the int into three equally sized parts that I can access and manipulate individually?
mizipzor
It's a bitfield: http://publications.gbdirect.co.uk/c_book/chapter6/bitfields.html.
Bastien Léonard
See http://en.wikipedia.org/wiki/Bit_field. In this case, the :3 says "reserve 3 bits for this value", that being sufficient for 0-7
Alnitak
Okay. But reserving 3 bits shouldn't make the int smaller? If it does get smaller, it shouldn't become smaller than a char. Or is that just a way of making sure "val1" never has a value higher than 7?
mizipzor
there are a number of reasons for not using bitfields - poor performance, lack of portability, and the fact you can't take a bitfield's address being the three main ones
anon
no, an int can never become smaller than its standard size. this is just special syntax, designed for low-level memory structures. see the two other links for more info.
Alnitak
@Neil - sure, but there's no other way to minimize the memory use so efficiently other than packing your data manually, and then you still get the same limitations.
Alnitak
Neil: wrong. bitfields are portable.
No, bitfields are not portable because the layout (whether fields are allocated from most- or least-significant bit first) is implementation-defined.
zvrba
that doesn't mean that the *code* isn't portable, but it *does* matter if you're trying to map your struct to an external chunk of data and vice-versa.
Alnitak
Alnitak, yep. I wanted to make it clear that the code is portable because there was potential for confusion.
if your code depends on that they are allocated into the same integer unit, your code is non-portable. if it doesn't depend on that (and other things implementation defined), your code is portable.
Johannes Schaub - litb
Err, bit fields are no less portable than int test = 1234567; That will only work on like hardware (same int size, same endianness) too.
Imbue
+1  A: 

The minimal size you can use is 1 byte.

But if you use a group of enum values (writing to a file or storing in a container, ...), you can pack the group at 3 bits per value.

bb
+1  A: 

The best solution is to create your own type implemented using a char. This should have sizeof(MyType) == 1, though this is not guaranteed.

#include <iostream>
using namespace std;

class MyType {

    public:

        MyType( int a ) : val( a ) {
            if ( val < 0 || val > 5 ) {
                throw( "bad value" );
            }
        }

        int Value() const {
            return val;
        }

    private:

        char val;
};

int main() {

    MyType v( 2 );
    cout << sizeof(v) << endl;
    cout << v.Value() << endl;
}
anon
enums do resize themselves according to the number of possible values. What they don't do is resize themselves into sizes unsupported by the hardware.
please post an example explaining what you are talking about as a separate answer
anon
The underlying type of an enum is implementation-defined (see dcl.enum in the C++ spec). Some implementations choose the underlying type based on the enum values, others don't.
bk1e
Here's an example of an implementation that does this: http://www.keil.com/support/man/docs/armccref/armccref_babjddhe.htm
bk1e
Neil, not gonna post a separate answer because it's not an answer to OP's question. An enum value won't ever be 6 bits long but it might be 8 bits, 16 bits, etc., depending on whether you have e.g. >256 values in it (and depending on which size the compiler finds optimal).
@Iraimbilanja - you are right - I just took a look at the standard. learn something new every day, I guess!
anon
an enum is always at least as big as an int. and starting from that, it may be long or unsigned long or something else, as far as i remember.
Johannes Schaub - litb
oh no. it's the other way around (just looked it up): it must not be larger than an int if all values can fit into int. but it may be smaller (like char).
Johannes Schaub - litb
@litb - And I always thought it was a fixed (implementation dependant) sized int. Good to find these things out though :-)
anon
+1  A: 

You don't have to give explicit values to the enumerators:

enum Foo
{
    a, 
    b,
    c,
    d,
    e,
    f,
};
Foo myfoo = a;

Here Foo is an alias of int, which on your machine takes 4 bytes.

The smallest type is char, which is defined as the smallest addressable data on the target machine. The CHAR_BIT macro yields the number of bits in a char and is defined in limits.h.

[Edit]

Note that generally speaking you shouldn't ask yourself such questions. Always use [unsigned] int if it's sufficient, except when you allocate quite a lot of memory (e.g. int[100*1024] vs char[100*1024], but consider using std::vector instead).

Bastien Léonard
+2  A: 

C++ does not express units of memory smaller than bytes. If you're producing them one at a time, that's the best you can do. Your own example works well. If you need just a few, you can use bit-fields as Alnitak suggests. If you're planning on allocating them one at a time, you're even worse off: most architectures allocate in page-size units, 16 bytes being common.

Another choice might be to wrap std::bitset to do your bidding. This will waste very little space, if you need many such values, only about 1 bit for every 8.

If you think about your problem as a number expressed in base 6, and convert that number to base two, possibly using an unlimited-precision integer (for example GMP), you won't waste any bits at all.

This assumes, of course, that your values have a uniform, random distribution. If they follow a different distribution, your best bet will be general compression of the first example, with something like gzip.

TokenMacGuy
A page-size of 16 bytes would be unworkably small - typical page size is 2 or 4 Kb. Perhaps you were thinking of paragraphs?
anon
The pages returned by the OS are divided for small allocations by most allocators. Even so, few allocators will produce anything as small as a machine word, much less a byte, because if there are many allocations on a page, it becomes difficult to swap it out, free it, and alloc into it.
TokenMacGuy
+3  A: 

C++0x will contain strongly typed enumerations where you can specify the underlying datatype (in your example, char), but current C++ does not support this. The standard is not clear about the use of a char here (the examples are with int, short and long), but it mentions the underlying integral type, and that would include char as well.

As of today, Neil Butterworth's answer (creating a class for your problem) seems the most elegant, as you can even extend it to contain a nested enumeration if you want symbolic names for the values.

lothar
A: 

The size of an enumeration is typically the same as that of an int. But depending on your compiler, you may have the option of creating a smaller enum. For example, in GCC, you may declare:

enum Foo {
    a, b, c, d, e, f
}
__attribute__((__packed__));

Now, sizeof(Foo) == 1.

Juliano
+1  A: 

You can store values in fewer than 8 (or 32) bits. You just need to pack them into a struct (or class) and use bit fields.

For example:

struct example
{
    unsigned int a : 3;  // three bits, can be 0 through 7
    bool         b : 1;  // one bit, stores 0 or 1
    unsigned int c : 10; // ten bits, can be 0 through 1023
    unsigned int d : 19; // 19 bits, can be 0 through 524287
};

In most cases, your compiler will round the total size of your structure up to 32 bits on a 32-bit platform. The other problem is, as you pointed out, that your values may not have a power-of-two range. This makes for wasted space: if you read the entire struct as one number, you will find values that are impossible to set if your input ranges aren't all powers of 2.

Another feature you may find interesting is a union. They work like a struct, but share memory. So if you write to one field it overwrites the others.

Now, if you are really tight for space and you want to push each bit to the maximum, there is a simple encoding method. Let's say you want to store 3 numbers, each of which can be from 0 to 5. Bit fields are wasteful, because if you use 3 bits each, you'll waste some values (i.e. you could never set 6 or 7, even though you have room to store them). So, let's do an example:

//Here are three example values, each can be from 0 to 5:
const int one = 3, two = 4, three = 5;

To pack them together most efficiently, we should think in base 6 (since each value is from 0 to 5). So, packed into the smallest possible space:

//This packs all the values into one int, from 0 - 215.
//pack could be any value from 0 - 215. There are no 'wasted' numbers.
int pack = one + (6 * two) + (6 * 6 * three);

See how it looks like we're encoding in base six? Each number is multiplied by its place value, 6^n, where n is the place (starting at 0).

Then to decode:

const int one = pack % 6;
pack /= 6;
const int two = pack % 6;
pack /= 6;
const int three = pack;

These schemes are extremely handy when you have to encode some fields in a bar code or in an alphanumeric sequence for human typing. Saving those few partial bits can make a huge difference. Also, the fields don't all have to have the same range: if one field is from 0 through 7, you'd use 8 instead of 6 in the proper place. There is no requirement that all fields have the same range.

Imbue
+1  A: 

It is likely that packing oddly sized values into bitfields will incur a sizable performance penalty due to the architecture not supporting bit-level operations (thus requiring several processor instructions per operation). Before you implement such a type, ask yourself if it is really necessary to use as little space as possible, or if you are committing the cardinal sin of programming that is premature optimization. At most, I would encapsulate the value in a class whose backing store can be changed transparently if you really do need to squeeze every last byte for some reason.

Volte
I intend to use this for a brute-forcing Rubik's cube solver. Although I doubt the computer has enough memory for the brute-force approach, it will help greatly if each state of the cube can be represented with as little memory as possible, even if it is at the expense of performance.
mizipzor