ansaurus

Question

Answer 1

+5 A:

No... you can't do that as a single value-semantic type... any class data must be a multiple of the 8-bit character size (inviting all the usual quips about CHAR_BITS etc).

That said, let's clutch at straws...

Unfortunately, you're obviously handling very many data items. If this is more than 64k, any proxy object into a custom container of packed values will probably need a >16 bit index/handle too, but still one of the few possibilities I can see worth further consideration. It might be suitable if you're only actively working with and needing value semantic behaviour for a small subset of the values at one point in time.

struct Proxy
{
    Int20_Container& container_;  // might not need if a singleton
    Int20_Container::size_type index_;
    ...
};

So, the proxy might be 32, 64 or more bits - the potential benefit is only if you can create them on the fly from indices into the container, have them write directly back into the container, and keep them short-lived with few concurrently. (One simple way - not necessarily the fastest - to implement this model is to use an STL bitset or vector as the Int20_Container, and either store 20 times the logical index in index_, or multiply on the fly.)

It's also vaguely possible that although your values range over a 20-bit space, you've less than say 64k distinct values in actual use. If you have some such insight into your data set, you can create a lookup table where 16-bit array indices map to 20-bit values.

Tony 2010-09-16 09:32:43

"value-semantic type"? Does that exclude bitfields or something?

Potatoswatter 2010-09-16 10:04:58

@Potatoswatter: No, bitfields have value semantics. It means a type behaves as you'd expect for say an int or std::string in terms of observable behaviour when copying, assigning, operator==, going out of scope etc.. As distinct from e.g. pointer-to-value semantics. See http://www.parashift.com/c++-faq-lite/value-vs-ref-semantics.html if interested.

Tony 2010-09-16 10:16:29

Well, since bitfields don't have to be a multiple of 8 bits, maybe you should change the answer.

Potatoswatter 2010-09-16 10:18:16

I think I've lost you :-). The question: "It has to be flexible enough so that I can use as normal data type e.g Vector< int20 >, operator overloading etc..". That's another way of saying it needs value semantics. You could use a 20 bit field, but it won't prevent the object containing it from being padded out to 32 bits.

Tony 2010-09-16 10:23:14

@Tony: see my answer.

Potatoswatter 2010-09-16 10:26:38

Answer 2

+1 A:

You can use the union keyword to create a bit field (see this site for some nice samples). I've used it way back when bit fields were a necessity. Otherwise, you can create a class that holds 3 bytes, but through bitwise operations exposes just the most significant 20.

Traveling Tech Guy 2010-09-16 09:33:42

But 2 bytes is just 16 bit, how to obtain 20 bits from it.

Gunner 2010-09-16 09:35:28

WTF? an 2 bytes value can only hold 16 bit (assuming 8bit bytes).

BatchyX 2010-09-16 09:36:08

Sorry, meant 3 bytes. Corrected.

Traveling Tech Guy 2010-09-16 09:38:22

but then it cost 3bytes storage... I ended up using 24-bit integer

iKid 2010-09-16 09:42:28

So use the union... (hint: it would still cost you 3 bytes, and maybe even 4, since the compiler aligns memory allocation in 4-byte blocks - depending, of course on the compiler and architecture). But why do you care so much? What environment are you developing for, where 4 bits are at a premium?

Traveling Tech Guy 2010-09-16 09:46:25

Answer 3

+4 A:

Use a class. As long as you respect the copy/assign/clone/etc... STL semantics, you won't have any problem.

But it will not optimize the memory space on your computer. Especially if you put in in a flat array, the 20bit will likely be aligned on a 32bit boundary, so the benefit of a 20bit type there is useless.

In that case, you will need to define your own optimized array type, that could be compatible with the STL. But don't expect it to be fast. It won't be.

BatchyX 2010-09-16 09:35:00

Answer 4

A:

As far as I know that isn't possible.

The easiest option would be to define a custom type, that uses an int32_t as the backing storage, and implements appropriate maths as override operators.

For better storage density, you could store 3 int20 in a single int64_t value.

Douglas Leeder 2010-09-16 09:37:40

Answer 5

+8 A:

If storage is your main concern, I suspect you need quite a few 20-bit variables. How about storing them in pairs? You could create a class representing two such variables and store them in 2.5+2.5 = 5 bytes.

To access the variables conveniently you could override the []-operator so you could write:

int fst = pair[0];
int snd = pair[1];

Since you may want to allow for manipulations such as

pair[1] += 5;

you would not want to return a copy of the backing bytes, but a reference. However, you can't return a direct reference to the backing bytes (since it would mess up it's neighboring value), so you'd actually need to return a proxy for the backing bytes (which in turn has a reference to the backing bytes) and let the proxy overload the relevant operators.

As a metter of fact, as @Tony suggest, you could generalize this to have a general container holding N such 20-bit variables.

(I've done this myself in a specialization of a vector for efficient storage of booleans (as single bits).)

aioobe 2010-09-16 09:38:19

I thought of this solution before. But still not figure out how to represent it so that it's easy to perform operation like a+b, or Vector<int20> ..

iKid 2010-09-16 09:41:03

Updated my answer :-)

aioobe 2010-09-16 10:01:54

Once you start thinking about proxies, seems to me you're better off having a custom container of an arbitrary number of 20-bit values and a proxy into that without the intermediate `pair` representation complicating things.

Tony 2010-09-16 10:26:35

This is basically about manually coding bitfields…

Potatoswatter 2010-09-16 10:28:11

Vector of booleans is already specialized (by STL developers) to store values in bits :)

n0rd 2010-09-16 10:40:37

@n0rd: and most people agree it was a mistake :-)

Matthieu M. 2010-09-16 10:50:27

@n0rd, this was for a C++ lab course :-)

aioobe 2010-09-16 10:51:12

(much of the discussion of Potatoswatter's solution is relevant to this one too)

Tony 2010-09-16 11:13:27

@Matthieu: it was a mistake that it's a specialisation of `vector`, because it doesn't really satisfy the requirements of `vector` or even of `sequence`. There's nothing wrong with the principle of a variable-length `bitset` with an iterator.

Steve Jessop 2010-09-16 11:38:33

@Steve: I don't think I ever said anything against that :), a `bit_vector` would have been very useful.

Matthieu M. 2010-09-16 11:42:16

Answer 6

A:

Just an idea: use optimized storage (5 bytes for two instances), and for operations, convert it into 32-bit int and then back.

alxx 2010-09-16 09:39:56

Answer 7

+1 A:

Your not going to be able to get exactly 20 bits as a type(even with a bit packed struct), as it will always be aligned (at smallest grainularity) to a byte. Imo the only way to go, if you must have 20 bits, is to create a bitstream to handle the data(which you can overload to accept indexing etc)

Necrolis 2010-09-16 09:41:02

Answer 8

+3 A:

You can use C++ std::bitset. Store everything in a bitset and access your data using the correct index (x20).

Ugo 2010-09-16 09:42:36

Answer 9

A:

While its possible to do this a number of ways. One possibilty would be to use bit twidling to store them as the left and right parts of a 5 byte array with a class to store/retrieve which converts yoiur desired array entry to an array entry in byte5[] array and extracts the left ot right half as appropriate.

However on most hardware requires integers to be word aligned so as well as the bit twiddling to extract the integer you would need some bit shifiting to align it properly.

I think it would be more efficient to increase your swap space and let virtual memory take care of your large array (after all 20 vs 32 is not much of a saving!) always assuming you have a 64 bit OS.

James Anderson 2010-09-16 09:54:05

20 vs 32 is a 37.5% savings. If his program is memory-bandwidth constrained chugging through this array, he will pretty much get that much speed boost. I would call 37% faster significant.

Potatoswatter 2010-09-16 09:58:13

Answer 10

+2 A:

Use a bitfield. (I'm really surprised nobody has suggested this.)

struct int20_and_something_else {
    int less_than_a_million : 20;
    int less_than_four_thousand : 12; // total 32 bits
};

This only works as a mutual optimization of elements in a structure, where you can spackle the gaps with some other data. But it works very well!

If you truly need to optimize a gigantic array of 20-bit numbers and nothing else, there is:

struct int20_x3 {
    int one : 20;
    int two : 20;
    int three : 20; // 60 bits is almost 64

    void set( int index, int value );
    int get( int index );
};

You can add getter/setter functions to make it prettier if you like, but you can't take the address of a bitfield, and they can't participate in an array. (Of course, you can have an array of the struct.)

Use as:

int20_x3 *big_array = new int20_x3[ array_size / 3 + 1 ];

big_array[ index / 3 ].set( index % 3, value );

Potatoswatter 2010-09-16 09:56:51

The poster's basically wanting `int20_t data[HUGE_NUM]`. You're offering `Twenty_Twelve data[HUGE_NUM]`. That won't save memory, or allow `X n = data[20] + data[50]; my_vector_of_X.push_back(n)`. What will all the "twelves" be used for? One of us is confused, as I can't see the relevance of this.

Tony 2010-09-16 10:36:04

@Tony: I offer two solutions. One stores twenty bits in an easily-addressed structure (provided there are other scraps of data, which is often the case), and the other stores 3x20 bits in a structure that demands you divide the array index by 3. Really, you can't see how this solves the problem?

Potatoswatter 2010-09-16 10:44:41

Not, I can't. It's basically like aioobe's solution. The client code has to keep providing that index % 3 value all over the place which prevents you thinking in terms of a discrete value-semantic variable. .set isn't "=". Having to put x.get(2) + x.get(1) differs from x + y. How does your suggestion really improve on directly and repeatedly indexing into a single "Container big_array"?

Tony 2010-09-16 10:57:02

@Tony: Except I've provided a concrete solution which can be used immediately. Additional abstraction can get rid of all that syntactic ugliness, but C provides the framework and a bitfield is already a limited proxy object for a range of bits. In practical use, most people choose to deal with code as in my answer, as fully capable proxies are more trouble than extra keystrokes in a couple inner loops. Anyway, to say it's impossible is just incorrect.

Potatoswatter 2010-09-16 10:58:58

@Potatoswatter: good point - you've got a simple working and workable solution there in a couple lines, whereas a value-semantic proxy is probably more like 50 lines and in this case will necessarily make functional trade-offs, and a custom container's maybe 20 lines if backed by an STL bit container. No good answer here, and I've no problem with yours except that the whole question was about striving for those value semantics, examining how close we could come and at what cost.

Tony 2010-09-16 11:11:59

@Tony: I do like to use this site for programming exercise, but this is one case where I'd say that it's just too much trouble. Premature abstraction is the flip side of premature optimization. Also, this being applicable to inner loops, you would have to code the proxy to a high performance standard. Pretty code and micro-optimized code (use of bitfields is micro-optimization) are mutually exclusive.

Potatoswatter 2010-09-16 11:19:31

If i'm not wrong, this struct still take 64bit instead of 60bit. But that come really close to what I need. Many thanks to those devote your time to this question!

iKid 2010-09-16 14:49:16

The reason I want it to be more generic like a + b because like that I would not need to refactor the code which using primitive data like integer 32-bit, 64-bit and 24-bit that I have implemented using template

iKid 2010-09-16 14:50:58

@iKid: that's a solid reason. C++ does generally provide the facilities for drop-in replacement with arbitrary user-defined-types, just unfortunate about the data size issue....

Tony 2010-09-17 01:10:49

ansaurus

tags:

views:

answers:

C++ 2.5 bytes (20-bit) integer

related questions