I'm a beginner (self-learning) programmer learning C++, and recently I decided to implement a binary-coded decimal (BCD) class as an exercise, and so I could handle very large numbers on Project Euler. I'd like to do it as basically as possible, starting properly from scratch.

I started off using an array of ints, where every digit of the input number was saved as a separate int. I know that each BCD digit can be encoded with only 4 bits, so I thought using a whole int for this was a bit overkill. I'm now using an array of bitset<4>'s.

  1. Is using a library class like this overkill as well?
  2. Would you consider it cheating?
  3. Is there a better way to do this?

EDIT: The primary reason for this is as an exercise - I wouldn't want to use a library like GMP because the whole point is making the class myself. Is there a way of making sure that I only use 4 bits for each decimal digit?

+4  A: 

Just one note: using an array of bitset<4>'s is going to require the same amount of space as an array of longs. bitset is usually implemented with an array of word-sized integers as the backing store for the bits, so that bitwise operations can work on whole words instead of single bytes and more gets done at a time.
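If you want to see this on your own implementation, a quick check like the following will print the two sizes (both are implementation-defined, so treat the exact numbers as machine-specific):

#include <bitset>
#include <iostream>

int main()
{
    // Both sizes are implementation-defined; on typical compilers a
    // bitset<4> occupies a whole machine word, same as an unsigned long.
    std::cout << "sizeof(std::bitset<4>) = " << sizeof(std::bitset<4>) << '\n';
    std::cout << "sizeof(unsigned long)  = " << sizeof(unsigned long) << '\n';
    return 0;
}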

Also, I question your motivation. BCD is usually used as a packed representation of a string of digits when sending them between systems. It doesn't usually have much to do with arithmetic. What you really want is an arbitrary-sized integer arithmetic library like GMP.

Greg Rogers
That's useful background on bitset - thanks.
Skilldrick
+1  A: 

Is using a library class like this overkill as well?

I would benchmark it against an array of ints to see which one performs better. If an array of bitset<4> is faster, then no, it's not overkill. Every little bit helps on some of the PE problems.
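If you want a quick-and-dirty comparison, a sketch along these lines would do (the element count and test values are arbitrary, and you'd want optimizations turned on):

#include <bitset>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <vector>

int main()
{
    const std::size_t n = 10000000;

    // The same digit value stored two ways (arbitrary test data).
    std::vector<int> asInts(n, 7);
    std::vector<std::bitset<4> > asBitsets(n, std::bitset<4>(7));

    std::clock_t start = std::clock();
    unsigned long sum1 = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum1 += asInts[i];
    std::clock_t middle = std::clock();

    unsigned long sum2 = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum2 += asBitsets[i].to_ulong();
    std::clock_t end = std::clock();

    std::printf("ints:    %.3f s (sum %lu)\n",
                double(middle - start) / CLOCKS_PER_SEC, sum1);
    std::printf("bitsets: %.3f s (sum %lu)\n",
                double(end - middle) / CLOCKS_PER_SEC, sum2);
    return 0;
}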

Would you consider it cheating?

No, not at all.

Is there a better way to do this?

Like Greg Rogers suggested, an arbitrary precision library is probably a better choice, unless you just want to learn from rolling your own. There's something to learn from both methods (using a library vs. writing a library). I'm lazy, so I usually use Python.

Bill the Lizard
This is useful too. Like you say though, I want to learn from rolling my own, and I saw BCD as a simple way to represent arbitrary precision numbers.
Skilldrick
I do that a *lot* with PE - despite what I said about being lazy. :)
Bill the Lizard
+1  A: 

Like Greg Rogers said, using a bitset probably won't save any space over ints, and doesn't really provide any other benefits. I would probably use a vector of bytes (one digit per byte) instead. It's twice as big as it needs to be, but you get simpler and faster indexing for each digit.

If you want to use packed BCD, you could write a custom indexing function and store two digits in each byte.
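A minimal sketch of that packed layout (the helper names are just illustrative): two decimal digits per byte, with even-indexed digits in the low nibble.

#include <cstddef>
#include <vector>

// Read the decimal digit at position `index` from a packed-BCD buffer
// (two digits per byte; even indices live in the low nibble).
unsigned getDigit(const std::vector<unsigned char>& packed, std::size_t index)
{
    unsigned char byte = packed[index / 2];
    return (index % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
}

// Write a decimal digit (0-9) at position `index`.
void setDigit(std::vector<unsigned char>& packed, std::size_t index, unsigned digit)
{
    unsigned char& byte = packed[index / 2];
    if (index % 2 == 0)
        byte = static_cast<unsigned char>((byte & 0xF0) | (digit & 0x0F));
    else
        byte = static_cast<unsigned char>((byte & 0x0F) | ((digit & 0x0F) << 4));
}

With two digits per byte, an n-digit number needs only (n + 1) / 2 bytes, and the rest of the class can work purely in terms of those two helpers.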

Matthew Crumley
+1  A: 
  1. Is using a library class like this overkill as well?
  2. Would you consider it cheating?
  3. Is there a better way to do this?

1&2: not really

3: each byte's got 8 bits, so you could store 2 BCD digits in each unsigned char.

Calyth
A: 

In general, bit operations are applied in the context of an integer, so from a performance standpoint there is no real reason to go down to individual bits.

If you want to try the bit-level approach to gain experience, then this may be of help:

#include <stdio.h>
int main
(
    void
)
{
    typedef struct
    {
        unsigned int value:4;

    } Nibble;

    Nibble nibble;

    // value is only 4 bits wide, so it wraps from 15 back to 0
    // and never reaches 20: this loop runs forever
    for (nibble.value = 0; nibble.value < 20; nibble.value++)
    {
        printf("nibble.value is %d\n", nibble.value);
    }

    return 0;
}

The gist of the matter is that inside that struct, you are creating a small integer that is 4 bits wide. Under the hood it is still really an integer, but for your intended use it looks and acts like a 4-bit integer.

This is shown clearly by the for loop, which is actually an infinite loop. When the nibble value hits 16, it wraps back to zero, as there are only 4 bits to work with. As a result, nibble.value < 20 never becomes false and the loop never terminates.

If you look in the K&R White book, one of the notes there is that bit-field details like this are implementation-dependent, so if you want to port your program to another platform, it may or may not work the same way.

Have fun.

EvilTeach
A: 

You are trying to get a base-10 representation (i.e. a decimal digit in each cell of the array). That way you waste either space (one int per digit) or time (4 bits per digit, but with the overhead of packing/unpacking).

Why not try base-256, for example, and use an array of bytes? Or even base-2^32 with an array of ints? The operations are implemented the same way as in base-10. The only thing that will be different is converting the number to a human-readable string.

It may work like this: assuming base-256, each "digit" has 256 possible values, so the numbers 0-255 are all single-digit values. Then 256 is written as 1:0 (I'll use a colon to separate the "digits", since we cannot use letters as in base-16); the analogue in base-10 is how after 9 there is 10. Likewise, 1030 (base-10) = 4 * 256 + 6 = 4:6 (base-256). Also, 1020 (base-10) = 3 * 256 + 252 = 3:252 (base-256) is a two-digit number in base-256.

Now let's assume we put the digits in an array of bytes, with the least significant digit first:

unsigned short digits1[] = { 212, 121 }; // 121 * 256 + 212 = 31188
int len1 = 2;
unsigned short digits2[] = { 202, 20  }; // 20 * 256 + 202 = 5322
int len2 = 2;

Then adding will go like this (warning: notepad code ahead, may be broken):

unsigned short resultdigits[3] = { 0 }; // max(len1, len2) + 1 digits is enough
int len = len1 > len2 ? len1 : len2; // max of the lengths
int carry = 0;
int i;
for (i = 0; i < len; i++) {
    int leftdigit = i < len1 ? digits1[i] : 0;
    int rightdigit = i < len2 ? digits2[i] : 0;
    int sum = leftdigit + rightdigit + carry;
    if (sum > 255) {
        carry = 1;
        sum -= 256;
    } else {
        carry = 0;
    }
    resultdigits[i] = sum;
}
if (carry > 0) {
    resultdigits[i] = carry;
}

On the first iteration it should go like this:

  1. sum = 212 + 202 + 0 = 414
  2. 414 > 255, so carry = 1 and sum = 414 - 256 = 158
  3. resultdigits[0] = 158

On the second iteration:

  1. sum = 121 + 20 + 1 = 142
  2. 142 < 256, so carry = 0
  3. resultdigits[1] = 142

So at the end resultdigits[] = { 158, 142 }, that is 142:158 (base-256) = 142 * 256 + 158 = 36510 (base-10), which is exactly 31188 + 5322.

Note that converting this number to/from a human-readable form is by no means a trivial task: it requires multiplication and division by 10 or 256, and I cannot present sample code without proper research. The advantage is that the operations 'add', 'subtract' and 'multiply' can be made really efficient, and the heavy conversion to/from base-10 is done only once before the calculation and once after it.
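Roughly, though, the to-string direction can be sketched as repeated long division of the whole digit array by 10, collecting the remainders as decimal digits (in the same notepad-code spirit as above; the function name is just illustrative):

#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Convert a base-256 number, stored least significant byte first,
// into a decimal string by repeatedly dividing the whole number by 10.
std::string toDecimalString(std::vector<unsigned char> digits)
{
    if (digits.empty())
        return "0";

    std::string result;
    while (!(digits.size() == 1 && digits[0] == 0)) {
        // One pass of long division by 10, most significant byte first.
        int remainder = 0;
        for (std::size_t i = digits.size(); i-- > 0; ) {
            int current = remainder * 256 + digits[i];
            digits[i] = static_cast<unsigned char>(current / 10);
            remainder = current % 10;
        }
        result += static_cast<char>('0' + remainder);

        // Drop leading zero bytes so the loop terminates.
        while (digits.size() > 1 && digits.back() == 0)
            digits.pop_back();
    }

    if (result.empty())
        result = "0";
    std::reverse(result.begin(), result.end());
    return result;
}

Called on { 158, 142 } from the example above, it returns "36510". The opposite direction (parsing a decimal string) is the same idea with the roles of 10 and 256 swapped: for each decimal digit, multiply the accumulated base-256 number by 10 and add the digit.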

Having said all that, personally, I'd use base 10 in array of bytes and not care about the memory loss. This will require adjusting the constants 255 and 256 above to 9 and 10 respectively.

Kcats
Hi, thanks for this. I started trying this, but I got confused when I was trying to work out the conversion between decimal and BCD. Before, to convert, say, 457 to BCD, I would mod 10 to get the 7, then divide by 10 to get 45, then mod 10 to get 5 (etc.). How would I do this in base 255 (or 256)?
Skilldrick
I have edited the answer, because the comment space is not enough for all the explanation.
Kcats