views:

491

answers:

10

As an optional assignment, I'm thinking about writing my own implementation of the BigInteger class, where I will provide my own methods for addition, subtraction, multiplication, etc.

This will be for arbitrarily long integer numbers, even hundreds of digits long.

While doing the math on these numbers, digit by digit isn't hard, what do you think the best datastructure would be to represent my "BigInteger"?

At first I was considering using an Array but then I was thinking I could still potentially overflow (run out of array slots) after a large add or multiplication. Would this be a good case to use a linked list, since I can tack on digits with O(1) time complexity?

Is there some other data-structure that would be even better suited than a linked list? Should the type that my data-structure holds be the smallest possible integer type I have available to me?

Also, should I be careful about how I store my "carry" variable? Should it, itself, be of my "BigInteger" type?

A: 

I would say an array of ints.

fastcodejava
Could you at least address the issues I mentioned concerning that implementation?
Fifa89
A: 

i would say a std::vector of char (since it has to hold only 0-9) (if you plan to work in BCD)

If not BCD then use vector of int (you didnt make it clear)

Much less space overhead that link list

And all advice says 'use vector unless you have a good reason not too'

pm100
umm... I think binary math would be better than working with the base 10 numbers here...
Inverse
umm - BCD is a common bigint implementation. He didnt say what he planned to do
pm100
A: 

An Array is indeed a natural fit. I think it is acceptable to throw OverflowException, when you run out of place in your memory. The teacher will see attention to detail.

A multiplication roughly doubles digit numbers, addition increases it by at most 1. It is easy to create a sufficiently big Array to store the result of your operation.

Carry is at most a one-digit long number in multiplication (9*9 = 1, carry 8). A single int will do.

Tadeusz A. Kadłubowski
Multiplication results in roughly digits_a + digits_b + [0-2] digits in the product...which is not more than double the number in the larger input, but could be considerably less than that.
dmckee
+1  A: 

Always use the smallest int type that will do the job you need (bytes). A linked list should work well, since you won't have to worry about overflowing.

CaptnCraig
I would write it in base 2**(machine_word-1) (base 65536 for 32 bit machines). Why waste time processing single bytes, when you can process whole words at once.
Tadeusz A. Kadłubowski
Overflowing is not really an issue, he can use a vector or a deque which will be both much more efficient than a linked list for this application.
Manuel
I refuse to believe that 2**31 is 65536
Wallacoloo
A `std::list` uses two link pointers per node. On a 64-bit system, that would be 16 bytes of links and 1 byte of data… which gets padded out to 8. The appropriate type for this job is `int`, which may be 64 or 32 (or even 16) bits — the class should detect which and adjust itself if necessary. Also, a linked list is totally inappropriate.
Potatoswatter
+2  A: 

Check out the book C Interfaces and Implementations by David R. Hanson. It has 2 chapters on the subject, covering the vector structure, word size and many other issues you are likely to encounter.

It's written for C, but most of it is applicable to C++ and/or Java. And if you use C++ it will be a bit simpler because you can use something like std::vector to manage the array allocation for you.

finnw
http://code.google.com/p/cii/source/browse/trunk/src/ap.chttp://code.google.com/p/cii/source/browse/trunk/examples/calc.c
slf
+1  A: 

If you use binary trees (whose leaves are ints), you get all the advantages of the linked list (unbounded number of digits, etc) with simpler divide-and-conquer algorithms. You do not have in this case a single base but many depending the level at which you're working.

If you do this, you need to use a BigInteger for the carry. You may consider it an advantage of the "linked list of ints" approach that the carry can always be represented as an int (and this is true for any base, not just for base 10 as most answers seem to assume that you should use... In any base, the carry is always a single digit)

I might as well say it: it would be a terrible waste to use base 10 when you can use 2^30 or 2^31.

Pascal Cuoq
A: 

Accessing elements of linked lists is slow. I think arrays are the way to go, with lots of bound checking and run time array resizing as needed.


Clarification: Traversing a linked list and traversing an array are both O(n) operations. But traversing a linked list requires deferencing a pointer at each step. Just because two algorithms both have the same complexity it doesn't mean that they both take the same time to run. The overhead of allocating and deallocating n nodes in a linked list will also be much heavier than memory management of a single array of size n, even if the array has to be resized a few times.

mobrule
How are they slow? I'm not doing random access for any of these operations, just sequential. This would make the linked-list just as fast as the vector for sequential access and better speed for insertions.
Fifa89
@Fifa89 - a linked list is faster for insertions in the middle, but for what you want to do you'll be adding elements at the end, and for that a vector or a deque are just as fast.
Manuel
@Manuel, I keep seeing you say this, but can you please provide an answer showing it? As far as I know, a linked-list provides O(1) insertion time, and the vector provides an amortized O(1) insertion. Can you elaborate on what makes the vector faster?
Fifa89
@Fifa89 - I assumed you would only add elements at the end (with push_back). A linked list will allocate on every push_back whereas a vector will allocate every N push_backs. For such a performance-sensitive class as BigNum this will make a huge difference.
Manuel
@Fifa89 - Are you sure traversal of a linked list is as fast as the same operation on a vector? I'm sure that for large BigNums a vector will cause much fewer cache misses.
Manuel
@Fifa89, @Manuel: as long as the number fits in cache, both will be linear time with no cache misses. However, the linked list will be bound by the number of L1 *hits* required to traverse it, as you have to access all the links. Also, the linked list will fill up the cache and cause misses with for a smaller number, because it clogs the cache with pointers.
Potatoswatter
A: 

std::vector<bool> or std::vector<unsigned int> is probably what you want. You will have to push_back() or resize() on them as you need more space for multiplies, etc. Also, remember to push_back the correct sign bits if you're using two-compliment.

Inverse
I wouldn't consider std::vector<bool> an option, to be honest
Manuel
A: 

As a rule of thumb, use std::vector instead of std::list, unless you need to insert elements in the middle of the sequence very often. Vectors tend to be faster, since they are stored contiguously and thus benefit from better spatial locality (a major performance factor on modern platforms).

Make sure you use elements that are natural for the platform. If you want to be platform independent, use long. Remember that unless you have some special compiler intrinsics available, you'll need a type at least twice as large to perform multiplication.

I don't understand why you'd want carry to be a big integer. Carry is a single bit for addition and element-sized for multiplication.

Make sure you read Knuth's Art of Computer Programming, algorithms pertaining to arbitrary precision arithmetic are described there to a great extent.

avakar
May I ask why long? Are there any guarantees that this will map to the architecture's word size?
Manuel
There aren't, it's just an educated guess (it does map to natural word size under msvc and gcc on intel).
avakar
A: 

Wow, there are some… interesting answers here. I'd recommend reading a book rather than try to sort through all this contradictory advice.

That said, C/C++ is also ill-suited to this task. Big-integer is a kind of extended-precision math. Most CPUs provide instructions to handle extended-precision math at comparable or same speed (bits per instruction) as normal math. When you add 2^32+2^32, the answer is 0… but there is also a special carry output from the processor's ALU which a program can read and use.

C++ cannot access that flag, and there's no way in C either. You have to use assembler.

Just to satisfy curiosity, you can use the standard Boolean arithmetic to recover carry bits etc. But you will be much better off downloading an existing library.

Potatoswatter
If he wanted to use an existing library he would of just used the BigInteger class he mentioned in his post. Sometimes people do things to learn and not to put into some production environment. I think this is a good CS assignment and one that every CS student should do at some point.
Simucal
Potatoswatter