views:

100

answers:

1

Greetings,

I need to multiply two extremely long integer values stored in a text file (exported via GMP (MPIR, to be exact), so they can be any in any base). Now, I would usually just import these integers via the mpz_inp_str() function and perform the multiplication in RAM, however, these values are so long that I can't really load them (about 1 GB of data each). What would be the fastest way to do this? Perhaps there are some external libraries that do this sort of thing already? Are there any easily implementable methods for this (performance is not incredibly important as this operation would only be performed once or twice)?

tl;dr: I need to multiply values so large they don't fit into process memory limits (Windows).

Thank you for your time.

+2  A: 

I don't know if there is a library that supports this, but you could use GMP/MPIR on parts of each really big number (RBN). That is, start by breaking each RBN into manageable, uniformly sized chunks (e.g. 10M digit chunks, expect an undersized chunk for most significant digits, also see below).

    RBN1 --> A B C D E
    RBN2 --> F G H I J

The chunking can be done in base 10, so just read <chuck_size> characters from the file for each piece. Then multiply chunks from each number one at a time.

     AxF BxF CxF DxF ExF
    +    AxG BxG CxG DxG ExG
    +        AxH BxH CxH DxH ExH
    +            AxI BxI CxI DxI ExI
    +                AxJ BxJ CxJ DxJ ExJ

Perform each column of the final sum in memory. Then, keeping the carry in memory, write the column out to disk, repeat for next column... For carries, convert each column sum result to a string with GMP, write out the bottom <chunk size> portion of the result and read the top portion back in as a GMP int for the carry.

I'd suggest selecting a chunk size dynamically for each multiplication in order to keep each column addition in memory; the larger the numbers, the more column additions will need to be done, the smaller the chunk size will need to be.

For both reading and writing, I'd suggest using memory mapped files, boost has a nice interface for this (note that this does not load the entire file, it just basically buffers the IO on virtual memory). Open one mapped file for each input RBN numbers, and one output with size = size(RBN1) + size(RBN2) + 1; With memory mapped files, file access is treated as a raw char*, so you can read/write chunks directly using gmp c-string io methods. You will probably need to read into an intermediate buffer in order to NULL terminated strings for GMP (unless you want to temporarily alter the memory mapped file).

This isn't very easy to implement correctly, but then again this isn't a particularly easy problem (maybe just tedious). This approach has the advantage that it exactly mirrors what GMP is doing in memory, so the algorithms are well known.

academicRobot
Thank you, I will look into this.
Kathoz
Thank you, I have successfully implemented this multiplication method. For anyone else interested in a similar problem, this is apparently called Comba multiplication. Although information on it is scarce, there were a couple of good articles on Google.
Kathoz