What datatype choices do we have for handling large numbers in R? By default, an integer seems to be 32-bit, so bigint values from SQL Server, as well as any large numbers passed from Python via rpy2, get mangled.

> 123456789123
[1] 123456789123
> 1234567891234
[1] 1.234568e+12

When I read the bigint value 123456789123456789 using RODBC, it comes back as 123456789123456784 (note the last digit), and the same number deserialized via RJSONIO comes back as -1395630315L (which seems to be an additional bug or limitation in RJSONIO).

> fromJSON('[1234567891]')
[1] 1234567891
> fromJSON('[12345678912]')
[1] -539222976

I do need to handle large numbers coming from JSON, so given RJSONIO's limitation I may have no workaround other than finding a better JSON library (which seems like a non-option right now). I would like to hear what experts have to say about this, and about handling large numbers in general.

+3  A: 

See help(integer):

 Note that on almost all implementations of R the range of
 representable integers is restricted to about +/-2*10^9: ‘double’s
 can hold much larger integers exactly.

so I would recommend using numeric (i.e. 'double') -- a double-precision number.
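
As a quick check of the ranges involved, here is a minimal base R sketch. The variable name `big` is only for illustration and does not come from the question.

```r
# Largest value R's 32-bit integer type can hold
.Machine$integer.max                   # 2147483647

# A plain numeric literal is already stored as a double
big <- 123456789123
is.double(big)                         # TRUE

# Forcing it into the integer type fails with NA (and a warning)
suppressWarnings(as.integer(big))      # NA

# As a double, the value is held exactly and arithmetic stays exact
big + 1 == 123456789124                # TRUE
```

So a value that does not fit in an integer can still be stored exactly as a double, as long as it stays within the range where doubles represent every integer.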

Dirk Eddelbuettel
I looked at the as.numeric() function, but was confused by the fact that mode(1) also gives "numeric" as the type, so I thought I was already dealing with numerics. I then tried as.numeric("123456789123456789") and saw only a few digits printed, so I assumed it had lost precision. I didn't know about options("digits") before.
haridsv
Ah, yes, the digits thing. Also, if you need higher precision or larger numbers, CRAN has packages for that, e.g. the (oddly named :-) Brobdingnag package for large numbers and the gmp package, which interfaces to GNU GMP.
Dirk Eddelbuettel
+1  A: 

Dirk is right. You should be using the numeric type (which in R is stored as a double). The other thing to note is that R may not be printing all the digits. Look at the digits setting:

> options("digits")
$digits
[1] 7

You can extend this:

options(digits=14)

Alternatively, you can reformat the number:

format(big.int, digits=14)
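
To check that these settings only affect how the number is displayed and not the stored value, a small sketch using the 13-digit number from the question (`x` is an illustrative name):

```r
x <- 1234567891234

print(x)                  # shown in scientific notation with the default 7 digits
print(x, digits = 14)     # all 13 digits shown
format(x, digits = 14)    # "1234567891234"
sprintf("%.0f", x)        # "1234567891234"

# The stored value was never rounded; only the display was
x == 1234567891234        # TRUE
```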

I tested your number and am getting the same behavior (even using the double data type), so that may be a bug:

> as.double("123456789123456789")
[1] 123456789123456784
> class(as.double("123456789123456789"))
[1] "numeric"
> is.double(as.double("123456789123456789"))
[1] TRUE
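
For what it's worth, the changed last digit is also what one would expect from a double, so it may be a limit of the type rather than a bug. A double has a 53-bit significand, so not every integer above 2^53 can be represented. The sketch below is base R only; `limit` and `x` are illustrative names.

```r
# Doubles represent every integer exactly only up to 2^53
limit <- 2^53                      # 9007199254740992
limit + 1 == limit                 # TRUE: 2^53 + 1 is not representable

# The value from the question is well above that limit
x <- as.double("123456789123456789")
x > limit                          # TRUE

# Gap between neighbouring doubles at this magnitude
2^(floor(log2(x)) - 52)            # 16
```

With a gap of 16 between neighbouring doubles, 123456789123456789 cannot be stored exactly, and the nearest multiple of 16 is 123456789123456784, the value printed above.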
Shane
Thanks for pointing out options() and format(); they are helpful. However, these options seem to control only how the number is formatted for display, so they shouldn't change how the number is parsed by as.double() or as.numeric(). The behavior could be a bug.
haridsv
+1  A: 

I understood your question a little differently than the two who answered before me. If R's largest default value is not big enough for you, you have a few choices. (Disclaimer: I have used each of the libraries I mention below, but not through the R bindings; I used either other language bindings or the native library.)

The Brobdingnag package: stores values as their natural logarithms (like Rmpfr, it is implemented using R's newer class system). Math for real men:

library(Brobdingnag)
googol <- as.brob(1e100)   

The gmp package: R bindings to the venerable GMP (the GNU Multiple Precision Arithmetic Library). This must go back 20 years, because I used it at university. The library's motto is "Arithmetic without limitations," which is a credible claim: integers, rationals, floats, whatever, right up to the limits of the RAM on your box.

library(gmp)
x <- as.bigq(8000, 21)   # the exact rational 8000/21

The Rmpfr package: R bindings that interface to both gmp (above) and MPFR (MPFR, in turn, is built on top of GMP and provides multiple-precision floating-point arithmetic with correct rounding). I have used the Python bindings ('bigfloat') and can recommend them highly. This might be your best option of the three, given its scope, given that it appears to be the most actively maintained, and finally given what appears to be the most thorough documentation.

Note: to use either of the last two, you'll need to install the native libraries, GMP and MPFR.
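
As a rough sketch of how the gmp route could handle the number from the question, assuming the CRAN package gmp is installed (I have not run this against RODBC or RJSONIO output, and `n` is just an illustrative name):

```r
# Assumes the CRAN package 'gmp' is installed: install.packages("gmp")
library(gmp)

# Build the value from a string so it never passes through a double
n <- as.bigz("123456789123456789")

as.character(n)        # "123456789123456789"
as.character(n + 1)    # "123456789123456790"

# Products beyond the double range stay exact as well
as.character(n * n)
```

The important part is that the value arrives as a string. Once it has been parsed into a double, the lost digits cannot be recovered by converting to bigz afterwards.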

doug
Thanks, but for now I am satisfied with the limitations of the numeric datatype, even though it doesn't fully answer my original question. I will keep your suggestions in mind and look into them if I need to handle larger values.
haridsv
A: 

I fixed a few issues related to integers in rpy2 (Python can switch from int to long when needed, but R does not seem to be able to do that). Integer overflows should now return NA_integer_.
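
For comparison, this is how R itself treats integer overflow. It is a plain R sketch, not rpy2 code, and the names `x` and `y` are illustrative:

```r
x <- .Machine$integer.max        # 2147483647

# Integer arithmetic past the limit yields NA (normally with a warning)
y <- suppressWarnings(x + 1L)
is.na(y)                         # TRUE
typeof(y)                        # "integer"

# Adding a double promotes the result to double instead
x + 1                            # 2147483648
```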

L.