views:

225

answers:

6

I am reading a text file which contains numbers in the range [1, 10^100]. I am then performing a sequence of arithmetic operations on each number. I would like to use a BigInteger only if the number is out of the int/long range. One approach would be to count how many digits there are in the string and switch to BigInteger if there are too many. Otherwise I'd just use primitive arithmetic as it is faster. Is there a better way?

Is there any reason why Java could not do this automatically i.e. switch to BigInteger if an int was too small? This way we would not have to worry about overflows.

+6  A: 

I suspect the decision to use primitive values for integers and reals (done for performance reasons) made that option not possible. Note that Python and Ruby both do what you ask.

In this case it may be more work to handle the smaller special case than it is worth (you need some custom class to handle the two cases), and you should just use BigInteger.

Kathy Van Stone
(Or use Python, Ruby, Jython, or JRuby)
Kathy Van Stone
(-1) It's more related to static typing vs. dynamic typing, rather than primitive vs. non-primitive.
ewernli
@ewemli With static typing you still could do the switch within a wrapper class (using hidden subtypes), but I agree that dynamic typing makes it easier. Having primitives denies the ability to use a wrapper class.
Kathy Van Stone
@Katy I've edited my answer to explain what I mean.
ewernli
+2  A: 

Is there any reason why Java could not do this automatically i.e. switch to BigInteger if an int was too small?

Because that is a higher level programming behavior than what Java currently is. The language is not even aware of the BigInteger class and what it does (i.e. it's not in JLS). It's only aware of Integer (among other things) for boxing and unboxing purposes.

Speaking of boxing/unboxing, an int is a primitive type; BigInteger is a reference type. You can't have a variable that can hold values of both types.

polygenelubricants
+1  A: 

You could read the values into BigIntegers, and then convert them to longs if they're small enough.

private final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);
private static List<BigInteger> readAndProcess(BufferedReader rd) throws IOException {
    List<BigInteger> result = new ArrayList<BigInteger>();
    for (String line; (line = rd.readLine()) != null; ) {
        BigInteger bignum = new BigInteger(line);
        if (bignum.compareTo(LONG_MAX) > 0) // doesn't fit in a long
            result.add(bignumCalculation(bignum));
        else result.add(BigInteger.valueOf(primitiveCalculation(bignum.longValue())));
    }
    return result;
}
private BigInteger bignumCalculation(BigInteger value) { 
    // perform the calculation 
}
private long primitiveCalculation(long value) {
    // perform the calculation
}

(You could make the return value a List<Number> and have it a mixed collection of BigInteger and Long objects, but that wouldn't look very nice and wouldn't improve performance by a lot.)

The performance may be better if a large amount of the numbers in the file are small enough to fit in a long (depending on the complexity of calculation). There's still risk for overflow depending on what you do in primitiveCalculation, and you've now repeated the code, (at least) doubling the bug potential, so you'll have to decide if the performance gain really is worth it.

If your code is anything like my example, though, you'd probably have more to gain by parallelizing the code so the calculations and the I/O aren't performed on the same thread - you'd have to do some pretty heavy calculations for an architecture like that to be CPU-bound.

gustafc
Nice comment. @fahdshariff, you should actually compareTo(MAX_VALUE_THAT_WILL_NOT_OVERFLOW)
tucuxi
A: 

Is there any reason why Java could not do this automatically i.e. switch to BigInteger if an int was too small?

This is one of the advantage of dynamic typing, but Java is statically typed and prevents this.

In a dynamically type language when two Integer which are summed together would produce an overflow, the system is free to return, say, a Long. Because dynamically typed language rely on duck typing, it's fine. The same can not happen in a statically typed language; it would break the type system.

EDIT

Given that my answer and comment was not clear, here I try to provide more details why I think that static typing is the main issue:

1) the very fact that we speak of primitive type is a static typing issue; we wouldn't care in a dynamically type language.

2) with primitive types, the result of the overflow can not be converted to another type than an int because it would not be correct w.r.t static typing

   int i = Integer.MAX_VALUE + 1; // -2147483648

3) with reference types, it's the same except that we have autoboxing. Still, the addition could not return, say, a BigInteger because it would not match the static type sytem (A BigInteger can not be casted to Integer).

  Integer j = new Integer( Integer.MAX_VALUE ) + 1; // -2147483648

4) what could be done is to subclass, say, Number and implement at type UnboundedNumeric that optimizes the representation internally (representation independence).

 UnboundedNum k = new UnboundedNum( Integer.MAX_VALUE ).add( 1 ); // 2147483648

Still, it's not really the answer to the original question.

5) with dynamic typing, something like

var d = new Integer( Integer.MAX_VALUE ) + 1; // 2147483648

would return a Long which is ok.

ewernli
That's not really relevant to dynamic/static typing. BigInteger/int is a question of representation. It would have been possible for Java to handle this -- having two representations for the same type -- though it would be somewhat tricky.
Daniel
Daniel is correct, typing is not the issue.
Jakob
@Daniel The very sentence "having two representations for the same type" implies that we could add a new type that has a dual representation. The point is that this *new* wrapper type is necessary only because it's *statically* typed. What I meant is that with dynamic typing you don't need to add any new type at all, it's *trivially* solved. Which I think make the distinction dynamic/static *very* relevant in this context; what we are discussing here is known to be an advantage of dynamic typing.
ewernli
You can have a statically typed language that uses an abstract Integer class with various representations as subclasses. Since they are all immutable, if the result of, say, addition is an Integer the fact that the representations are different would be hidden from the user unless they ask for the run-time type (same as with dynamic types). I could even do this in Java, but only by duplicating most of the integral classes.
Kathy Van Stone
Haskell is a statically typed language and as far as I can tell it automatically switches representations as needed.
Kathy Van Stone
+1  A: 

The impact of using BigDecimals when something smaller will suffice is surprisingly, err, big: Running the following code

public static class MyLong {
    private long l;
    public MyLong(long l) { this.l = l; }
    public void add(MyLong l2) { l += l2.l; }
}

public static void main(String[] args) throws Exception {
    // generate lots of random numbers
    long ls[] = new long[100000];
    BigDecimal bds[] = new BigDecimal[100000];
    MyLong mls[] = new MyLong[100000];
    Random r = new Random();
    for (int i=0; i<ls.length; i++) {
        long n = r.nextLong();
        ls[i] = n;
        bds[i] = new BigDecimal(n);
        mls[i] = new MyLong(n);
    }
    // time with longs & Bigints
    long t0 = System.currentTimeMillis();
    for (int j=0; j<1000; j++) for (int i=0; i<ls.length-1; i++) {
        ls[i] += ls[i+1];
    }
    long t1 = Math.max(t0 + 1, System.currentTimeMillis());
    for (int j=0; j<1000; j++) for (int i=0; i<ls.length-1; i++) {
        bds[i].add(bds[i+1]);
    }
    long t2 = System.currentTimeMillis();
    for (int j=0; j<1000; j++) for (int i=0; i<ls.length-1; i++) {
        mls[i].add(mls[i+1]);
    }
    long t3 = System.currentTimeMillis();
    // compare times
    t3 -= t2;
    t2 -= t1;
    t1 -= t0;
    DecimalFormat df = new DecimalFormat("0.00");
    System.err.println("long: " + t1 + "ms, bigd: " + t2 + "ms, x"
            + df.format(t2*1.0/t1) + " more, mylong: " + t3 + "ms, x"
            + df.format(t3*1.0/t1) + " more");
}

produces, on my system, this output:

long: 375ms, bigd: 6296ms, x16.79 more, mylong: 516ms, x1.38 more

The MyLong class is there only to look at the effects of boxing, to compare against what you would get with a custom BigOrLong class.

tucuxi
Why use `BigDecimal` if `BigInt` would suffice? A non-integer type is naturally much slower.
Daniel
True, but after substituting `BigInteger` for `BigDecimal` there is not much of a difference:`long: 375ms, bigi: 5843ms, x15.58 more, mylong: 532ms, x1.42 more`
tucuxi
A: 

Would it have been possible? Yes. But there are many problems with it.

Consider, for instance, that Java stores references to BigInteger, which is actually allocated on the heap, but store int literals. The difference can be made clear in C:

int i;
BigInt* bi;

Now, to automatically go from a literal to a reference, one would necessarily have to annotate the literal somehow. For instance, if the highest bit of the int was set, then the other bits could be used as a table lookup of some sort to retrieve the proper reference. That also means you'd get a BigInt** bi whenever it overflowed into that.

Of course, that's the bit usually used for sign, and hardware instructions pretty much depend on it. Worse still, if we do that, then the hardware won't be able to detect overflow and set the flags to indicate it. As a result, each operation would have to be accompanied by some test to see if and overflow has happened or will happen (depending on when it can be detected).

All that would add a lot of overhead to basic integer arithmetic, which would in practice negate any benefits you had to begin with. In other words, it is faster to assume BigInt than it is to try to use int and detect overflow conditions while at the same time juggling with the reference/literal problem.

So, to get any real advantage, one would have to use more space to represent ints. So instead of storing 32 bits in the stack, in the objects, or anywhere else we use them, we store 64 bits, for example, and use the additional 32 bits to control whether we want a reference or a literal. That could work, but there's an obvious problem with it -- space usage. :-) We might see more of it with 64 bits hardware, though.

Now, you might ask why not just 40 bits (32 bits + 1 byte) instead of 64? Basically, on modern hardware it is preferable to store stuff in 32 bits increments for performance reasons, so we'll be padding 40 bits to 64 bits anyway.

EDIT

Let's consider how one could go about doing this in C#. Now, I have no programming experience with C#, so I can't write the code to do it, but I expect I can give an overview.

The idea is to create a struct for it. It should look roughly like this:

public struct MixedInt
{
   private int i;
   private System.Numeric.BigInteger bi;

   public MixedInt(string s) 
   {
      bi = BigInteger.Parse(s);
      if (parsed <= int.MaxValue && parsed => int.MinValue)
      {
          i = (int32) parsed;
          bi = 0;
      }   
   }

   // Define all required operations
}

So, if the number is in the integer range we use int, otherwise we use BigInteger. The operations have to ensure transition from one to another as required/possible. From the client point of view, this is transparent. It's just one type MixedInt, and the class takes care of using whatever fits better.

Note, however, that this kind of optimization may well be part of C#'s BigInteger already, given it's implementation as a struct.

If Java had something like C#'s struct, we could do something like this in Java as well.

Daniel
If I understand well what you mean, an `int i` in the code could hide a `BigInt` because the runtime would have swapped the representation. To me this means that an `int` is not more an `int` w.r.t to the type system, because its bound (type) is not enforced any longer. But it would be possible to add another type, say, `unbounded` that would do what you describe and optmimize the representation at run-time. But that's something completely different.
ewernli
@ewernli No, that's not what I mean at all. The type system has nothing to do with it. Say, for instance, that I wanted to implement such a thing in Scala. I'd just call it Either[Int, BigInt], and define the operations over it. The difference is one between a reference, a pointer, and a literal. A literal is stored as is everywhere -- registers, stack, heap. A reference is stored as a pointer to the place in the heap that contains the data. When you look at some datum X, how do you know whether it is a reference or a literal? That's the problem that is tricky to solve. See edited answer.
Daniel
You're explanation actually shows that it *is* a static typing issue: either you need (1) an extension of the type system with `Either[...]` or whatever (2) to define an artificial wrapper type that optimizes the representation internally, but that is introduced just to solve this static typing issue (like your C# example). With dynamic typing none of that is necessary and the problem is *trivially* solved; what I meant is that the main thing that prevent you from *trivially* doing that *is* the *current* java type system.
ewernli
I understand the literal/reference problem, but it's not the *main* issue, and you actually just showed how it could be done. But you would not be able to reconcile this approach with java current static typing. But thanks for the clarification anyway.
ewernli
@ewernli `Either` is a type, not an extension of any kind. All types, including the ones on dynamic languages, are wrappers. Furthermore, any dynamic language with automatic conversion between 32 bits integer and arbitrarily sized integers does exactly the same thing I explained above. You may not be aware of it, but neither you need to be aware of how C#'s BigInteger is implemented. Finally, the problem with Java is just that it doesn't allow extension of primitive types -- as opposed to C# -- and nothing to do with static typing.
Daniel