Hi,

I want to write a BigInteger to a file.
What is the best way to do this?
Of course, I also want to read it back from an InputStream (programmatically, not by a human).
Do I have to use an ObjectOutputStream, or are there better ways?

The goal is to use as few bytes as possible.

Thanks
Martijn

+3  A: 

I'd go with ObjectOutputStream; that is what it was designed for (serializing objects in general, not BigInteger specifically).

Here is some quick sample code that shows the overhead for both compressed and uncompressed ObjectOutputStreams.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;


public class Main
{
    public static void main(String[] args)
        throws IOException
    {
        run(1);
        run(10);
        run(100);
        run(1000);
        run(10000);
        run(100000);
        run(1000000);
    }

    private static void run(final int size)
        throws IOException
    {
        final List<BigInteger> values;
        final int              uncompressedSize;
        final int              compressedSize;

        values           = createValues(size);
        uncompressedSize = storeUncompressed(values);
        compressedSize   = storeCompressed(values);

        System.out.println(size + " uncompressed is " + uncompressedSize + " ratio is: " + ((float)uncompressedSize / size));
        System.out.println(size + " compressed   is " + compressedSize   + " ratio is: " + ((float)compressedSize   / size));
    }

    private static List<BigInteger> createValues(final int size)
    {
        final List<BigInteger> values;

        values = new ArrayList<BigInteger>(size);

        for(int i = 0; i < size; i++)
        {
            values.add(BigInteger.ZERO);
        }

        return (values);
    }

    private static int storeUncompressed(final List<BigInteger> values)
        throws IOException
    {
        final ByteArrayOutputStream bytes;

        bytes = new ByteArrayOutputStream();
        store(values, bytes);

        return (bytes.size());
    }


    private static int storeCompressed(final List<BigInteger> values)
        throws IOException
    {
        final ByteArrayOutputStream bytes;
        final GZIPOutputStream      zip;

        bytes = new ByteArrayOutputStream();
        zip   = new GZIPOutputStream(bytes);
        store(values, zip);

        return (bytes.size());
    }

    private static void store(final List<BigInteger> values,
                              final OutputStream     sink)
        throws IOException
    {
        ObjectOutputStream stream;

        stream = null;

        try
        {
            stream = new ObjectOutputStream(sink);

            for(final BigInteger value : values)
            {
                stream.writeObject(value);
            }
        }
        finally
        {
            if(stream != null)
            {
                stream.close();
            }
        }
    }
}

The output is:

1 uncompressed is 202 ratio is: 202.0
1 compressed   is 198 ratio is: 198.0
10 uncompressed is 247 ratio is: 24.7
10 compressed   is 205 ratio is: 20.5
100 uncompressed is 697 ratio is: 6.97
100 compressed   is 207 ratio is: 2.07
1000 uncompressed is 5197 ratio is: 5.197
1000 compressed   is 234 ratio is: 0.234
10000 uncompressed is 50197 ratio is: 5.0197
10000 compressed   is 308 ratio is: 0.0308
100000 uncompressed is 500197 ratio is: 5.00197
100000 compressed   is 962 ratio is: 0.00962
1000000 uncompressed is 5000197 ratio is: 5.000197
1000000 compressed   is 7516 ratio is: 0.007516

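You would change the "values.add(BigInteger.ZERO);" line to make the test more realistic; I just wanted a baseline. Note that every element is the same BigInteger.ZERO instance, so after the first write ObjectOutputStream emits only a 5-byte back-reference per value, which is why the uncompressed ratio settles at about 5 bytes per value.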
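The question also asks how to read the values back, which the code above does not show. Here is a minimal sketch of a matching write/read pair using ObjectInputStream. It assumes the writer stores the number of values first with writeInt (the benchmark above does not do this); the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SerializationRoundTrip {
    // Writes the count (an assumption of this sketch), then each value via serialization.
    static byte[] writeAll(List<BigInteger> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(values.size());
            for (BigInteger value : values) {
                out.writeObject(value);
            }
        }
        return bytes.toByteArray();
    }

    // Reads the count, then casts each deserialized object back to BigInteger.
    static List<BigInteger> readAll(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            List<BigInteger> values = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                values.add((BigInteger) in.readObject());
            }
            return values;
        }
    }

    public static void main(String[] args) throws Exception {
        List<BigInteger> original = Arrays.asList(
                BigInteger.ZERO,
                BigInteger.valueOf(-42),
                new BigInteger("123456789012345678901234567890"));
        List<BigInteger> restored = readAll(writeAll(original));
        System.out.println(restored.equals(original)); // prints "true"
    }
}
```

For a real file, the ByteArrayOutputStream/ByteArrayInputStream pair would be swapped for file streams; the serialization calls stay the same.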

TofuBeer
OOS uses a very bloated format that will include class headers and all sorts of stuff that is not the "least bytes possible".
Alex Miller
+1  A: 

Do you want to read/write the whole Object or only its value? If the former, use serialization. If the latter, use ByteArrayInputStream/ByteArrayOutputStream: write the result of BigInteger#toByteArray() and reconstruct the value with new BigInteger(byte[]). The latter obviously generates far fewer bytes in the file.
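A minimal sketch of the value-only approach, comparing it with the serialized size of the same number. toByteArray() returns the minimal big-endian two's-complement representation (always at least one byte), and the BigInteger(byte[]) constructor reads that format back. The class and method names are hypothetical. Note that the raw array carries no length, so storing more than one value in the same stream needs some kind of length prefix.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.math.BigInteger;

class RawVersusSerialized {
    // Minimal big-endian two's-complement bytes; includes a sign bit, so 255 needs 2 bytes.
    static byte[] toRaw(BigInteger value) {
        return value.toByteArray();
    }

    // Interprets the array as big-endian two's complement, the inverse of toRaw().
    static BigInteger fromRaw(byte[] raw) {
        return new BigInteger(raw);
    }

    // Size of the same value written with Java serialization, for comparison only.
    static int serializedSize(BigInteger value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        BigInteger value = new BigInteger("123456789012345678901234567890");
        byte[] raw = toRaw(value);
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("serialized bytes: " + serializedSize(value));
        System.out.println("round trip ok:    " + fromRaw(raw).equals(value));
    }
}
```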

BalusC
I'd vote this up if not for the premature optimization.
Bill K
The OP explicitly stated that the goal is to use as few bytes as possible.
BalusC
I'm not comfortable with introducing character encoding and text parsing where they are unnecessary.
Tom Hawtin - tackline
I know there's some overhead to serializing Objects, but wouldn't the character representation of a BigInteger actually be much less efficient than the binary representation? It would probably be more clever to store the long value or the byte array instead.
rob
toString is not safe to use for anything but debugging. Consider that the format of toString can change unless it is documented in the javadoc.
TofuBeer
Valid points. I'll update the answer a bit.
BalusC
+1  A: 

Yes, you can use ObjectOutputStream/ObjectInputStream for simplicity, or you can convert the BigInteger to a byte[], and serialize that value instead of the entire Object. The latter would save a significant amount of storage space over serializing the entire Object.

Also, if you use stream classes that are not already buffered, remember to wrap your OutputStreams and InputStreams in BufferedOutputStream and BufferedInputStream to improve performance, and flush() after you're done writing (if you don't flush() the BufferedOutputStream, the InputStream may stall or hang waiting for input).
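A minimal sketch of the buffering advice applied to a file, using serialization for the payload. Files.newOutputStream and Files.newInputStream return unbuffered streams, so they are wrapped in BufferedOutputStream/BufferedInputStream. The class name, method names and temp-file prefix are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedFileStore {
    static void write(Path file, BigInteger value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(value);
            // close() also flushes; an explicit flush() is what matters when the
            // stream is kept open (for example a socket the reader is waiting on).
            out.flush();
        }
    }

    static BigInteger read(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return (BigInteger) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("bigint", ".bin");
        BigInteger value = BigInteger.valueOf(2).pow(200);
        write(file, value);
        System.out.println(read(file).equals(value));
        Files.delete(file);
    }
}
```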

If you're worried about bandwidth or file size, you can also wrap your streams in GZIPOutputStream/GZIPInputStream for automatic compression. I wouldn't worry about compressing the data unless you actually observe poor performance or huge files, however.

rob
+7  A: 

Java serialisation (ObjectOutputStream/ObjectInputStream) is a general-purpose way of, er, serialising objects into octet sequences. However, there are issues with serialisation.

To be uber efficient, BigInteger has toByteArray and a constructor that takes byte[]. Then you need some way to represent byte[] (including length) in a stream. For instance, you could use DataOutputStream to writeInt the length, and follow that with the raw data.
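A minimal sketch of that format: each value is written as a 4-byte length (DataOutputStream.writeInt) followed by the toByteArray() bytes, and read back with readInt plus readFully, which keeps reading until the whole array is filled. A fixed 4-byte length is this sketch's choice; a variable-length prefix would save a few more bytes for small numbers. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.util.Arrays;

class LengthPrefixedBigInteger {
    // Writes the two's-complement bytes preceded by their length as a 4-byte int.
    static void write(DataOutputStream out, BigInteger value) throws IOException {
        byte[] raw = value.toByteArray();
        out.writeInt(raw.length);
        out.write(raw);
    }

    // Reads the length, then exactly that many bytes, and rebuilds the value.
    static BigInteger read(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] raw = new byte[length];
        in.readFully(raw); // throws EOFException if the stream ends early
        return new BigInteger(raw);
    }

    // Encodes several values back to back in one byte array.
    static byte[] encodeAll(BigInteger... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (BigInteger value : values) {
                write(out, value);
            }
        }
        return bytes.toByteArray();
    }

    // Decodes the given number of values; the caller must know the count.
    static BigInteger[] decodeAll(byte[] data, int count) throws IOException {
        BigInteger[] values = new BigInteger[count];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                values[i] = read(in);
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        BigInteger[] values = {
            BigInteger.ZERO,
            BigInteger.valueOf(-1),
            new BigInteger("123456789012345678901234567890")
        };
        byte[] data = encodeAll(values);
        System.out.println("total bytes: " + data.length); // prints "total bytes: 27"
        System.out.println(Arrays.equals(decodeAll(data, values.length), values)); // prints "true"
    }
}
```

The 27 bytes are 3 × 4 length bytes plus 1 + 1 + 13 payload bytes, compared with roughly 200 bytes for a single serialized BigInteger in TofuBeer's measurements.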

Streams can, of course, be compressed with a suitable decorator of your choice.

Tom Hawtin - tackline
+1  A: 

Edited: I didn't realize the question was about optimization.

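You could compress the serialized object as it is written to save some bytes. Try using the following.

FileOutputStream fos = new FileOutputStream("db");
GZIPOutputStream gz = new GZIPOutputStream(fos);
ObjectOutputStream oos = new ObjectOutputStream(gz);
oos.writeObject(bigInteger); // bigInteger is the value you want to store
oos.close(); // also finishes the GZIP data and closes the file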

Here is an article by Sun about it.
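A minimal sketch of the full round trip, including the read side the snippet leaves out: the reader mirrors the writer by decompressing with GZIPInputStream first and then deserializing. It uses in-memory byte arrays instead of the "db" file so it is self-contained; the class and method names are hypothetical. As TofuBeer's numbers show, gzip gains little for a single value and pays off mainly when many values are stored together.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.math.BigInteger;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipSerializedBigInteger {
    // ObjectOutputStream -> GZIPOutputStream -> sink. Closing the outer stream
    // writes the GZIP trailer; an unfinished GZIP stream cannot be read back.
    static byte[] compress(BigInteger value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Mirror image of compress(): decompress first, then deserialize.
    static BigInteger decompress(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return (BigInteger) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        BigInteger value = BigInteger.TEN.pow(100);
        byte[] packed = compress(value);
        System.out.println(packed.length + " bytes, round trip ok: " + decompress(packed).equals(value));
    }
}
```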

Gordon