I have a BitSet and want to write it to a file. I came across a solution that uses an ObjectOutputStream and its writeObject method.

I looked at ObjectOutputStream in the Java API and saw that you can write other things too (byte, int, short, etc.).

To try the class out, I wrote a single byte to a file using the code below, but the resulting file is 7 bytes long instead of 1.

My question is: what are the first 6 bytes in the file, and why are they there?

This matters for the BitSet because I don't want to start writing lots of data to a file and then discover that unknown bytes have been inserted into it.

Here is the code:

    byte[] bt = new byte[]{'A'};
    File outFile = new File("testOut.txt");
    FileOutputStream fos = new FileOutputStream(outFile);
    ObjectOutputStream oos = new ObjectOutputStream(fos);
    oos.write(bt);
    oos.close();

thanks for any help

Avner

+1  A: 

You could be writing any objects out to an ObjectOutputStream, so the stream holds information about the types written as well as the data needed to reconstitute the object.

If you know that the stream will always contain a BitSet, don't use an ObjectOutputStream. If space is at a premium, convert the BitSet to an array of bytes in which each bit corresponds to a bit in the BitSet, then write that array directly to the underlying stream (e.g. a FileOutputStream as in your example).
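A minimal sketch of that approach, assuming Java 7 or later, where `BitSet.toByteArray()` and `BitSet.valueOf(byte[])` are available. The class name `RawBitSetFile` and its method names are made up for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.BitSet;

final class RawBitSetFile {

    // Writes only the BitSet's bits: no stream header, no type information.
    static void write(BitSet bits, File file) {
        try (FileOutputStream fos = new FileOutputStream(file)) {
            fos.write(bits.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds a BitSet from the raw bytes written by write().
    static BitSet read(File file) {
        try {
            return BitSet.valueOf(Files.readAllBytes(file.toPath()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(9);

        File file = new File("bitset.bin");
        write(bits, file);

        System.out.println(file.length());            // 2
        System.out.println(read(file).equals(bits));  // true
    }
}
```

Note that `toByteArray()` stops at the highest set bit, so trailing zero bits are not stored. If the logical size of the set matters, it has to be written separately.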

Vinay Sajip
Unfortunately, BitSet has no built-in method to convert it to an array of bytes (this changed in Java 7, which added BitSet.toByteArray()).
finnw
+1  A: 

The other bytes will be type information.

Basically, ObjectOutputStream is a class used to write Serializable objects to some destination (usually a file). It makes more sense if you think about ObjectInputStream, which has a readObject() method. How does Java know which object to instantiate? Easy: there is type information in the stream.

cletus
So if I understand you correctly, every time I write something using ObjectOutputStream I get serious overhead for every write. For instance, if I write an int, a short, a byte and then a String, do I get 4 sets of extra data, one for each item I write?
Avner
No. Only the writeObject() method adds the type header. The writeUTF() method adds a 2-byte length prefix. The primitive writeXX() methods add no type information; their data goes into block-data records that carry only a small marker and length header. Read the API doc for details.
Michael Borgwardt
Also note that the type information is per-Object. For an object that consists basically of a primitive array (such as BitSet), the overhead is constant, no matter how large the array.
Michael Borgwardt
A: 

The serialisation format, like many others, includes a header with a magic number and version information. Data written with the DataOutput/OutputStream methods of ObjectOutputStream is placed in the middle of the serialised data (with no type information). This is typically only done in writeObject implementations, after a call to defaultWriteObject or use of putFields.
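To make the 7 bytes from the question concrete, here is a small sketch that repeats the experiment in memory and prints the bytes in hex. The class name `StreamHeaderDump` and its helpers are made up for illustration; the constants named in the comments come from `java.io.ObjectStreamConstants`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class StreamHeaderDump {

    // Same calls as in the question, but writing to memory instead of a file.
    static byte[] writeOneByte() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.write(new byte[] {'A'});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Formats bytes as space-separated two-digit hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // AC ED = STREAM_MAGIC, 00 05 = STREAM_VERSION,
        // 77    = TC_BLOCKDATA, 01    = length of the data block,
        // 41    = the byte 'A' itself
        System.out.println(toHex(writeOneByte()));  // AC ED 00 05 77 01 41
    }
}
```

So the six extra bytes are the four-byte stream header, written once per stream, plus a two-byte block-data header in front of the payload.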

Tom Hawtin - tackline
A: 

If you only use the saved BitSet in Java, serialization works fine. However, it's kind of annoying if you want to share the bitset across multiple platforms. Besides the overhead of Java serialization, the BitSet is stored in units of 8 bytes. This can add too much overhead if your bitset is small.

We wrote this small class so we can extract byte arrays from a BitSet. Depending on your use case, it might work better than Java serialization for you.

import java.util.Arrays;
import java.util.BitSet;

public class ExportableBitSet extends BitSet {

    private static final long serialVersionUID = 1L;

    public ExportableBitSet() {
     super();
    }

    public ExportableBitSet(int nbits) {
     super(nbits);
    }

    public ExportableBitSet(byte[] bytes) {
     this(bytes == null? 0 : bytes.length*8);  
     for (int i = 0; i < size(); i++) {
      if (isBitOn(i, bytes))
       set(i);
     }
    }

    public byte[] toByteArray()  {

     if (size() == 0)
      return new byte[0];

     // Find highest bit
     int hiBit = -1;
     for (int i = 0; i < size(); i++)  {
      if (get(i))
       hiBit = i;
     }

     int n = (hiBit + 8) / 8;
     byte[] bytes = new byte[n];
     if (n == 0)
      return bytes;

     Arrays.fill(bytes, (byte)0);
     for (int i=0; i<n*8; i++) {
      if (get(i)) 
       setBit(i, bytes);
     }

     return bytes;
    }

    // Bit 0 maps to the most significant bit of each byte.
    protected static final int[] BIT_MASK =
        {0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01};

    protected static boolean isBitOn(int bit, byte[] bytes) {
     int size = bytes == null ? 0 : bytes.length*8;

     if (bit >= size) 
      return false;

     return (bytes[bit/8] & BIT_MASK[bit%8]) != 0;
    }

    protected static void setBit(int bit, byte[] bytes) {
     int size = bytes == null ? 0 : bytes.length*8;

     if (bit >= size) 
      throw new ArrayIndexOutOfBoundsException("Byte array too small");

     bytes[bit/8] |= BIT_MASK[bit%8];
    }
}
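To get a feel for the serialization overhead mentioned above, the following sketch compares the size of a serialized BitSet with the size of its raw bytes. It assumes Java 7 or later for `BitSet.toByteArray()`, and the class name `BitSetOverhead` is made up for illustration. The exact serialized size is not stated here because it is best measured on your own JVM.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.BitSet;

final class BitSetOverhead {

    // Number of bytes produced by ObjectOutputStream.writeObject().
    static int serializedSize(BitSet bits) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(bits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    // Number of bytes needed for the bits alone.
    static int rawSize(BitSet bits) {
        return bits.toByteArray().length;
    }

    public static void main(String[] args) {
        BitSet small = new BitSet();
        small.set(3);

        System.out.println("raw bytes:        " + rawSize(small));
        System.out.println("serialized bytes: " + serializedSize(small));
    }
}
```

For a set with a single low bit, the raw form is one byte, while the serialized form also carries the stream header, the class description and a whole 8-byte word.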
ZZ Coder