views:

165

answers:

4

For a project, I have to convert a binary string into (an array of) bytes and write it out to a file in binary.

Say that I have a sentence converted into a code string using a huffman encoding. For example, if the sentence was: "hello" h = 00 e = 01, l = 10, o = 11

Then the string representation would be 0001101011.

How would I convert that into a byte? <-- If that question doesn't make sense it's because I know little about bits/byte bitwise shifting and all that has to do with manipulating 1's and 0's.

A: 

Why do you need to convert into a "binary string" first? Just go straight to writing bytes as output.

Conceptually what you do is write bits into a byte, until you have filled up a byte. This is done with bit shifting. To add a 1 bit at the bottom of a value you do something like:

b = (b << 1) | 1;

and then once you have filled up a byte you need to grow your output byte[] to make room for another until done. You could use ByteArrayOutputStream for that too, to steadily output byte and then get a byte[] later.

I can point you at a class that lets you append bits and then get the resulting bytes later, thought it's creating an array of ints instead of bytes. You could use it as an example.

Sean Owen
+1  A: 

Here is a simple, but probably inefficient implementation:

import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BitOutputStream extends FilterOutputStream {

  private int bits = 0;
  private int n = 0;
  private long totalBits = 0;

  public BitOutputStream(OutputStream out) {
    super(out);
  }

  private void writeSingleBit(int bit) throws IOException {
    bits = (bits << 1) | (bit & 1);
    n++;
    totalBits++;
    if (n == 8) {
      super.write(bits);
      bits = 0;
      n = 0;
    }
  }

  /**
   * Writes the <i>numberOfBits</i> lower bits of <i>bitsToWrite</i> to the
   * output stream, starting with the most significant bit.
   */
  public void writeBits(int bitsToWrite, int numberOfBits) throws IOException {
    for (int i = numberOfBits - 1; i >= 0; i--) {
      int bit = bitsToWrite >> i;
      writeSingleBit(bit);
    }
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    for (int i = 0; i < len; i++)
      writeBits(b[off + i], 8);
  }

  @Override
  public final void write(int b) throws IOException {
    writeBits(b, 8);
  }

  @Override
  public final void flush() throws IOException {
    writeBits(0, (8 - n) & 0x07);
  }

  /**
   * Returns the number of bits that have been written to this bitstream.
   */
  public long getTotalBits() {
    return totalBits;
  }
}

And the corresponding unit test:

import static org.junit.Assert.*;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.junit.Test;

public class BitOutputStreamTest {

  @Test
  public void hello() throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    BitOutputStream bos = new BitOutputStream(baos);
    bos.writeBits(0x00, 2);
    bos.writeBits(0x01, 2);
    bos.writeBits(0x02, 2);
    bos.writeBits(0x02, 2);
    bos.writeBits(0x03, 2);
    assertEquals(10, bos.getTotalBits());
    bos.close();
    assertEquals(16, bos.getTotalBits());
    assertArrayEquals(new byte[] { 0x1A, (byte) 0xC0 }, baos.toByteArray());
  }
}

This code doesn't output the bits in the string representation you want, but when you want to write them to a byte-based stream later, this is the way to go.

Update (2010-09-25): Fixed a bug in the write(byte[], int, int) method. I forgot to add off to the array index.

Roland Illig
A: 

If you really want (or have to) creating a string representation of bits, you could split the string in substrings of length 8 (beware of the last one which is not necessarily of length 8).

Integer has a method to parse string representations, a sequence of '0' and '1's can be parsed by calling with radix = 2.

static int parseInt(String s, int radix) 

Parses the string argument as a signed integer in the radix specified by the second argument.

--

EDIT: According to the comments Byte.parseByte is the way to go.

stacker
The OP asked for a byte, so Byte.parseByte(s, radix) is the correct method.
Todd Owen
@Todd, you're right. Byte.parseBytes has been introduced in JDK1.6
stacker
@stacker: `parseBytes()`?
trashgod
I believe Byte.parseByte(String s, int radix) has been in the API since JDK1.1. (...or at least since 1.3.1 http://download-llnw.oracle.com/javase/1.3/docs/api/java/lang/Byte.html)
S.Jones
If the intent of the homework was for the OP to learn bit manipulation, this solution will not be accepted by the grader.
Darron
A: 

Encoding a String by concatenating String representations bot the bit sequences representing the individual characters, and then turning that into a byte again seems like a very expensive way of doing things.

You might want to look into Preon instead. Preon first of all has BitChannel abstraction that prevents you from having to worry to much about shifting yourself. You can simply write bit sequences to the BitChannel. It will keep track of the 'bit pointer' internally, and translate everything to bytes further downstream.

BitChannel channel = new OutputStreamBitChannel(...);
channel.write(1, 0); // 0 = 'h'
channel.write(2, 1); // 01 = 'e'
channel.write(3, 2); // 10 = 'l'
channel.write(4, 2); // 11 = '0'

However, ideally, you would be able to use Preon's higher-level abstractions (preon-binding) that would prevent you from having to deal with this yourself at all. It would just require an annotation on your String.

@BoundHuffmanCoded String toBeEncoded = "hello";

... and Preon would take care of the rest. Now, remember, this is the ideal case, and Preon does not have this annotation yet. But it is possible to register a Codec for this yourself. Keep an eye on it though, since this is something that will definitely go into a future version of Preon.

Wilfred Springer