
I have a file that contains some plain text at the start followed by binary content at the end. The size of the binary content is given by one of the plain-text lines I read. I was using a BufferedReader to read the individual lines, but it exposes no method to read a byte array. DataInputStream's readUTF doesn't read all the way to the end of the line, and its readLine method is deprecated.

Using the underlying FileInputStream to read returns empty byte arrays. Any suggestions on how to go about this?


The code is pretty ordinary

private DOTDataInfo parseFile(InputStream stream) throws IOException{
    DOTDataInfo info = new DOTDataInfo();
    BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
    int binSize = 0;
    String line;
    while((line = reader.readLine()) != null){
     if(line.length() == 0)
      break;
     DOTProperty prop = parseProperty(line);
     info.getProperties().add(prop);
     if(prop.getName().equals("ContentSize"))
      binSize = Integer.parseInt(prop.getValue());
    }
    byte[] content = new byte[binSize];
    stream.read(content); //It's all empty now. If I use a DataInputStream instead, it's got the values from the file
    return info;
}
A: 

The correct way is to use an InputStream of some form, probably a FileInputStream unless this becomes a performance barrier.

What do you mean "Using the underlying FileInputStream to read returns empty byte arrays."? This seems very unlikely and is probably where your mistake is. Can you show us the example code you've tried?

Nick Fortescue
+2  A: 

You need to use an InputStream. Readers are for character data. Look into wrapping your input stream with a DataInputStream, like:

stream=new DataInputStream(new BufferedInputStream(new FileInputStream(...)));

The data input stream will give you many useful methods to read various types of data, and of course, the base InputStream methods for reading bytes.

(This is actually exactly what an HTTP server must do to read a request with content.)


readUTF doesn't read a line; it reads a string in the (modified) UTF-8 format written by DataOutputStream.writeUTF, which begins with a two-byte length. Refer to the Javadoc.
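To see what readUTF actually expects, here is a small sketch (the class and method names are invented for illustration) that round-trips a string through DataOutputStream.writeUTF and readUTF, then feeds it ordinary text bytes instead:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class ReadUtfDemo {
    // writeUTF emits a 2-byte big-endian length followed by modified UTF-8 bytes.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    // readUTF expects exactly that layout; it never looks for '\n'.
    static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode("ContentSize: 4\n");
        System.out.println(data.length);                             // 17 (2 + 15)
        System.out.println(decode(data).equals("ContentSize: 4\n")); // true

        // Plain text: 'C' and 'o' are taken as a length of 17263, so it runs off the end.
        try {
            decode("ContentSize: 4\n".getBytes(StandardCharsets.US_ASCII));
        } catch (EOFException e) {
            System.out.println("plain text is not readUTF format");
        }
    }
}
```

So readUTF only makes sense for data that was written with writeUTF; for ordinary line-oriented text you have to find the line ends yourself.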

Software Monkey
A: 

You can read the text with BufferedReader. When you know where the binary starts, you can close the file, open it with RandomAccessFile and read binary from any point in the file. Or you can read the file as binary and convert the sections you identify as text into strings (using new String(bytes, encoding)).
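The second option, reading everything as bytes and decoding only the text part, could be sketched like this. It assumes, hypothetically, that the header ends at the first blank line written with \n line endings (the question's loop also stops at an empty line) and that the header text is UTF-8. The whole file is loaded into memory, so this suits files of modest size.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class SplitHeader {
    // Index of the first byte after the first "\n\n" (the blank line ending
    // the header), or -1 if there is no blank line.
    static int bodyStart(byte[] data) {
        for (int i = 0; i + 1 < data.length; i++) {
            if (data[i] == '\n' && data[i + 1] == '\n') {
                return i + 2;
            }
        }
        return -1;
    }

    // Decodes only the header bytes as text (assumed UTF-8), without the "\n\n".
    static String header(byte[] data) {
        int start = bodyStart(data);
        int end = start < 0 ? data.length : start - 2;
        return new String(data, 0, end, StandardCharsets.UTF_8);
    }

    // Everything after the blank line, left as raw bytes.
    static byte[] body(byte[] data) {
        int start = bodyStart(data);
        return start < 0 ? new byte[0] : Arrays.copyOfRange(data, start, data.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(header(data));
        System.out.println(body(data).length + " bytes of binary content");
    }
}
```

Only the header bytes ever pass through a charset decoder, so the binary part cannot be mangled by decoding.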

Peter Lawrey
+2  A: 

You could use RandomAccessFile. Use readLine() to read the plain text at the start (note its limitations, as described in the API: it converts each byte straight into a char, so it does not support the full Unicode character set), and then readByte() or readFully() to read the subsequent binary data.

Using the underlying FileInputStream to read returns empty byte arrays.

That's because you have wrapped the stream in a BufferedReader, which has probably consumed all the bytes from the stream when filling up its buffer.
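A minimal RandomAccessFile sketch of this answer, assuming a hypothetical header format where the size line looks like ContentSize=<n> and the header ends with an empty line (the real parseProperty format isn't shown in the question):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class RafParse {
    private static final String SIZE_KEY = "ContentSize="; // hypothetical line format

    static byte[] readContent(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            int binSize = 0;
            String line;
            // readLine() maps each byte to one char (no UTF-8 decoding) and
            // leaves the file pointer just after the line terminator.
            while ((line = raf.readLine()) != null && !line.isEmpty()) {
                if (line.startsWith(SIZE_KEY)) {
                    binSize = Integer.parseInt(line.substring(SIZE_KEY.length()).trim());
                }
            }
            byte[] content = new byte[binSize];
            raf.readFully(content); // binary starts right after the empty line
            return content;
        }
    }
}
```

Because RandomAccessFile does no read-ahead buffering of its own, the file pointer after the empty line is exactly where the binary content starts.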

Zach Scrivena
+1 for mentioning the buffer
Aaron Digulla
A: 

I recommend using DataInputStream. You have the following options:

  • Read both text and binary content with DataInputStream
  • Open a BufferedReader, read text and close the stream. Then open a DataInputStream, skip bytes equal to the size of the text and read binary data.
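For the first option, one way is to read the header lines byte by byte from the same buffered DataInputStream and then call readFully for the binary part. The ContentSize=<n> line format is a hypothetical stand-in for the question's parseProperty, and readLine here is a hand-written helper, not the deprecated DataInputStream.readLine():

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class DataStreamParse {
    private static final String SIZE_KEY = "ContentSize="; // hypothetical line format

    // Reads one '\n'-terminated line byte by byte, so nothing past the line
    // is consumed. Strips a trailing '\r'. Returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        if (b == -1 && line.size() == 0) {
            return null;
        }
        String s = new String(line.toByteArray(), StandardCharsets.UTF_8);
        return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
    }

    // Header lines until an empty line, then exactly ContentSize bytes of binary.
    static byte[] parse(InputStream raw) throws IOException {
        // Text and binary are both read through this one buffered stream,
        // so bytes buffered while reading the header are not lost.
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        int binSize = 0;
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            if (line.startsWith(SIZE_KEY)) {
                binSize = Integer.parseInt(line.substring(SIZE_KEY.length()).trim());
            }
        }
        byte[] content = new byte[binSize];
        in.readFully(content); // throws EOFException if the data is truncated
        return content;
    }
}
```

Because the header lines and the binary block come from one stream object, the buffering problem from the question does not arise. The caller remains responsible for closing raw.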
kgiannakakis
+2  A: 

If you genuinely have a file (rather than something harder to seek in, e.g. a network stream) then I suggest something like this:

  • Open the file as a FileInputStream
  • Wrap it in InputStreamReader and a BufferedReader
  • Read the text, so you can find out how much content there is
  • Close the BufferedReader (which will close the InputStreamReader which will close the FileInputStream)
  • Reopen the file
  • Skip to (total file length - binary content length)
  • Read the rest of the data as normal

If you want to avoid reopening the file, you can reposition the stream instead. FileInputStream itself doesn't support mark() and reset() (its markSupported() returns false), but calling getChannel().position(offset) on it has the effect of a seek. (I was looking for an InputStream.seek() but there isn't one; a plain InputStream only offers skip(), plus mark()/reset() where supported.)
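The steps above might be sketched like this, assuming (as the answer does) that the binary content runs to the end of the file, and using a hypothetical ContentSize=<n> header line. For the skip step it uses FileInputStream.getChannel().position(...), which moves the stream's file position:

```java
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ReopenParse {
    private static final String SIZE_KEY = "ContentSize="; // hypothetical line format

    static byte[] readContent(File file) throws IOException {
        int binSize = 0;
        // Pass 1: read the text header. The reader may over-read into the
        // binary part, which is harmless because it is closed afterwards.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                if (line.startsWith(SIZE_KEY)) {
                    binSize = Integer.parseInt(line.substring(SIZE_KEY.length()).trim());
                }
            }
        }
        // Pass 2: reopen, jump to (total length - binary length), read the rest.
        byte[] content = new byte[binSize];
        try (FileInputStream fis = new FileInputStream(file)) {
            fis.getChannel().position(file.length() - binSize);
            new DataInputStream(fis).readFully(content);
        }
        return content;
    }
}
```

The second pass never depends on how far the BufferedReader read, which is why this approach is robust to its buffering.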

Jon Skeet
Well, that does seem like the way to go, I suppose. But out of curiosity, if it were a hard-to-seek stream, how would one go about it? This seems like a lot of effort for what I thought might have an intuitive (but elusive!) solution.
Well, the tricky bit is reading the character data without "over-reading" into the binary data. You might think of reading a single character at a time from an InputStreamReader, but its Javadoc warns that it may read more bytes ahead from the underlying stream than it needs, so that can over-read too. Reading one byte at a time from the InputStream and decoding each line yourself avoids the problem, though it will be inefficient. (You could wrap the stream in a BufferedInputStream to avoid actually going to the OS/disk for each call, admittedly, as long as you then read the binary data from that same BufferedInputStream.) Basically, mixed-format files are always a bit of a pain. It's not so bad if the format includes a length prefix every time there's text ("the next n bytes are text"), as then you know where to stop.
Jon Skeet
A: 

Alas, DataInputStream.readLine() is deprecated and does not decode UTF-8. But this should help: it reads a line from a binary stream without any lookahead.

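import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Reads one line byte by byte, so nothing after the '\n' is consumed.
// Returns null at end of stream, like BufferedReader.readLine().
public static String lineFrom(InputStream in) throws IOException {
 byte[] buf = new byte[128];
 int pos = 0;
 int ch;
 while ((ch = in.read()) != -1 && ch != '\n') {
  if (pos == buf.length) buf = Arrays.copyOf(buf, pos + 128); // grow the buffer
  buf[pos++] = (byte) ch;
 }
 if (ch == -1 && pos == 0) return null; // end of stream, nothing read
 if (pos > 0 && buf[pos - 1] == '\r') pos--; // drop the '\r' of a "\r\n" ending
 return new String(buf, 0, pos, StandardCharsets.UTF_8);
}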
Adrian