+1  A: 

The problem with your code is that you are using the wrong class to read raw data from the file. As the BufferedReader documentation says:

public int read() throws IOException

Reads a single character.

Returns: The character read, as an integer in the range 0 to 65535 (0x00-0xffff), or -1 if the end of the stream has been reached

So each call to the read() method of BufferedReader decodes a character and therefore consumes one or more bytes (depending on the character encoding) from the input stream, which is not what you want. This also explains why you get a lot of -1 values: the stream ended much earlier than you thought.
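To make the difference between reading characters and reading bytes concrete, here is a minimal sketch. The class name CharVsByteDemo, the two helper methods, the in-memory byte array, and the choice of UTF-8 are all assumptions made for illustration; none of them come from the code in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: contrasts Reader.read() with InputStream.read().
class CharVsByteDemo {
    // Reads through a Reader: bytes are decoded into characters first
    // (UTF-8 is an assumption here; a FileReader would use its own charset).
    static List<Integer> readAsChars(byte[] raw) throws IOException {
        List<Integer> out = new ArrayList<>();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                out.add(c);
            }
        }
        return out;
    }

    // Reads through an InputStream: one value (0 to 255) per byte, no decoding.
    static List<Integer> readAsBytes(byte[] raw) throws IOException {
        List<Integer> out = new ArrayList<>();
        try (InputStream in = new ByteArrayInputStream(raw)) {
            int b;
            while ((b = in.read()) != -1) {
                out.add(b);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // 0xC3 0xA9 is the UTF-8 encoding of one character (U+00E9);
        // 0x31 is the ASCII digit '1'.
        byte[] raw = {(byte) 0xC3, (byte) 0xA9, 0x31};
        System.out.println("as chars: " + readAsChars(raw));
        System.out.println("as bytes: " + readAsBytes(raw));
    }
}
```

With three input bytes, the Reader yields only two values because two of the bytes are merged into one character, while the InputStream yields three. On binary pixel data the same merging silently changes or drops values.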

Since the PGM header stores its values as ASCII decimal, the header is easy to parse using the Scanner class. The pixel data that follows is binary, so it has to be read as bytes.

Here is some barely tested code that shows how to read a PGM image, assuming that:

  • it contains a single comment after the magic number (i.e. it does not have lines that start with a # except the second one)
  • the PGM header is exactly 4 lines long (magic number, comment, dimensions, maximum value).

Here's the code:

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.util.Scanner;

// These statements belong inside a method declared "throws IOException".
String filePath = "image.pgm";
FileInputStream fileInputStream = new FileInputStream(filePath);
Scanner scan = new Scanner(fileInputStream);
// Discard the magic number
scan.nextLine();
// Discard the comment line
scan.nextLine();
// Read pic width, height and max value
int picWidth = scan.nextInt();
int picHeight = scan.nextInt();
int maxvalue = scan.nextInt();

fileInputStream.close();

// Now parse the file as binary data, starting again from the beginning
fileInputStream = new FileInputStream(filePath);
DataInputStream dis = new DataInputStream(fileInputStream);

// Look for 4 newlines (i.e. the header) and discard everything before them
int numnewlines = 4;
while (numnewlines > 0) {
    char c;
    do {
        c = (char) dis.readUnsignedByte();
    } while (c != '\n');
    numnewlines--;
}

// Read the image data: one unsigned byte per pixel, row by row
int[][] data2D = new int[picHeight][picWidth];
for (int row = 0; row < picHeight; row++) {
    for (int col = 0; col < picWidth; col++) {
        data2D[row][col] = dis.readUnsignedByte();
        System.out.print(data2D[row][col] + " ");
    }
    System.out.println();
}
dis.close();

Still to be implemented: support for comment lines, dividing each element's value by maxvalue, error checking for malformed files, and exception handling. I tested it on a PGM file using UNIX end-of-lines, but it should work on Windows too.

Let me stress that this is neither a robust nor a complete implementation of a PGM parser. This code is intended only as a proof of concept that may accomplish just enough for your needs.

If you really need a robust PGM parser, you may use the tools provided by Netpbm.
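Two of the missing pieces listed above, comment lines and scaling by maxvalue, could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name PgmSketch and the methods nextToken and read are hypothetical, the file is assumed to be a binary (P5) PGM with maxvalue at most 255, and exactly one whitespace byte is assumed to separate the maxvalue from the pixel data. Unlike the code above, it reads the header and the pixels from one stream, byte by byte, so the file is opened only once.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch: single-pass reader for binary (P5) PGM data.
class PgmSketch {
    // Returns the next whitespace-delimited header token, skipping '#' comments.
    // Consumes exactly one whitespace byte after the token.
    static String nextToken(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        while (true) {
            int b = in.read();
            if (b == -1) {
                throw new EOFException("unexpected end of header");
            }
            if (b == '#' && sb.length() == 0) {
                // Comment: discard everything up to the end of the line.
                while (b != -1 && b != '\n') {
                    b = in.read();
                }
                continue;
            }
            if (b == ' ' || b == '\t' || b == '\n' || b == '\r') {
                if (sb.length() > 0) {
                    return sb.toString(); // token complete
                }
            } else {
                sb.append((char) b);
            }
        }
    }

    // Reads the header, then the pixels, scaling each value to the range 0.0 to 1.0.
    static double[][] read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        String magic = nextToken(in);
        if (!magic.equals("P5")) {
            throw new IOException("not a binary PGM, magic number was: " + magic);
        }
        int width = Integer.parseInt(nextToken(in));
        int height = Integer.parseInt(nextToken(in));
        int maxValue = Integer.parseInt(nextToken(in));
        if (maxValue <= 0 || maxValue > 255) {
            // Assumption: only one-byte samples are handled in this sketch.
            throw new IOException("unsupported maxvalue: " + maxValue);
        }
        double[][] pixels = new double[height][width];
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                pixels[row][col] = in.readUnsignedByte() / (double) maxValue;
            }
        }
        return pixels;
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory 2x2 image with one comment line in the header.
        byte[] file = {'P', '5', '\n', '#', ' ', 'c', '\n', '2', ' ', '2', '\n',
                       '2', '5', '5', '\n', 0, (byte) 255, 51, (byte) 102};
        System.out.println(Arrays.deepToString(read(new ByteArrayInputStream(file))));
    }
}
```

For a real file, the stream passed to read would typically be a FileInputStream wrapped in a BufferedInputStream, since the sketch reads one byte at a time. A file whose header ends in a Windows-style line ending would leave an extra byte in front of the pixel data, so that case still needs handling.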

Giuseppe Cardone
This works well, but it gives rise to a new problem: parsing out the header file. I used a BufferedReader/StreamTokenizer to read the header characters, and for some reason once that's complete, the first call to dis.readByte() throws an EOFException. If I remove the header from the file and just read straight from the binary, I run into a different problem: the first 55 bytes it reads are junk numbers; the 56th byte is the "1" that shows up first in my original post, followed by all the corresponding numbers (up to 55 bytes short, due to the junk lead-in). Any thoughts?
Magsol
Er sorry, disregard the bit about the 55 bytes; it works just fine if I eliminate the header (and hence, the BufferedReader/StreamTokenizer and have a single file handle - the DataInputStream - reading from the file).
Magsol
My bad, I didn't read the PGM file format specification. I'll try to give it a shot in a few minutes.
Giuseppe Cardone
I modified the code snippet, now it works with your sample PGM file.
Giuseppe Cardone
Hmm, I'm still having problems: I get an InputMismatchException when I try to run your new code. It reads the header lines correctly, but on its first run through both loops in reading the binary image data, it throws the exception. Any thoughts?
Magsol
You get that exception because the Scanner expected a token representing an integer but found "something else". I added a small check that prints the unexpected token to stdout. To understand why you get that exception, you should debug the code and compare the token the Scanner expected (probably a string representing an integer, since you say that it parses the header correctly) with the one it actually got.
Giuseppe Cardone
I posted the full output to your code; I don't know if you did this, but the "image data" I posted of all the pixel values is what the reader *should* pick up and output. What's actually stored in the file is binary image data, not ASCII integer values. The values I posted are what Matlab reads when it opens a PGM file, and what I would like to be able to read from within Java. Thanks for all your help so far; hopefully we can nail this!
Magsol
Posted another update; found a deprecated method that seems to work, but unfortunately, it's deprecated :P
Magsol
I modified the code again so that it reads the PGM header using a Scanner, then reads the image data using a DataInputStream. This is not the fastest or cleanest way to parse a PGM file, but it is extremely easy to see what's going on.
Giuseppe Cardone