One of the lines in a Java file I'm trying to understand is shown below.

return new Scanner(file).useDelimiter("\\Z").next();

According to the java.util.regex.Pattern documentation, \Z matches "The end of the input but for the final terminator, if any", so I expected this call to return the whole file. Instead, it returns only the first 1024 characters of the file. Is this a limitation imposed by the regex Pattern matcher? Can it be overcome? For now I'm using a FileReader instead, but I would like to know the reason for this behaviour.

+2  A: 

I couldn't reproduce this myself, but I think I can shed light on what is going on.

Internally, the Scanner uses a character buffer of 1024 characters. By default, the Scanner reads up to 1024 characters from your Readable, if possible, and then applies the pattern.

The problem is in your pattern: it will always match the end of the input, but that means the end of the buffered input, not necessarily the end of your input stream/data. When Java applies your pattern to the buffered data, it tries to find the first occurrence of the end of input. Since 1024 characters are in the buffer, the matching engine treats position 1024 as the first match of the delimiter, and everything before it is returned as the first token.

I don't think the end-of-input anchor is valid for use in the Scanner for that reason. It could be reading from an infinite stream, after all.

Mark Peters
Hi Mark, I think that is a correct reason for scanner not to work. I'm voting up the answer. The way to get it working is the one marked correct. Thank you for your answer.
Sharmila
+1  A: 

Try wrapping the File object in a FileInputStream.
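
A minimal sketch of what that suggestion could look like. The class name ScannerWholeFile, the method name readAll, and the choice of UTF-8 are assumptions made for illustration, not something from the question. The sketch also checks Scanner.ioException(), because Scanner swallows I/O errors from the underlying stream and simply stops reading; whether that is what happens in the asker's case is a guess.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;

public class ScannerWholeFile {
    // Hypothetical helper: reads the file through a Scanner that wraps a FileInputStream.
    // The charset (UTF-8) is an assumption; use the encoding the file was written in.
    static String readAll(File file) throws IOException {
        try (InputStream in = new FileInputStream(file);
             Scanner scanner = new Scanner(in, "UTF-8")) {
            scanner.useDelimiter("\\Z");
            // hasNext() guards against an empty file, where next() would throw.
            String text = scanner.hasNext() ? scanner.next() : "";
            // Scanner hides IOExceptions from the source; surface the last one, if any.
            IOException hidden = scanner.ioException();
            if (hidden != null) {
                throw hidden;
            }
            return text;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new File(args[0])).length());
    }
}
```

If this still returns a truncated result, the exception thrown from readAll (if any) should at least say why the Scanner stopped early.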

Amir Afghani
A: 

Scanner is intended to read multiple primitives from a file. It really isn't intended to read an entire file.

If you don't want to include third party libraries, you're better off looping over a BufferedReader that wraps a FileReader/InputStreamReader for text, or looping over a FileInputStream for binary data.
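
For the text case, the loop might look roughly like this. It is only a sketch: the class name ReadWholeFile, the method name readAll, the 4096-character chunk size, and the UTF-8 charset are all assumptions chosen for the example.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadWholeFile {
    // Hypothetical helper: drains a BufferedReader chunk by chunk into a String.
    // Line terminators are preserved because read(char[]) is used, not readLine().
    static String readAll(String path) throws IOException {
        StringBuilder contents = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            char[] chunk = new char[4096];
            int count;
            // read() returns -1 once the end of the stream is reached.
            while ((count = reader.read(chunk)) != -1) {
                contents.append(chunk, 0, count);
            }
        }
        return contents.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(args[0]).length());
    }
}
```

Unlike the Scanner one-liner, any IOException here propagates to the caller instead of silently cutting the result short.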

If you're OK using a third-party library, Apache commons-io has a FileUtils class that contains the static methods readFileToString and readLines for text, and readFileToByteArray for binary data.

R. Bemrose