I have a very large text file and I need to gather data from somewhere near the end. Maybe Scanner isn't the best way to do this, but it would be very wasteful to start at the top and read 6000 lines before getting to the part of the file I am interested in. Is there a way to either tell Scanner to jump to, say, 7/8ths of the way down the document, or to start from the bottom and scan upwards line by line?

Thanks

+2  A: 

Scanner wraps an InputStream, so you can use the stream's skip(long) method to skip the lines you don't want and then start scanning.

Read more in the InputStream javadoc
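As a rough sketch of that idea (not a definitive implementation): the class and method names below (SkipThenScan, skipFully, linesAfter) are made up for illustration, and UTF-8 is an assumed encoding. Note that skip(long) may skip fewer bytes than requested, so the sketch loops, and that the offset is counted in bytes, so the first line read afterwards is probably partial and is thrown away.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SkipThenScan {

    /** Skips n bytes, or stops early at end of stream; a single skip() call may skip fewer. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() == -1) {
                return; // end of stream reached
            } else {
                n--;    // read() consumed one byte
            }
        }
    }

    /** Returns the complete lines that start after the given byte offset. */
    static List<String> linesAfter(Path file, long offset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            skipFully(in, offset);
            // Assumption: the file is UTF-8 encoded.
            try (Scanner sc = new Scanner(in, StandardCharsets.UTF_8.name())) {
                // The offset most likely lands mid-line, so drop that first line.
                if (offset > 0 && sc.hasNextLine()) {
                    sc.nextLine();
                }
                while (sc.hasNextLine()) {
                    lines.add(sc.nextLine());
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        long offset = Files.size(file) / 8 * 7; // roughly 7/8 of the way down
        for (String line : linesAfter(file, offset)) {
            System.out.println(line);
        }
    }
}
```

If the offset happens to fall exactly on the start of a line, this sketch still discards that line, which is usually acceptable when the offset is only an estimate.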

RonK
The obvious difficulty there is that `skip` doesn't skip *lines*, it skips *bytes*, and there's no way to tell how many bytes are in each line without reading them. But it's a good way to skip some data.
Mark Peters
@Mark Peters: Great comment - I did not take that into consideration.
RonK
This is still very usable for me. Thank you for the help.
Mike
+1  A: 

You should probably use RandomAccessFile instead.
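A minimal sketch of what that might look like, assuming the goal is to read every line from some byte offset onward. The class and method names (TailReader, linesFrom) are hypothetical. Two caveats: RandomAccessFile.readLine turns each byte into one char, so it only suits single-byte encodings, and it is unbuffered, so it can be slow for long tails.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class TailReader {

    /** Returns the complete lines that start after the given byte offset. */
    static List<String> linesFrom(String path, long offset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset); // jump directly, nothing before the offset is read
            if (offset > 0) {
                raf.readLine(); // discard the (probably partial) line at the seek position
            }
            String line;
            while ((line = raf.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        long length;
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r")) {
            length = raf.length();
        }
        // Start roughly 7/8 of the way into the file.
        for (String line : linesFrom(args[0], length / 8 * 7)) {
            System.out.println(line);
        }
    }
}
```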

Ben S
+3  A: 

The underlying input source for a java.util.Scanner is a java.lang.Readable. Beyond the Scanner(File) constructor, a Scanner neither knows nor cares that it's scanning a file.

Also, since it is built on java.util.regex, which only matches forward through the input, there's no way it can scan backward.

To accomplish what you want to do, it's best to do it at the input source level, e.g. by calling InputStream.skip on the source before passing it to the Scanner constructor.


On Scanner.skip

Scanner itself does have a skip, and a pattern like "(?s).{10}" would skip 10 characters (in (?s) single-line/Pattern.DOTALL mode), but this is perhaps a rather roundabout way of doing it.
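For illustration, a small sketch of that character-skipping pattern; the class and method names (SkipChars, restOfLineAfterSkipping10) are invented for this example. Keep in mind that skip throws NoSuchElementException if the pattern does not match at the current position, which here means fewer than 10 characters remain.

```java
import java.util.Scanner;

class SkipChars {

    /** Skips 10 characters (line terminators included) and returns the rest of that line. */
    static String restOfLineAfterSkipping10(String text) {
        Scanner sc = new Scanner(text);
        // (?s) lets '.' match line terminators too, so exactly 10 chars are consumed.
        sc.skip("(?s).{10}");
        return sc.nextLine();
    }

    public static void main(String[] args) {
        // The 10 skipped characters are "0123456", '\n', '8' and '9'.
        System.out.println(restOfLineAfterSkipping10("0123456\n89abcdef\n"));
    }
}
```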

Here's an example of using skip to skip a given number of lines.

    String text =
        "Line1 blah blah\n" +
        "Line2 more blah blah\n" +
        "Line3 let's try something new \r\n" +
        "Line4 meh\n" + 
        "Line5 bleh\n" + 
        "Line6 bloop\n";
    // Each repetition consumes one line plus its terminator (\n, \r\n, or a lone \r).
    Scanner sc = new Scanner(text).skip("(?:.*(?:\\r?\\n|\\r)){4}");
    while (sc.hasNextLine()) {
        System.out.println(sc.nextLine());
    }

This prints (as seen on ideone.com):

    Line5 bleh
    Line6 bloop

polygenelubricants
Thank you for the answer. I guess I missed the skip function in the Scanner API listing. I think I will take advantage of the InputStream.skip though.
Mike
@Mike: yes, Mark Peters is spot on. If you know how many bytes you want to skip, skip at the `InputStream` level. If you don't, e.g. you want to skip some number of lines, and the input is dynamic enough that you can't preprocess it to build an index, then just use `Scanner.skip`. Needless to say, this method skips _by matching_ the input, so it actually does quite a lot of work.
polygenelubricants