I've constructed a regular expression which I compile to a Pattern to find Fortran Real*8 numbers. The tricky bit is that the file I'm reading from is a single line with a few million columns. When I do this:

Scanner recordScanner = new Scanner(recordString);
String foundReal = recordScanner.findInLine(real8Regex);

I get what I'm looking for, but when I use the next(Pattern) method, I get an InputMismatchException. Strange, considering both findInLine and next return Strings.

Scanner recordScanner = new Scanner(recordString);
String foundReal = recordScanner.next(real8Regex);

Am I missing something crucial in the use of the next() method?

+1  A: 

It seems to me that the documentation isn't brilliantly written, but it's doing what it's meant to.

next(pattern) is documented to return the token if it is found at the scanner's current location. findInLine(pattern) is documented to return null if the pattern isn't matched within the current line.

To check first, use hasNext(pattern) before calling next(pattern).
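A minimal sketch of that check, assuming a whitespace-delimited input and a simple digits pattern in place of the asker's real8Regex. The class name GuardedNext and the helper nextOrNull are hypothetical names used only for illustration:

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class GuardedNext {
    // Hypothetical helper: returns the next token only if it matches the
    // pattern; otherwise returns null and leaves the scanner where it was,
    // so no InputMismatchException is thrown.
    static String nextOrNull(Scanner scanner, Pattern pattern) {
        return scanner.hasNext(pattern) ? scanner.next(pattern) : null;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+"); // stand-in pattern, not real8Regex
        Scanner scanner = new Scanner("123 abc");
        System.out.println(nextOrNull(scanner, digits)); // first token matches
        System.out.println(nextOrNull(scanner, digits)); // "abc" does not match
        System.out.println(scanner.next());              // the token is still there
    }
}
```

Note that a failed check does not consume the token, so the caller still has to skip it with a plain next() before trying again.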

Jon Skeet
Thanks for the response. When I've tried with hasNext(Pattern) the app doesn't find anything. If I surround the if (hasNext(Pattern)) conditional with a while loop on Scanner.hasNext() (which is simply true if there is another token based on my delimiter, which is whitespace), the code just eats CPU cycles but never returns results.
sbook
Yes, because hasNext doesn't advance the scanner. It's not clear to me what you really want to achieve. Some sample code would help.
Jon Skeet
A: 

Is it a "not all tokens match the pattern and thus next(Pattern) gets stuck at the first non-matching token" issue?

next(Pattern) could be used like this:

String toSearch = "ab123d4e::g67f912g34h";
Scanner aScanner = new Scanner(toSearch);
aScanner.useDelimiter("[a-z]+");
while (aScanner.hasNext("[0-9]+"))
{
    System.out.println(aScanner.next("[0-9]+"));
}

but will only output 123 and 4 as the non-matching third token causes the while loop to terminate. In that scenario, however, I should just use hasNext() and next() instead.

I'm struggling to think of a real reason to ever use next(Pattern) because it will get stuck at the first token which does not match the pattern. next(Pattern) does not mean "return the first token after the current position which matches Pattern"; it means "return the next token in the sequence if it matches Pattern; otherwise throw an InputMismatchException and leave the scanner where it is".

You (presumably) need to read in all of the tokens, so it would be better to use hasNext() and next() and then use a Matcher against the required Pattern for each token.

Finally, you may find question 842496 useful.
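As a rough sketch of that approach, reusing the sample string and delimiter from the example above: every token is consumed with plain next(), and a Matcher decides whether to keep it. The class name TokenFilter and the method matchingTokens are hypothetical names chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class TokenFilter {
    // Reads every token, so a non-matching token is skipped instead of
    // blocking the scan the way next(Pattern) would.
    static List<String> matchingTokens(String input, String delimiterRegex, Pattern wanted) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                String token = scanner.next();          // always advances
                if (wanted.matcher(token).matches()) {  // whole token must match
                    result.add(token);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");
        // The "::" token is skipped rather than ending the loop.
        System.out.println(matchingTokens("ab123d4e::g67f912g34h", "[a-z]+", digits));
    }
}
```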

barrowc
+1  A: 

I'm a little late (you should have tagged it "regex"), but you should be using

String foundReal = recordScanner.findWithinHorizon(real8Regex, 0);

By using findInLine(real8Regex) you make the Scanner do a lot of needless processing to ensure that the current match is on the same line as the last one. The fact that your data is all on one line is precisely why you shouldn't use findInLine().
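To collect every match rather than just the first, the call can be repeated until it returns null. This is only a sketch: the REAL8 pattern below is a simplified, hypothetical stand-in for the asker's real8Regex (which isn't shown), and the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class HorizonSearch {
    // Assumed, simplified shape of a Fortran REAL*8 literal such as 1.5D+03.
    // This is NOT the asker's regex.
    static final Pattern REAL8 = Pattern.compile("[+-]?\\d+\\.\\d*[DdEe][+-]?\\d+");

    static List<String> findAll(String record, Pattern pattern) {
        List<String> found = new ArrayList<>();
        try (Scanner scanner = new Scanner(record)) {
            String match;
            // A horizon of 0 means "search the rest of the input"; delimiters
            // are ignored and the scanner advances past each match.
            while ((match = scanner.findWithinHorizon(pattern, 0)) != null) {
                found.add(match);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("x 1.5D+03 junk -2.25E-01 0.1D0", REAL8));
    }
}
```

The loop relies on the pattern never matching an empty string; a pattern that can match nothing would need extra handling.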

Alan Moore