I'd like to parse some text using a hand-written recursive-descent parser. I used Scanner with the following delimiter: "\\s*". Unfortunately, because this pattern can match an empty String, every hasNextFoo and nextFoo call now seems to match nothing.

The documentation doesn't say anything about delimiters that can match the empty string.

A: 

You have some objection to the '+' character?

Are you sure you want to use a regular expression at all, and not just an if statement testing for space characters? You say 'runtime'. Is your data in a string, coming from a stream, or something else?

bmargulies
Cute. I believe what @bmargulies is trying to say is that the + quantifier matches "at least one" instead of "zero or more", which prevents the pattern from matching an empty string.
GrayWizardx
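A minimal sketch of the '+' fix, assuming whitespace-separated input (the class `WhitespaceDelimiter` and its helper methods are hypothetical). It first checks that "\\s*" matches the empty string while "\\s+" does not, then tokenizes with a "\\s+" delimiter so that `hasNext`, `hasNextInt` and `nextInt` see whole tokens again. Scanner's default delimiter is already `\p{javaWhitespace}+`, so simply not calling `useDelimiter` may be enough.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class WhitespaceDelimiter {
    // "\\s+" needs at least one whitespace character, so it can never match "".
    static final Pattern DELIM = Pattern.compile("\\s+");

    /** Returns true if the given regex matches the whole empty string. */
    static boolean matchesEmpty(String regex) {
        return Pattern.matches(regex, "");
    }

    /** Collects all whitespace-separated tokens of the input. */
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input).useDelimiter(DELIM)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    /** Sums the integer tokens and skips everything else, exercising hasNextInt/nextInt. */
    static int sumInts(String input) {
        int sum = 0;
        try (Scanner sc = new Scanner(input).useDelimiter(DELIM)) {
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    sum += sc.nextInt();
                } else {
                    sc.next(); // skip a non-integer token
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(matchesEmpty("\\s*"));        // true
        System.out.println(matchesEmpty("\\s+"));        // false
        System.out.println(tokens("  let x = 42 + 8"));  // [let, x, =, 42, +, 8]
        System.out.println(sumInts("  let x = 42 + 8")); // 50
    }
}
```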
Yes, because I want to use the scanner as a runtime lexer. In short, I want to be able to call `scanner.next(pattern)`, which would either return the matched string or throw an exception without consuming the stream. Spaces should be ignored. If there is a better class for this than Scanner, I would be glad to use it.
Paul Brauner
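The `next(pattern)` behaviour described in this comment can be sketched by buffering the whole input in a String and trying each requested pattern only at the current offset with `Matcher.lookingAt()`. The offset advances only on a successful match, so a failed call consumes no token characters. This is a minimal sketch assuming the input fits in memory; `RegexLexer` and its methods are hypothetical names that merely mirror Scanner's `hasNext(Pattern)`/`next(Pattern)`. Unlike a delimiter-based Scanner, tokens need not be separated by spaces (e.g. `x=42`).

```java
import java.util.InputMismatchException;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex lexer: next(p) consumes input only if p matches at the current offset. */
public class RegexLexer {
    // Used only to skip whitespace, so matching the empty string is harmless here.
    private static final Pattern SPACES = Pattern.compile("\\s*");
    private final String input;
    private final Matcher m;
    private int pos = 0; // offset of the first unconsumed character

    public RegexLexer(String input) {
        this.input = input;
        this.m = SPACES.matcher(input);
    }

    /** Skips whitespace; spaces are never tokens, so consuming them is safe. */
    private void skipSpaces() {
        m.usePattern(SPACES);
        m.region(pos, input.length());
        if (m.lookingAt()) pos = m.end();
    }

    /** Tests whether p matches at the current offset without consuming the match. */
    public boolean hasNext(Pattern p) {
        skipSpaces();
        m.usePattern(p);
        m.region(pos, input.length());
        return m.lookingAt();
    }

    /** Returns the text matched by p at the current offset; on failure, throws and keeps the offset. */
    public String next(Pattern p) {
        skipSpaces();
        if (pos == input.length()) throw new NoSuchElementException("end of input");
        m.usePattern(p);
        m.region(pos, input.length());
        if (!m.lookingAt()) {
            throw new InputMismatchException("expected " + p + " at offset " + pos);
        }
        pos = m.end(); // advance only after a successful match
        return m.group();
    }

    public static void main(String[] args) {
        RegexLexer lx = new RegexLexer("x=42");
        System.out.println(lx.next(Pattern.compile("[A-Za-z]+"))); // x
        System.out.println(lx.hasNext(Pattern.compile("\\d+")));   // false: '=' comes first
        System.out.println(lx.next(Pattern.compile("=")));         // =
        System.out.println(lx.next(Pattern.compile("\\d+")));      // 42
    }
}
```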
All joking aside, @trashgod's answer is probably what you want. You really haven't given us enough background to go on here.
bmargulies
Good point about more background. FWIW, I added a link to an example.
trashgod
A:

You might also consider StreamTokenizer. Here is an example of using it for one-symbol look-ahead in a recursive-descent parser.
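A sketch of one-token look-ahead with `java.io.StreamTokenizer` in a recursive-descent parser, here a tiny arithmetic evaluator. The class `TinyCalc` and its grammar are hypothetical and are not the linked example. When a rule reads a token it does not handle, it calls `pushBack()`, so the caller gets the same token from its next `nextToken()` call. Two defaults matter: '/' is a comment character by default, so this sketch leaves division out, and '-' is treated as part of numbers, so `ordinaryChar('-')` is needed for `3-2` to lex as three tokens.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class TinyCalc {
    private final StreamTokenizer tok;

    public TinyCalc(String input) {
        tok = new StreamTokenizer(new StringReader(input));
        tok.ordinaryChar('-'); // otherwise "3-2" lexes as 3 followed by the number -2
    }

    /** Parses the whole input as an expression; throws if tokens remain. */
    public double parse() throws IOException {
        double v = expr();
        if (tok.nextToken() != StreamTokenizer.TT_EOF) {
            throw new IllegalArgumentException("unexpected trailing input: " + tok);
        }
        return v;
    }

    // expr := term (('+' | '-') term)*
    private double expr() throws IOException {
        double v = term();
        while (true) {
            int t = tok.nextToken();
            if (t == '+') {
                v += term();
            } else if (t == '-') {
                v -= term();
            } else {
                tok.pushBack(); // not ours: give the look-ahead token back
                return v;
            }
        }
    }

    // term := factor ('*' factor)*
    private double term() throws IOException {
        double v = factor();
        while (tok.nextToken() == '*') {
            v *= factor();
        }
        tok.pushBack(); // the token that ended the loop belongs to the caller
        return v;
    }

    // factor := NUMBER | '(' expr ')'
    private double factor() throws IOException {
        int t = tok.nextToken();
        if (t == StreamTokenizer.TT_NUMBER) return tok.nval;
        if (t == '(') {
            double v = expr();
            if (tok.nextToken() != ')') throw new IllegalArgumentException("expected ')'");
            return v;
        }
        throw new IllegalArgumentException("unexpected token: " + tok);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new TinyCalc("2 * (3 + 4) - 5").parse()); // 9.0
    }
}
```

`pushBack()` only remembers one token, which is exactly the one-symbol look-ahead a grammar like this needs.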

trashgod
A:

Yes, because I want to use the scanner as a runtime lexer. In short, I want to be able to call scanner.next(pattern), which would either return the matched string or throw an exception without consuming the stream. Spaces should be ignored. If there is a better class for this than Scanner, I would be glad to use it.

I cannot think of any off-the-shelf library class that will do this for you. The normal model of a scanner/lexer is that any invalid character sequence (i.e., one that results in an exception) is consumed. So I think you will have to implement your own scanner by hand, taking care to treat read-ahead characters as unconsumed. You could do this with a "pushback" reader or, if that model is not convenient, by explicitly buffering the characters yourself with some kind of mark/reset model. If all you are doing is splitting the input into tokens separated by one or more spaces, then the pushback reader approach should be fine.
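A minimal sketch of the pushback-reader approach for whitespace-separated tokens, using `java.io.PushbackReader`. The class `PushbackLexer` and the 256-character token limit are assumptions: the pushback buffer must hold a whole rejected token plus the one whitespace character already unread after it. If a token does not match the requested pattern, its characters are unread, so the next call sees it again.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.InputMismatchException;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

/** Hypothetical lexer: whitespace-separated tokens; a rejected token is pushed back unread. */
public class PushbackLexer {
    // Assumption: no token is longer than this. The pushback buffer needs room for
    // a whole rejected token plus the one whitespace character unread after it.
    private static final int MAX_TOKEN = 256;
    private final PushbackReader in;

    public PushbackLexer(Reader r) {
        in = new PushbackReader(r, MAX_TOKEN + 1);
    }

    /** Consumes leading whitespace; spaces are never significant, so this is not undone. */
    private void skipSpaces() throws IOException {
        int c;
        while ((c = in.read()) != -1 && Character.isWhitespace(c)) {
            // keep skipping
        }
        if (c != -1) in.unread(c); // put back the first non-space character
    }

    /** Reads characters up to (not including) the next whitespace or end of input. */
    private String readToken() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && !Character.isWhitespace(c)) {
            if (sb.length() == MAX_TOKEN) {
                // Sketch limitation: the input is left partially consumed here.
                throw new IllegalStateException("token longer than " + MAX_TOKEN + " chars");
            }
            sb.append((char) c);
        }
        if (c != -1) in.unread(c); // leave the delimiter for the next call
        return sb.toString();
    }

    /** Returns the next token if it fully matches p; otherwise throws and leaves it unread. */
    public String next(Pattern p) throws IOException {
        skipSpaces();
        String tok = readToken();
        if (tok.isEmpty()) throw new NoSuchElementException("end of input");
        if (!p.matcher(tok).matches()) {
            in.unread(tok.toCharArray()); // push the token back in front of the delimiter
            throw new InputMismatchException("'" + tok + "' does not match " + p);
        }
        return tok;
    }

    public static void main(String[] args) throws IOException {
        PushbackLexer lx = new PushbackLexer(new StringReader("let x = 42"));
        Pattern ident = Pattern.compile("[a-z]+");
        System.out.println(lx.next(ident)); // let
        try {
            lx.next(Pattern.compile("\\d+"));
        } catch (InputMismatchException e) {
            System.out.println(lx.next(ident)); // x: the rejected token was not consumed
        }
    }
}
```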

Stephen C