Java Scanner with empty delimiter

+1 Q:

I'd like to parse some text using a hand-written recursive-descent parser. I used Scanner with the delimiter "\\s*". Unfortunately, because this pattern matches the empty string, every hasNextFoo and nextFoo method seems to stop matching anything.

The documentation doesn't say anything about delimiters that can match an empty string.

You have some objection to the '+' character?

Are you sure you want to use a regular expression at all, and not just an if statement testing for space characters? You say 'runtime'. Is your data in a string, or coming on a stream, or what?

bmargulies 2009-12-25 23:14:27

Cute. I believe what @bmargulies is trying to say is that the + quantifier matches "at least one" instead of "zero or more", which prevents it from matching an empty string.

GrayWizardx 2009-12-25 23:21:51
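A small sketch of the difference between the two quantifiers, assuming nothing beyond the JDK (the class DelimiterDemo and its helpers are illustrative names). It checks whether each regex accepts the empty string and then tokenizes with "\\s+", which cannot match empty input. Note that Scanner's default delimiter already splits on runs of whitespace, so useDelimiter is only needed if you want something different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class DelimiterDemo {
    /** True if the regex accepts the empty string, as "\\s*" does and "\\s+" does not. */
    public static boolean matchesEmpty(String regex) {
        return Pattern.compile(regex).matcher("").matches();
    }

    /** Collects every token a Scanner produces with the given delimiter regex. */
    public static List<String> tokens(String input, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input).useDelimiter(delimiter)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matchesEmpty("\\s*")); // true: "zero or more"
        System.out.println(matchesEmpty("\\s+")); // false: "at least one"
        System.out.println(tokens("foo 42   bar", "\\s+")); // [foo, 42, bar]

        // Typed reads behave normally with a delimiter that cannot match the empty string.
        Scanner sc = new Scanner("foo 42 bar").useDelimiter("\\s+");
        System.out.println(sc.next());       // foo
        System.out.println(sc.hasNextInt()); // true
        System.out.println(sc.nextInt());    // 42
    }
}
```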

Yes, because I want to use the scanner as a runtime lexer. In short, I want to be able to call `scanner.next(pattern)`, which would either return the matched string or throw an exception without consuming the stream. Spaces should be ignored. If there is a better class than Scanner for this, I would be glad to use it.

Paul Brauner 2009-12-25 23:29:42
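When tokens are always separated by whitespace, Scanner's own next(Pattern) may already be close, since its documentation states that a token causing an InputMismatchException is not passed. The harder case is adjacent tokens such as x+1. One possible approach, sketched here under the assumption that the whole input fits in a String, is to skip Scanner and match each pattern directly at the current position with Matcher.region and Matcher.lookingAt. RegexLexer and its methods are hypothetical names, not a library API.

```java
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical lexer: next(p) returns the text p matches at the current position, or throws. */
public class RegexLexer {
    private static final Pattern SPACES = Pattern.compile("\\s+");
    private final String input;
    private int pos = 0; // everything before pos has been consumed

    public RegexLexer(String input) {
        this.input = input;
    }

    /** Advances past any whitespace at the current position (spaces are ignored). */
    private void skipSpaces() {
        Matcher m = SPACES.matcher(input).region(pos, input.length());
        if (m.lookingAt()) {
            pos = m.end();
        }
    }

    /** True if p matches at the current position; consumes nothing except skipped spaces. */
    public boolean hasNext(Pattern p) {
        skipSpaces();
        return p.matcher(input).region(pos, input.length()).lookingAt();
    }

    /** Returns the match and advances past it; on failure throws and leaves pos where it was. */
    public String next(Pattern p) {
        skipSpaces();
        Matcher m = p.matcher(input).region(pos, input.length());
        if (!m.lookingAt()) {
            throw new NoSuchElementException("expected " + p + " at offset " + pos);
        }
        pos = m.end(); // end() is an absolute index into input
        return m.group();
    }

    /** True once only whitespace (or nothing) remains. */
    public boolean atEnd() {
        skipSpaces();
        return pos == input.length();
    }

    public static void main(String[] args) {
        Pattern ident = Pattern.compile("[A-Za-z]+");
        Pattern number = Pattern.compile("\\d+");
        Pattern plus = Pattern.compile("\\+");
        RegexLexer lx = new RegexLexer("x+ 42");
        System.out.println(lx.next(ident));     // x
        System.out.println(lx.hasNext(number)); // false: '+' comes next
        try {
            lx.next(number);
        } catch (NoSuchElementException e) {
            System.out.println("no number here; input left in place");
        }
        System.out.println(lx.next(plus));      // +
        System.out.println(lx.next(number));    // 42
        System.out.println(lx.atEnd());         // true
    }
}
```

Patterns handed to next should not be able to match the empty string; otherwise next returns "" without advancing, which is the same trap as the "\\s*" delimiter.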

All joking aside, @trashgod's answer is probably what you want. You really haven't given us enough background to go on here.

bmargulies 2009-12-25 23:51:14

Good point about more background. FWIW, I added a link to an example.

trashgod 2009-12-27 00:19:23

+1 A:

You might also consider StreamTokenizer. Here is an example of using it for one-symbol look-ahead in a recursive-descent parser.

trashgod 2009-12-25 23:32:49
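The linked example is not reproduced here. As a rough illustration of the same idea (our own sketch, not trashgod's code), the hypothetical class ExprParser evaluates +, * and parentheses: each rule calls nextToken() to look one symbol ahead and pushBack() to return it when it belongs to the caller. StreamTokenizer's default syntax treats '/' as a comment character and '-' as part of numbers, so a fuller grammar would likely need ordinaryChar calls.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical recursive-descent evaluator for +, * and parentheses over StreamTokenizer. */
public class ExprParser {
    private final StreamTokenizer st;

    private ExprParser(String input) {
        // Default syntax: numbers are parsed; '+', '*', '(' and ')' are ordinary characters.
        st = new StreamTokenizer(new StringReader(input));
    }

    // expr := term ('+' term)*
    private double expr() throws IOException {
        double value = term();
        while (st.nextToken() == '+') {
            value += term();
        }
        st.pushBack(); // look-ahead was not '+': leave it for the caller
        return value;
    }

    // term := factor ('*' factor)*
    private double term() throws IOException {
        double value = factor();
        while (st.nextToken() == '*') {
            value *= factor();
        }
        st.pushBack(); // look-ahead was not '*'
        return value;
    }

    // factor := NUMBER | '(' expr ')'
    private double factor() throws IOException {
        int t = st.nextToken();
        if (t == StreamTokenizer.TT_NUMBER) {
            return st.nval;
        }
        if (t == '(') {
            double value = expr();
            if (st.nextToken() != ')') {
                throw new IllegalArgumentException("expected ')' but found " + st);
            }
            return value;
        }
        throw new IllegalArgumentException("unexpected " + st);
    }

    /** Parses the whole input; StringReader does not actually fail, so IOException is wrapped. */
    public static double evaluate(String input) {
        try {
            ExprParser p = new ExprParser(input);
            double value = p.expr();
            if (p.st.nextToken() != StreamTokenizer.TT_EOF) {
                throw new IllegalArgumentException("trailing input: " + p.st);
            }
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 * (3 + 4)")); // 14.0
        System.out.println(evaluate("1 + 2 * 3"));   // 7.0
    }
}
```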

+1 A:

"Yes, because I want to use the scanner as a runtime lexer. In short, I want to be able to call scanner.next(pattern), which would either return the matched string or throw an exception without consuming the stream. Spaces should be ignored. If there is a better class than Scanner for this, I would be glad to use it."

I cannot think of any off-the-shelf library class that will do this for you. The normal model of a scanner / lexer is that any invalid character sequence (i.e. one that results in an exception) will be consumed. So, I think you are going to have to implement your own scanner by hand, taking care to treat the read-ahead characters as unconsumed. You could do this with a "pushback" reader or (if that model is not convenient) by explicitly buffering the characters yourself with some kind of mark / reset model. If all you are doing is splitting into tokens separated by one or more spaces, then the pushback reader approach should be fine.

Stephen C 2009-12-26 00:32:02
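For the simple case mentioned at the end of this answer (tokens separated by one or more spaces), a pushback reader could be used roughly as follows. This is a sketch, not a tested library design: PushbackLexer, accept and MAX_TOKEN are hypothetical names, and it assumes no token is longer than the pushback buffer allows, because PushbackReader.unread throws an IOException when the buffer is full. accept reads the next token and, if it is not the expected one, unreads every character so the stream is left as it was (apart from skipped spaces).

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical space-separated tokenizer with a non-consuming accept(). */
public class PushbackLexer {
    // Assumption: tokens are shorter than this (one slot is used by the unread terminator).
    private static final int MAX_TOKEN = 64;
    private final PushbackReader in;

    public PushbackLexer(Reader reader) {
        in = new PushbackReader(reader, MAX_TOKEN);
    }

    /** Skips whitespace, leaving the first non-space character unread. */
    private void skipSpaces() throws IOException {
        int c;
        while ((c = in.read()) != -1 && Character.isWhitespace(c)) {
            // keep skipping
        }
        if (c != -1) {
            in.unread(c);
        }
    }

    /** Reads the next whitespace-delimited token, or returns null at end of input. */
    public String next() throws IOException {
        skipSpaces();
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && !Character.isWhitespace(c)) {
            sb.append((char) c);
        }
        if (c != -1) {
            in.unread(c); // the terminating space belongs to the next call
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    /** Consumes the next token only if it equals expected; otherwise pushes it back. */
    public boolean accept(String expected) throws IOException {
        String token = next();
        if (token == null) {
            return false;
        }
        if (token.equals(expected)) {
            return true;
        }
        in.unread(token.toCharArray()); // next read() returns token's first character again
        return false;
    }

    public static void main(String[] args) throws IOException {
        PushbackLexer lx = new PushbackLexer(new StringReader("let x = 42"));
        System.out.println(lx.accept("var")); // false, and "let" is pushed back
        System.out.println(lx.accept("let")); // true
        System.out.println(lx.next());        // x
        System.out.println(lx.accept("="));   // true
        System.out.println(lx.next());        // 42
        System.out.println(lx.next());        // null (end of input)
    }
}
```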
