I'd like to specify a delimiter for a scanner that splits on some pattern, but doesn't remove that pattern from the tokens. I can't seem to make this work, as anything that is identified by the regex also gets eaten as part of the delimiter. Any suggestions?

My specific problem: I have a file that looks like:

text/numbers mix
numbers
numbers
text/numbers mix
numbers
numbers
numbers
.
.

I'd like each token to run from a text/numbers mix line through the rows that follow it, stopping just before the next text/numbers mix line. I have the regex to identify those lines, but as stated, using it as the delimiter eats part of what I want.

EDIT: code addition:

static final String labelRegex="\\s*[^01\\s*]\\w+\\s*";
static final Pattern labelPattern = Pattern.compile(labelRegex, Pattern.MULTILINE);

is the pattern I used to identify the text/numbers bit (I know my numbers rows contain all 1/0s separated by spaces).

When I initialize the scanner:

stateScan = new Scanner(new BufferedReader(new FileReader(source)));
stateScan.useDelimiter(labelPattern);

that eats the labels, and just leaves the rows. I currently have a working implementation that starts two scanners on two buffered file readers from the same source, one splitting by states and the other by labels. I'd really like it to be just one grabbing label+state.

+3  A: 

You can use a positive lookahead in your regex. Lookaheads (and lookbehinds) are zero-width: they only assert that something follows (or precedes) the current position and are not included in the match, so they won't be "eaten" by the Scanner. This regex will probably do what you want:

(?=text/numbers)

The delimiter will be the empty String right before the sub-string text/numbers.

Here's a small demo:

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        String text = "text/numbers mix\n"+
                "numbers\n"+
                "numbers\n"+
                "text/numbers mix\n"+
                "numbers\n"+
                "numbers\n"+
                "numbers";
        String regex = "(?=text/numbers)";
        Scanner scan = new Scanner(text).useDelimiter(regex);
        while(scan.hasNext()) {
            System.out.println("------------------------");
            System.out.println(">"+scan.next().trim()+"<");
        }
    }
}

which produces:

------------------------
>text/numbers mix
numbers
numbers<
------------------------
>text/numbers mix
numbers
numbers
numbers<
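
The same idea can probably be adapted to label lines that aren't a fixed string. The following is only a sketch, assuming (as the question describes) that number rows consist solely of 0s and 1s separated by spaces and that a label line is any line whose first non-blank character is something else. The class name LabelBlocks, the method split, and the exact pattern are made up for illustration and are not taken from the asker's code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class LabelBlocks {
    // Zero-width delimiter: the start of a line (MULTILINE makes ^ match
    // after each line break) whose first non-blank character is not 0 or 1.
    // Assumption: such a line is a label, every other line is a number row.
    static final Pattern LABEL_START =
            Pattern.compile("^(?=[ \\t]*[^01\\s])", Pattern.MULTILINE);

    // Returns one trimmed block per label: the label line plus its rows.
    static List<String> split(String input) {
        List<String> blocks = new ArrayList<>();
        try (Scanner scan = new Scanner(input)) {
            scan.useDelimiter(LABEL_START);
            while (scan.hasNext()) {
                blocks.add(scan.next().trim());
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String input = "A1 label\n0 1 0\n1 1 0\nB2 label\n0 0 1\n1 0 1\n0 1 1\n";
        for (String block : split(input)) {
            System.out.println("------------------------");
            System.out.println(">" + block + "<");
        }
    }
}
```

Because the delimiter matches only the empty string at the start of a label line, nothing is consumed and each token keeps its label. The lookahead uses [ \t]* rather than \s* so that it cannot reach across a blank line into the next one.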
Bart Kiers
Brilliant, thanks.
Carl
No problem, Carl.
Bart Kiers
What I ultimately went with: http://stackoverflow.com/questions/1545022/java-scanner-headache
Carl