How do you set the delimiter for a scanner to either ; or new line?

I tried: Scanner.useDelimiter(Pattern.compile("(\n)|;")); But it doesn't work.

+3  A: 

As a general rule, in regex patterns written as Java string literals, you need to double the \.

So... try Scanner.useDelimiter(Pattern.compile("(\\n)|;")); or Scanner.useDelimiter(Pattern.compile("[\\n;]"));

Edit: If \r\n is the problem, you might want to try this:

Scanner.useDelimiter(Pattern.compile("[\\r\\n;]+"));

which matches one or more of \r, \n, and ;.

Note: I haven't tried these.

R. Bemrose
You can go either way. If you use two backslashes, the regex compiler sees `\n` and interprets it as the escape sequence for a linefeed. If you use one backslash, the regex compiler sees an actual linefeed character, which it matches literally. But I would definitely go with the character-class version: `"[\\n;]"` or `"[\n;]"`; it's easier to read as well as more efficient.
Alan Moore
@Alan Moore: Ah, OK... I just assumed that a literal line break would be misinterpreted.
R. Bemrose
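To make the point from the comments concrete, here is a minimal sketch that scans the same input with both spellings of the character class. The class name EscapeDemo, the helper tokens, and the sample input are made up for illustration and are not from the question. Both calls should produce the same tokens, since the delimiter ends up matching a linefeed either way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class EscapeDemo {
    // Collects every token the scanner yields for the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "a;b\nc"; // hypothetical sample input

        // Two backslashes: the regex compiler receives backslash-n and
        // interprets it as the escape sequence for a linefeed.
        System.out.println(tokens(input, "[\\n;]")); // [a, b, c]

        // One backslash: javac turns it into a real linefeed character,
        // which the regex compiler then matches literally.
        System.out.println(tokens(input, "[\n;]"));  // [a, b, c]
    }
}
```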
+1  A: 

Looking at the OP's comment, it looks like it was a different line ending (\r\n or CRLF) that was the problem.

Here's my answer, which handles runs of semicolons and line endings in either format (which may or may not be what you want):

Scanner.useDelimiter(Pattern.compile("([\n;]|(\r\n))+"));

e.g. an input file that looks like this:

1


2;3;;4
5

would result in 1,2,3,4,5

I tried both \n and \\n, and both worked in my case, though I agree that if you need a literal backslash you would want to double it, since it is an escape character. It just so happens that in this case "\n" becomes the desired character with or without the extra '\'.
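A small sketch of the example above, using the delimiter from this answer. The class name MixedEndingsDemo, the helper, and the exact mix of \n and \r\n in the sample string are assumptions made for illustration; the original input file may have used only one kind of line ending.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class MixedEndingsDemo {
    // One or more of: linefeed, semicolon, or a CRLF pair.
    static final Pattern DELIMITER = Pattern.compile("([\n;]|(\r\n))+");

    // Returns all tokens found between runs of delimiters.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(DELIMITER);
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical contents: blank lines, doubled semicolons,
        // and both LF and CRLF line endings.
        String input = "1\n\r\n\n2;3;;4\r\n5\n";
        System.out.println(tokens(input)); // [1, 2, 3, 4, 5]
    }
}
```

Because the whole group is wrapped in +, blank lines and doubled semicolons collapse into a single delimiter, so no empty tokens come back.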

Joshua McKinnon
+3  A: 

As you've discovered, you needed to look for DOS/network style \r\n (CRLF) line separators instead of the Unix style \n (LF only). But what if the text contains both? That happens a lot; in fact, when I view the source of this very page I see both varieties.

You should get in the habit of looking for both kinds of separator, as well as the older Mac style \r (CR only). Here's one way to do that:

\r?\n|\r

Plugging that into your sample code you get:

scanner.useDelimiter(";|\r?\n|\r");

This is assuming you want to match exactly one newline or semicolon at a time. If you want to match one or more you can do this instead:

scanner.useDelimiter("[;\r\n]+");

Notice, too, how I passed in a regex string instead of a Pattern; Scanner keeps its own cache of the patterns it compiles from strings, so pre-compiling the regex here doesn't get you any performance gain.
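The difference between the two delimiters shows up when separators sit next to each other. The sketch below is only an illustration: the class name DelimiterCompare, the helper, and the sample input are assumptions, not part of the original question. With the one-at-a-time pattern, the doubled semicolon should produce an empty token, while the one-or-more pattern swallows the whole run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterCompare {
    // Collects every token the scanner yields for the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "2;3;;4\r\n5"; // hypothetical sample input

        // Exactly one separator per match: ";;" is two delimiters with an
        // empty token between them, while "\r\n" counts as a single one.
        List<String> single = tokens(input, ";|\r?\n|\r");
        System.out.println(single.size() + " tokens: " + single);

        // One or more separator characters per match: no empty tokens.
        List<String> merged = tokens(input, "[;\r\n]+");
        System.out.println(merged.size() + " tokens: " + merged);
    }
}
```

Which one you want depends on whether an empty field between two semicolons is meaningful in your data.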

Alan Moore