
In Java, I'm using the String split method to split a string containing values separated by semicolons.

Currently, I have the following line that works in 99% of all cases.

String[] fields = optionsTxt.split(";");

However, a new requirement is that escaped semicolons (\;) must stay part of a value instead of acting as separators. So the following strings should parse out to the following values:

"Foo foo;Bar bar" => [Foo foo] [Bar bar]
"Foo foo\; foo foo;Bar bar bar" => [Foo foo\; foo foo] [Bar bar bar]

This should be painfully simple, but I'm totally unsure how to go about it. I just want to split on ; but not on \;.

Does anyone out there know the magic formula?

+1  A: 

There's probably a better way, but the quick-and-dirty method would be to first replace \; with some string that won't appear in your input buffers, like {{ESCAPED_SEMICOLON}}, then split on ;, and finally, as you pull out each token, reverse the substitution to put back the \;.
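A minimal Java sketch of that replace, split, restore approach, assuming the placeholder never occurs in real input. The class and method names are hypothetical; String.replace is used (rather than replaceAll) so that neither the backslash nor the braces are treated as regex syntax.

```java
import java.util.Arrays;

public class PlaceholderSplit {
    // Placeholder suggested above; assumed never to appear in real input.
    private static final String PLACEHOLDER = "{{ESCAPED_SEMICOLON}}";

    public static String[] split(String optionsTxt) {
        // 1. Hide every escaped semicolon (backslash + ';') behind the placeholder.
        String hidden = optionsTxt.replace("\\;", PLACEHOLDER);
        // 2. Every ';' left over is an unescaped separator.
        String[] fields = hidden.split(";");
        // 3. Restore the escaped semicolons inside each token.
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].replace(PLACEHOLDER, "\\;");
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Foo foo;Bar bar")));                // [Foo foo, Bar bar]
        System.out.println(Arrays.toString(split("Foo foo\\; foo foo;Bar bar bar"))); // [Foo foo\; foo foo, Bar bar bar]
    }
}
```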

bdk
This is simpler than using a regex. :)
nightingale2k1
That is really hacky and won't work if you can escape backslashes. If you have \\\\\\; (meaning you want three backslashes and then a split on the semicolon), you won't end up with the right result because of the replacement.
Tom
+2  A: 

Try this:

String[] fields = optionsTxt.split("(?<!\\\\);");
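The pattern is a negative lookbehind: the regex (?<!\\); (each backslash doubled again inside a Java string literal) matches a semicolon only when the character right before it is not a backslash. Because a lookbehind is zero-width, that preceding character is not consumed and stays in the token. A small runnable check against the two inputs from the question (class name hypothetical):

```java
import java.util.Arrays;

public class LookbehindSplit {
    // Java literal "(?<!\\\\);" is the regex (?<!\\);
    // The negative lookbehind (?<!\\) only lets ';' match when it is not
    // directly preceded by a backslash, and it consumes no characters.
    public static String[] split(String optionsTxt) {
        return optionsTxt.split("(?<!\\\\);");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Foo foo;Bar bar")));                // [Foo foo, Bar bar]
        System.out.println(Arrays.toString(split("Foo foo\\; foo foo;Bar bar bar"))); // [Foo foo\; foo foo, Bar bar bar]
    }
}
```

Like the placeholder trick, this treats an escaped backslash followed by a real separator (\\;) as an escaped semicolon, which is the case the even-backslash pattern in the last answer tries to handle.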
objects
+1  A: 

Using a regular expression (java.util.regex)

[^\\];

should be what you are looking for without doing a double replace.

Try it out using a regex testing tool.

Jason w
That regex consumes the character preceding the semicolon as well as the semicolon itself. If you split on that, all but the final token will have their last character chopped off.
Alan Moore
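A quick Java check of the problem described in that comment (class name hypothetical). The character class [^\\] matches a real character, so split() throws it away together with the semicolon:

```java
import java.util.Arrays;

public class ConsumingSplitDemo {
    // Java literal "[^\\\\];" is the regex [^\\]; : one non-backslash
    // character followed by ';'. Both characters belong to the match,
    // so both are removed from the result.
    public static String[] split(String optionsTxt) {
        return optionsTxt.split("[^\\\\];");
    }

    public static void main(String[] args) {
        // [Foo fo, Bar bar]: the last 'o' of the first token is lost.
        System.out.println(Arrays.toString(split("Foo foo;Bar bar")));
        // [Foo foo\; foo fo, Bar bar bar]: the escaped ';' survives, but an 'o' is still lost.
        System.out.println(Arrays.toString(split("Foo foo\\; foo foo;Bar bar bar")));
    }
}
```

Replacing the character class with a zero-width lookbehind, as in objects' answer, avoids this.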
A: 

For just the examples you provided, objects' code above works. If you want the split to happen only when there's an even number of backslashes before your semicolon, try this:

String[] fields = optionsTxt.split("((?<!\\\\)|(?<=[^\\\\](\\\\\\\\){0,15}));");

I've picked 15 arbitrarily; it allows up to 15 pairs (30 backslashes) before the semicolon. Change it to a higher number if need be.
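A small Java comparison of this pattern with objects' simpler lookbehind (class and constant names are hypothetical). The second branch of the lookbehind requires a non-backslash character followed by 0 to 15 pairs of backslashes, so an escaped backslash before a separator is split while an escaped semicolon is not:

```java
import java.util.Arrays;

public class EvenBackslashSplit {
    // The answer's pattern: ';' preceded either by no backslash at all, or by a
    // non-backslash character followed by an even number (0..30) of backslashes.
    public static final String EVEN_BACKSLASHES = "((?<!\\\\)|(?<=[^\\\\](\\\\\\\\){0,15}));";

    // objects' pattern: ';' not immediately preceded by a backslash.
    public static final String NO_BACKSLASH = "(?<!\\\\);";

    public static void main(String[] args) {
        String escapedBackslash = "a\\\\;b"; // actual text: a\\;b (escaped backslash, then a real separator)
        String escapedSemicolon = "a\\;b";   // actual text: a\;b  (escaped semicolon)

        System.out.println(Arrays.toString(escapedBackslash.split(EVEN_BACKSLASHES))); // [a\\, b]
        System.out.println(Arrays.toString(escapedBackslash.split(NO_BACKSLASH)));     // [a\\;b]
        System.out.println(Arrays.toString(escapedSemicolon.split(EVEN_BACKSLASHES))); // [a\;b]
    }
}
```

One limitation, as far as I can tell from the pattern: because the second branch needs a non-backslash character before the run of backslashes, an escaped backslash at the very start of the input (\\;x) is not split.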

lins314159