I need to be able to split an input String by commas, semi-colons or white-space (or a mix of the three). I would also like to treat multiple consecutive delimiters in the input as a single delimiter. Here's what I have so far:

String regex = "[,;\\s]+";    
return input.split(regex);

This works, except when the input string starts with one of the delimiter characters, in which case the first element of the result array is an empty String. I do not want my result to contain empty Strings, so an input like ",,,,ZERO; , ;;ONE ,TWO;," should return just a three-element array containing the capitalized Strings.

Is there a better way to do this than stripping out any leading characters that match my regex before invoking String.split?

Thanks in advance!

+4  A: 

No, there isn't. split() can only discard trailing empty strings, which it does when the limit is 0 (the one-argument overload already behaves this way, so passing 0 just makes it explicit):

return input.split(regex, 0);

but for leading delimiters, you'll have to strip them first:

return input.replaceFirst("^"+regex, "").split(regex, 0);
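
One edge case to watch for: if the input consists only of delimiters, replaceFirst leaves an empty string, and "".split(regex) returns an array holding a single empty String rather than an empty array. Below is a minimal sketch of the strip-then-split approach with that case guarded. The class and method names (LeadingDelimiterSplit, splitTokens) are made up for illustration and are not from the question.

```java
import java.util.Arrays;

public class LeadingDelimiterSplit {
    // Same delimiter class as in the question: comma, semicolon, or whitespace.
    private static final String DELIMS = "[,;\\s]+";

    // Hypothetical helper: strips leading delimiters, then splits.
    public static String[] splitTokens(String input) {
        // Remove a leading run of delimiters so split() yields no empty first element.
        String trimmed = input.replaceFirst("^" + DELIMS, "");
        // "".split(...) would return [""], so return an empty array explicitly.
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // Trailing empty strings are discarded by the default limit of 0.
        return trimmed.split(DELIMS);
    }

    public static void main(String[] args) {
        // prints [ZERO, ONE, TWO]
        System.out.println(Arrays.toString(splitTokens(",,,,ZERO; , ;;ONE ,TWO;,")));
    }
}
```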
Bart Kiers
A negative parameter? `If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.` From http://java.sun.com/javase/6/docs/api/java/lang/String.html#split%28java.lang.String,%20int%29
Mark Byers
Whoops, yes, I meant 0. Thanks!
Bart Kiers
+1 for fixing it :)
Mark Byers
+2  A: 

If by "better" you mean higher performance then you might want to try creating a regular expression that matches what you want to match and using Matcher.find in a loop and pulling out the matches as you find them. This saves modifying the string first. But measure it for yourself to see which is faster for your data.

If by "better" you mean simpler, then no I don't think there is a simpler way than the way you suggested: removing the leading separators before applying the split.
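
A rough sketch of the Matcher.find approach, assuming a token is any run of characters that are not a comma, semicolon, or whitespace (the inverse of the delimiter class in the question). The names TokenFinder and findTokens are invented for this example, and nothing here has been benchmarked against split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenFinder {
    // Matches one token: a run of characters that are NOT comma, semicolon, or whitespace.
    private static final Pattern TOKEN = Pattern.compile("[^,;\\s]+");

    // Hypothetical helper: collects every token found in the input.
    public static List<String> findTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // find() advances to the next match, so delimiters are simply skipped
        // and no empty strings are ever produced.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ZERO, ONE, TWO]
        System.out.println(findTokens(",,,,ZERO; , ;;ONE ,TWO;,"));
    }
}
```

Because only the tokens are matched, leading, trailing, and repeated delimiters need no special handling, and an all-delimiter input gives an empty list.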

Mark Byers
A: 

You could also potentially use StringTokenizer to build the list, depending on what you need to do with it:

StringTokenizer st = new StringTokenizer(",,,ZERO;,ONE TWO", ",; ", false);
while(st.hasMoreTokens()) {
  String str = st.nextToken();
  //add to list, process, etc...
}

As a caveat, however, you'll need to define each potential whitespace character separately in the second argument to the constructor.
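
To illustrate the caveat, here is a small sketch that lists the whitespace characters explicitly. It assumes the set StringTokenizer itself uses by default (space, tab, newline, carriage return, form feed) is enough for your input; that is narrower than what \s covers in a regex. TokenizerSplit and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerSplit {
    // Comma and semicolon, plus each whitespace character spelled out:
    // space, tab, newline, carriage return, form feed.
    private static final String DELIMS = ",; \t\n\r\f";

    // Hypothetical helper: collects the tokens into a list.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = false, so delimiters are skipped and never returned as tokens.
        StringTokenizer st = new StringTokenizer(input, DELIMS, false);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ZERO, ONE, TWO]
        System.out.println(tokenize(",,,ZERO;\t,ONE \n TWO;,"));
    }
}
```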

mtruesdell
A: 

Pretty much all splitting facilities built into the JDK are broken one way or another. You'd be better off using a third-party class such as Guava's Splitter, which is both flexible and correct in how it handles empty tokens and whitespace:

Splitter.on(CharMatcher.anyOf(";,").or(CharMatcher.WHITESPACE))
    .omitEmptyStrings()
    .split(",,,ZERO;,ONE TWO");

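will yield an Iterable<String> containing "ZERO", "ONE", "TWO". (Newer Guava releases expose this matcher as CharMatcher.whitespace() rather than the WHITESPACE constant.)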

Julien Silland