I have a multiline string which is delimited by a set of different delimiters:

(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)

I can split this string into its parts using String.split, but it seems I can't get the actual strings that matched the delimiter regex.

In other words, this is what I get:

  • Text1
  • Text2
  • Text3
  • Text4

This is what I want:

  • Text1
  • DelimiterA
  • Text2
  • DelimiterC
  • Text3
  • DelimiterB
  • Text4

Is there any JDK way to split the string using a delimiter regex but also keep the delimiters?

A: 

I don't think it is possible with String#split, but you can use a StringTokenizer. It won't let you define your delimiter as a regex, though, only as a set of single-character delimiters:

new StringTokenizer("Hello, world. Hi!", ",.!", true); // true for returnDelims
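For completeness, here is a minimal runnable sketch of the tokenizer approach. The class and method names (TokenizerDemo, tokensWithDelims) are made up for illustration; only StringTokenizer and its returnDelims flag come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Hypothetical helper: returns text tokens and delimiter tokens in order.
    // Each character in delimChars is treated as a separate one-character delimiter.
    static List<String> tokensWithDelims(String input, String delimChars) {
        // returnDelims = true makes the tokenizer hand back the delimiters too.
        StringTokenizer tokenizer = new StringTokenizer(input, delimChars, true);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line so the delimiter tokens are easy to spot.
        for (String token : tokensWithDelims("Hello, world. Hi!", ",.!")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Note that whitespace is not trimmed, so " world" keeps its leading space.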
Fabian Steeg
There I can't define a regex to specify my delimiters.
DR
StringTokenizer only allows for single-character delimiters, though.
Michael Borgwardt
@DR @Michael: True, edited to clarify
Fabian Steeg
+19  A: 

You can use Lookahead and Lookbehind. Like this:

System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));

And you will get:

[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]

The last one is what you want.

((?<=;)|(?=;)) matches the empty position just before a ; or just after a ;, so the split happens at those positions without consuming any characters.

Hope this helps.

EDIT: Fabian Steeg's comment on readability is valid. Readability is always a problem with regexes. One thing I do to ease this is to create a constant whose name describes what the regex does, and use Java's String.format to fill in the delimiter. Like this:

public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
...
public void someMethod() {
...
// String.format is static, so the pattern has to be passed as its first argument.
final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
...
}
...

This helps a little bit. :-D
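Since the question has several different delimiters, here is a small self-contained sketch of the same idea with a character class as the delimiter. The class and method names (LookaroundSplit, splitWithDelimiters) are assumptions for illustration. Keep in mind that Java's lookbehind needs a bounded-length pattern, so an unbounded delimiter regex such as ;+ would not work here.

```java
import java.util.Arrays;

class LookaroundSplit {
    // Matches the empty position right after or right before a delimiter.
    static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical helper: delimiterRegex is assumed to be a bounded-length regex.
    static String[] splitWithDelimiters(String input, String delimiterRegex) {
        return input.split(String.format(WITH_DELIMITER, delimiterRegex));
    }

    public static void main(String[] args) {
        // prints [a, ;, b, ,, c]
        System.out.println(Arrays.toString(splitWithDelimiters("a;b,c", "[;,]")));
    }
}
```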

NawaMan
Very nice! Here we can see again the power of regular expressions!!
George B.
I can't believe it worked. It almost looks like magic :)
DR
Nice to see there is a way to do this with String#split, though I wish there was a way to include the delimiters as there was for the StringTokenizer - `split(";", true)` would be so much more readable than `split("((?<=;)|(?=;))")`.
Fabian Steeg
Nice addition for improving readability!
Fabian Steeg
You have to use `%1$s` instead of `%1`.
DR
Hahaha... DR, you are right. Lately I have been doing a lot of regex in JavaScript, so I got the two mixed up. :-p
NawaMan
A: 

I suggest using Pattern and Matcher, which will almost certainly achieve what you want. Your regular expression will need to be somewhat more complicated than what you are using in String.split.
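A rough sketch of how that could look, assuming the delimiter regex never matches the empty string. The names SplitKeepingDelimiters and splitKeeping are invented for this example, and this version simply skips empty text between two adjacent delimiters, which may or may not be what you want.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitKeepingDelimiters {
    // Hypothetical helper: walks over every delimiter match and collects
    // the text before it, then the matched delimiter itself.
    static List<String> splitKeeping(String input, Pattern delimiter) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = delimiter.matcher(input);
        int last = 0; // end index of the previous delimiter match
        while (matcher.find()) {
            if (matcher.start() > last) {
                parts.add(input.substring(last, matcher.start())); // text part
            }
            if (!matcher.group().isEmpty()) {
                parts.add(matcher.group()); // the delimiter as it actually matched
            }
            last = matcher.end();
        }
        if (last < input.length()) {
            parts.add(input.substring(last)); // trailing text after the last delimiter
        }
        return parts;
    }

    public static void main(String[] args) {
        // "--" and "::" stand in for DelimiterA and DelimiterC from the question.
        System.out.println(splitKeeping("Text1--Text2::Text3", Pattern.compile("--|::")));
    }
}
```

Unlike the lookaround trick, this places no length restriction on the delimiter regex.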

Steve McLeod
+1  A: 

A very naive solution that doesn't need a regex for the delimiter would be to perform a string replace on your delimiter, along the lines of (assuming a comma as the delimiter):

fullString.replace(",", "~,~")

Here you can replace the tilde (~) with any marker that is guaranteed not to occur in your text.

If you then split on the new marker, I believe you will get the desired result.
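Spelled out in Java, the idea might look like the sketch below. It assumes a single literal comma delimiter and that ~ never appears in the input; the class and method names are made up.

```java
import java.util.Arrays;

class ReplaceThenSplit {
    // Hypothetical helper: wraps each comma in "~" markers, then splits on the marker.
    // Assumes "~" does not occur anywhere in fullString.
    static String[] splitKeepingComma(String fullString) {
        String marked = fullString.replace(",", "~,~"); // literal replace, no regex
        return marked.split("~"); // "~" has no special meaning in a regex
    }

    public static void main(String[] args) {
        // prints [a, ,, b, ,, c]
        System.out.println(Arrays.toString(splitKeepingComma("a,b,c")));
    }
}
```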

chillysapien