System.out.println(
    Arrays.deepToString(
        "abc<def>ghi".split("(?:<)|(?:>)")
    )
);

This prints [abc, def, ghi], as if I had split on "<|>". I want it to print [abc, <def>, ghi]. Is there a way to work some regex magic to accomplish what I want here?


Perhaps a simpler example:

System.out.println(
    Arrays.deepToString(
        "Hello! Oh my!! Good bye!!".split("(?:!+)")
    )
);

This prints [Hello, Oh my, Good bye]. I want it to print [Hello!, Oh my!!, Good bye!!].

+1  A: 

You can split on \b (a word boundary): it is zero-width, so the split consumes no characters, and lookarounds restrict it to the boundaries just before < and just after >.

String s = "abc<def>ghi";
String[] bits = s.split("(?<=>)\\b|\\b(?=<)");
for (String bit : bits) {
  System.out.println(bit);
}

Output:

abc
<def>
ghi

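Note that this isn't a general solution: \b only matches between a word character and a non-word character, so it fails when, say, a space precedes <. You will probably need to write a custom split method for the general case.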
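
One way such a custom method might look, as a minimal sketch: instead of splitting, it collects tokens with a matching loop. The class name BracketTokenizer, the method tokenize, and the token pattern are assumptions made for illustration; none of them come from the question or from a library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a library API.
class BracketTokenizer {
    // A token is either a whole <...> group or a run of text containing no '<'.
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[^<]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // find() skips any character that starts no token, e.g. an unclosed '<'.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc<def>ghi")); // prints [abc, <def>, ghi]
        // No word boundary is needed; spaces stay attached to the text tokens.
        System.out.println(tokenize("a <b> c"));
    }
}
```

Because this matches tokens rather than boundaries, it does not depend on which characters sit next to the brackets.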

Your second example suggests it's not really split() you're after but a regex matching loop. For example:

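// Requires java.util.regex.Matcher and java.util.regex.Pattern.
String s = "Hello! Oh my!! Good bye!!";
// Group 1 lazily captures text up to and including a run of '!';
// the trailing \s* consumes the whitespace between tokens.
Pattern p = Pattern.compile("(.*?!+)\\s*");
Matcher m = p.matcher(s);
while (m.find()) {
  System.out.println("[" + m.group(1) + "]");
}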

Output:

[Hello!]
[Oh my!!]
[Good bye!!]
cletus
Would you kindly comment on my answer and see if there's anything wrong with it? Thanks.
polygenelubricants
+2  A: 

You need to take a look at the zero-width matching constructs:

(?=X)   X, via zero-width positive lookahead
(?!X)   X, via zero-width negative lookahead
(?<=X)  X, via zero-width positive lookbehind
(?<!X)  X, via zero-width negative lookbehind
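
A minimal sketch of how two of these constructs behave in split(): because lookarounds match a position rather than characters, the delimiters are not consumed. The class and method names here are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class LookaroundSplitDemo {
    // Split at the position before each '<' (positive lookahead)
    // or after each '>' (positive lookbehind). Nothing is consumed.
    static String[] splitAroundBrackets(String input) {
        return input.split("(?=<)|(?<=>)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAroundBrackets("abc<def>ghi")));
        // prints [abc, <def>, ghi]
    }
}
```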
Cine
Yes, I misunderstood what `(?:`...`)` is for.
polygenelubricants
A: 

Thanks to information from Cine, I think these are the answers I'm looking for:

System.out.println(
    Arrays.deepToString(
        "abc<def>ghi<x><x>".split("(?=<)|(?<=>)")
    )
); // [abc, <def>, ghi, <x>, <x>]


System.out.println(
    Arrays.deepToString(
        "Hello! Oh my!! Good bye!! IT WORKS!!!".split("(?<=!++)")
    )
); // [Hello!,  Oh my!!,  Good bye!!,  IT WORKS!!!]

Now, the second one was honestly discovered by experimenting with all the different quantifiers. Neither the greedy nor the reluctant quantifier works, but the possessive one does.

I'm still not sure why.

polygenelubricants
Your second example isn't supposed to work. :-/ It should throw a PatternSyntaxException because the lookbehind has no obvious maximum length. That your regex compiles is a bug; that it *works* is mind boggling--and not to be relied on. Here's what you should be using: `(?<=!)(?!!)`. That will work in any regex flavor that supports lookaheads and lookbehinds.
Alan Moore
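
A minimal sketch of the fixed-width alternative from the comment above, `(?<=!)(?!!)`, applied with split(). The class and method names are hypothetical; only the pattern comes from the comment.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class BangSplitDemo {
    // Zero-width split point: a '!' just behind and no '!' just ahead,
    // i.e. the position right after each complete run of '!'.
    static String[] splitAfterBangs(String input) {
        return input.split("(?<=!)(?!!)");
    }

    public static void main(String[] args) {
        String[] parts = splitAfterBangs("Hello! Oh my!! Good bye!!");
        System.out.println(Arrays.toString(parts));
        // prints [Hello!,  Oh my!!,  Good bye!!]
    }
}
```

Since the split point is zero-width, the space after each run of ! stays at the start of the next token; call trim() on the parts if that is unwanted.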
"That your regex compiles is a bug" .... so, what do we do about it???
polygenelubricants
The bug has been reported, if that's what you mean. I would advise you not to get into the habit of using variable-width expressions in lookbehinds in any case; very few regex flavors support that capability, and there's usually a better way anyway.
Alan Moore