Regular expressions seem to return the longest possible match by default. For instance:

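import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; it is only here so the snippet compiles.
public class RegexTest {
 public static void main(String[] args) {
  String s = "ClarkRalphKentGuyGreenGardnerClarkSupermanKent";
  Pattern p = Pattern.compile("Clark.*Kent", Pattern.CASE_INSENSITIVE);
  Matcher myMatcher = p.matcher(s);
  int i = 1;
  // Print every match found, numbered from 1
  while (myMatcher.find()) {
   System.out.println(i++ + ". " + myMatcher.group());
  }
 }
}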

generates output

  1. ClarkRalphKentGuyGreenGardnerClarkSupermanKent

I would like this output

  1. ClarkRalphKent
  2. ClarkSupermanKent

I have been trying Patterns like:

 Pattern p = Pattern.compile("Clark[^((Kent)*)]Kent", Pattern.CASE_INSENSITIVE);

that don't work, but you see what I'm trying to say. I want the string from Clark to Kent that doesn't contain any occurrences of Kent.

This string:

ClarkRalphKentGuyGreenGardnerBruceBatmanKent

should generate output

  1. ClarkRalphKent
+5  A: 

Greedy vs. reluctant quantifiers are your friend here.

Try: Clark.+?Kent
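
To see the difference on the strings from the question, here is a minimal sketch. The class name ReluctantDemo and the helper findAll are made up for illustration; only java.util.regex.Pattern and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String two = "ClarkRalphKentGuyGreenGardnerClarkSupermanKent";
        String one = "ClarkRalphKentGuyGreenGardnerBruceBatmanKent";

        // Greedy: .* runs to the last Kent, so the whole string is one match.
        System.out.println(findAll("Clark.*Kent", two));
        // Reluctant: .+? stops at the first Kent after each Clark.
        System.out.println(findAll("Clark.+?Kent", two)); // [ClarkRalphKent, ClarkSupermanKent]
        System.out.println(findAll("Clark.+?Kent", one)); // [ClarkRalphKent]
    }
}
```

Note that .+? needs at least one character between Clark and Kent, so it will not match the bare string "ClarkKent"; use .*? if that case should match too.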

Gareth Davis
+2  A: 

Use the reluctant ? suffix: Clark.*?Kent. The quantifiers ?, *, and + can each be followed by ? to indicate that they should stop as soon as possible.

see http://perldoc.perl.org/perlre.html

Adrian Pronk
+4  A: 

You want a "reluctant" rather than a "greedy" quantifier. Simply putting a ? after your * should do the trick.

Michael Borgwardt
+2  A: 

When you tried "Clark[^((Kent)*)]Kent", I think you wanted "Clark((?!Kent).)*Kent", which uses a zero-width negative look-ahead (scroll down a bit to the "Look-Around Assertions" header).

Square brackets define a character class, which matches a single character, rather than grouping a pattern. So the RegExp was trying to find exactly one character between Clark and Kent that is not any of (, K, e, n, t, ), or *.
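
A small sketch comparing the two patterns on the string from the question. LookaheadDemo and findAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "ClarkRalphKentGuyGreenGardnerClarkSupermanKent";

        // Character class: exactly one character between Clark and Kent,
        // and that character may not be (, K, e, n, t, ), or *.
        System.out.println(findAll("Clark[^((Kent)*)]Kent", s)); // []

        // Negative look-ahead: consume one character at a time, but only
        // while the text at the current position does not start with Kent.
        System.out.println(findAll("Clark((?!Kent).)*Kent", s)); // [ClarkRalphKent, ClarkSupermanKent]
    }
}
```

For this input the look-ahead version gives the same matches as the reluctant Clark.*?Kent; it just states the "no Kent in between" condition explicitly.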

Jonathan Lonowski