In this text:

text text text [[st: aaa bbb ccc ddd eee fff]] text text
text text [[st: ggg hhh iii jjj kkk
lll mmm nnn]] text text text

I'm trying to extract the text that comes after each [[st: and ends before the matching ]].

My program should output:

aaa bbb ccc ddd eee fff  (first match)
ggg hhh iii jjj kkk \n lll mmm nnn  (second match)

But my pattern matches everything from the first [[st: to the last ]], so I get just one match instead of two. Any ideas?

Here's my code:

package com.s2i.egc.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestRegex {

    /**
     * @param args
     */
    public static void main(String[] args) {

     String bodyText = "text text text [[st: aaa bbb ccc ddd eee fff]] text text text text [[st: ggg hhh iii jjj kkk\n lll mmm nnn]] text text text";

     String currentPattern = "\\[\\[st:.*\\]\\]";

     Pattern myPattern = Pattern.compile(currentPattern, Pattern.DOTALL);

     Matcher myMatcher = myPattern.matcher(bodyText);

     int i = 1;

     while (myMatcher.find()) {
       // Skip the 6 characters of "[[st: " at the front and the 2 characters of "]]" at the end
       String match = bodyText.substring(myMatcher.start() + 6, myMatcher.end() - 2);
       System.out.println(match + " (match #" + i + ")");
       i++;
     }       


    }

}
+2  A: 

You should make the asterisk lazy. Instead of:

.*

use:

"\\[\\[st:.*?\\]\\]"
Dror
+3  A: 

The quantifier * (0 or more) is greedy by default, so .* grabs as much as it can and the match runs through to the last ]].

Try changing to a reluctant pattern match:

String currentPattern = "\\[\\[st:.*?\\]\\]";
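
A minimal sketch of how the reluctant pattern could be combined with a capturing group, so that no substring arithmetic is needed. The class name ReluctantMatchDemo, the extract helper, and the added \\s* (which skips the blank after [[st:) are assumptions for illustration and are not part of the original code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantMatchDemo {

    // Reluctant ".*?" stops at the first "]]" it can; group 1 holds the text in between.
    // "\\s*" skips whitespace after "[[st:"; DOTALL lets "." match line breaks too.
    private static final Pattern TAG =
            Pattern.compile("\\[\\[st:\\s*(.*?)\\]\\]", Pattern.DOTALL);

    // Hypothetical helper: returns the captured content of every [[st: ...]] tag.
    static List<String> extract(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        String bodyText = "text text text [[st: aaa bbb ccc ddd eee fff]] text text "
                + "text text [[st: ggg hhh iii jjj kkk\n lll mmm nnn]] text text text";
        int i = 1;
        for (String match : extract(bodyText)) {
            System.out.println(match + " (match #" + i + ")");
            i++;
        }
    }
}
```

With the sample text from the question this should report two matches instead of one, and the second match still contains its line break.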
Simon Nickerson
Thanks a lot! It worked! :)
mrmuggles
+1  A: 

Just for completeness' sake: without the non-greedy star, you could match the opening [[st:, followed by any non-] characters, optionally followed by any number of single ] characters that are each followed by at least one non-] character, and finally the closing ]]:

\[\[st:([^\]]*(?:\][^\]]+)*)\]\]
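
As a rough sketch, this is how that pattern might look in Java, where every backslash has to be doubled inside a string literal. The class name UnrolledMatchDemo, the extract helper, and the trim() call are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrolledMatchDemo {

    // The regex above with each backslash doubled for a Java string literal.
    // DOTALL is not needed here: a negated class such as [^\]] already matches line breaks.
    private static final Pattern TAG =
            Pattern.compile("\\[\\[st:([^\\]]*(?:\\][^\\]]+)*)\\]\\]");

    // Hypothetical helper: returns the trimmed content of every [[st: ...]] tag.
    static List<String> extract(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            // trim() drops the blank that follows "[[st:" in the sample text
            matches.add(m.group(1).trim());
        }
        return matches;
    }

    public static void main(String[] args) {
        String bodyText = "text [[st: aaa bbb]] text [[st: ggg hhh\n lll mmm]] text";
        for (String match : extract(bodyText)) {
            System.out.println(match);
        }
    }
}
```

Unlike the plain negated class [^\]]*, this version also accepts a single ] inside the tag content, as long as it is not immediately followed by another ].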
Joel Hoffman