Does this regex have one or two groups?

I'm trying to access the bookTitle using the second group but getting an error:

Pattern pattern = Pattern.compile("^\\s*(.*?)\\s+-\\s+'(.*)'\\s*$");
Matcher matcher = pattern.matcher("William Faulkner - 'Light In August'");
String author = matcher.group(1).trim();
String bookTitle = matcher.group(2).trim();
+4  A: 

Two groups -- ' is not a special character in regexes. What is the error you're getting?

Also, capturing groups are NOT zero-based: they are numbered starting from one. From the Javadoc:

Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
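To make the numbering concrete, here is a small sketch that runs the question's pattern and collects group 0 through groupCount(). The class name GroupNumbering and the helper groups are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupNumbering {
    // Hypothetical helper: returns groups 0..groupCount() after a full match,
    // or null when the input does not match the pattern at all.
    static String[] groups(String input) {
        Pattern pattern = Pattern.compile("^\\s*(.*?)\\s+-\\s+'(.*)'\\s*$");
        Matcher matcher = pattern.matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        // groupCount() counts capturing groups only; group 0 is not included.
        String[] out = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            out[i] = matcher.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("William Faulkner - 'Light In August'");
        System.out.println(g[0]); // the entire match
        System.out.println(g[1]); // William Faulkner
        System.out.println(g[2]); // Light In August
    }
}
```

So the pattern has two capturing groups, reachable as group(1) and group(2), while group(0) is the whole match.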

bemace
Just tested your regex on my computer and it works for me
bemace
+2  A: 

Call one of the following before you ask for groups:

 matcher.find();
 matcher.matches();

How this works:

A matcher is created from a pattern by invoking the pattern's matcher method. Once created, a matcher can be used to perform three different kinds of match operations:

The matches method attempts to match the entire input sequence against the pattern.

The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.

The find method scans the input sequence looking for the next subsequence that matches the pattern.

Source: Java API documentation (java.util.regex.Matcher)
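As a rough illustration of how the three operations differ, the sketch below applies a digits-only pattern to the same input. The class MatchOperations and its helper names are hypothetical; each helper uses a fresh Matcher so the calls do not affect one another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchOperations {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches(): the whole input must match the pattern.
    static boolean wholeInput(String s) {
        return DIGITS.matcher(s).matches();
    }

    // lookingAt(): the pattern must match at the start, but not necessarily to the end.
    static boolean prefix(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // find(): scans for each next subsequence that matches.
    static List<String> allMatches(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "123abc456";
        System.out.println(wholeInput(input)); // false
        System.out.println(prefix(input));     // true
        System.out.println(allMatches(input)); // [123, 456]
    }
}
```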

Personally, I recommend collapsing runs of whitespace first, then splitting on the hyphen and trimming. Voilà: simple, tested, and it works.

Try this:

    String s = "William          Faulkner - 'Light In August'";
    String[] o = s.replaceAll("\\s+", " ").split("-");
    String author = o[0].trim();
    String bookTitle = o[1].trim();

If you then print both values:

    System.out.println(author);
    System.out.println(bookTitle);

the output is:

William Faulkner
'Light In August'
Margus
+3  A: 

There are TWO groups, but the error occurs because no match operation has been run on the matcher.
An IllegalStateException is thrown at the first group access, matcher.group(1).
One of the methods matches, lookingAt or find must be called first.
This should do:

Pattern pattern = Pattern.compile("^\\s*(.*?)\\s+-\\s+'(.*)'\\s*$");
Matcher matcher = pattern.matcher("William Faulkner - 'Light In August'");
if (matcher.matches()) {
    String author = matcher.group(1).trim();
    String bookTitle = matcher.group(2).trim();
    ...
} else {
    // not matched, what now?
}
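For anyone who wants to see the exception for themselves, here is a minimal sketch. The class GroupBeforeMatch and its method names are invented for this example; it checks both cases in which Matcher.group throws IllegalStateException: no match attempted yet, and the previous match attempt failed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupBeforeMatch {
    static final Pattern PATTERN =
            Pattern.compile("^\\s*(.*?)\\s+-\\s+'(.*)'\\s*$");

    // True if group(1) throws on a matcher that has not attempted a match yet.
    static boolean throwsBeforeMatch(String input) {
        Matcher matcher = PATTERN.matcher(input);
        try {
            matcher.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // True if matches() failed and group(1) then throws.
    static boolean throwsAfterFailedMatch(String input) {
        Matcher matcher = PATTERN.matcher(input);
        if (matcher.matches()) {
            return false; // the match succeeded, so groups are available
        }
        try {
            matcher.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsBeforeMatch("William Faulkner - 'Light In August'")); // true
        System.out.println(throwsAfterFailedMatch("no separator here"));               // true
    }
}
```

This is why the group calls belong inside the if (matcher.matches()) branch.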
Carlos Heuberger
+1  A: 

The problem is that the Matcher class is lazy: creating it does not run the regex. Nothing is evaluated until a match method such as matches() is called. Try this instead:

Pattern pattern = Pattern.compile("^\\s*(.*)\\s+-\\s+'(.*)'\\s*$");
Matcher matcher = pattern.matcher("William Faulkner - 'Light In August'");

if (matcher.matches()) {
    String author = matcher.group(1).trim();
    String bookTitle = matcher.group(2).trim();

    System.out.println(author + " / " + bookTitle);
}
else {
   System.out.println("No match!");
}

You might also want to change the groups to (.+) so that input with an empty author or title is rejected.
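A quick sketch of that difference, assuming both groups are changed to (.+). The class NonEmptyGroups and the constant names LOOSE and STRICT are made up for this comparison.

```java
import java.util.regex.Pattern;

final class NonEmptyGroups {
    // Pattern from the answer above: both groups may be empty.
    static final Pattern LOOSE =
            Pattern.compile("^\\s*(.*)\\s+-\\s+'(.*)'\\s*$");
    // Variant with (.+): each group needs at least one character.
    static final Pattern STRICT =
            Pattern.compile("^\\s*(.+)\\s+-\\s+'(.+)'\\s*$");

    static boolean loose(String input) {
        return LOOSE.matcher(input).matches();
    }

    static boolean strict(String input) {
        return STRICT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String emptyTitle = "William Faulkner - ''";
        System.out.println(loose(emptyTitle));  // true
        System.out.println(strict(emptyTitle)); // false
        System.out.println(strict("William Faulkner - 'Light In August'")); // true
    }
}
```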

teto