public class PatternTest {
  public static void main(String[] args) {
    System.out.println("117_117_0009v0_172_5738_5740".matches("^([0-9_]+v._.)"));
  }
}

This program prints "false". What?!

I am expecting to match the prefix of the string: "117_117_0009v0_1"

I know this stuff, really I do... but for the life of me, I've been staring at this for 20 minutes and have tried every variation I can think of and I'm obviously missing something simple and obvious here.

Hoping the many eyes of SO can pick it out for me before I lose my mind over this.

Thanks!


The final working version ended up as:

String text = "117_117_0009v0_172_5738_5740";
String regex = "[0-9_]+v._.";

Pattern p = Pattern.compile(regex);

Matcher m = p.matcher(text);
if (m.lookingAt()) {
  System.out.println(m.group());
}

One non-obvious discovery/reminder for me was that before accessing matcher groups, one of matches(), lookingAt(), or find() must be called. If not, an IllegalStateException is thrown with the unhelpful message "No match found". Despite this, groupCount() will still return non-zero, because it counts the capturing groups in the pattern rather than in any match. Do not believe it as a sign that a match exists.
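A minimal sketch of that behaviour, using the capturing-group version of the pattern from the question. The class name GroupStateDemo and the helper groupThrowsBeforeMatch are made up for illustration; only the java.util.regex calls are real API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupStateDemo {
    // Hypothetical helper: reports whether group() throws
    // IllegalStateException on the matcher in its current state.
    static boolean groupThrowsBeforeMatch(Matcher m) {
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("([0-9_]+v._.)")
                           .matcher("117_117_0009v0_172_5738_5740");

        // groupCount() describes the pattern, so it is 1 even though
        // no match has been attempted yet.
        System.out.println(m.groupCount());

        // No matches(), lookingAt() or find() call so far, so group() throws.
        System.out.println(groupThrowsBeforeMatch(m));

        // After a successful match attempt the groups become readable.
        if (m.lookingAt()) {
            System.out.println(m.group(1)); // 117_117_0009v0_1
        }
    }
}
```

The point is that groupCount() says how many groups could be read, while only the boolean result of matches(), lookingAt() or find() says whether there is anything to read.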

I forgot how ugly this API is. Argh...

A: 

I don't know the Java flavor of regular expressions, but this PCRE regular expression should work: ^([\d_]+v\d_\d).+ I don't know why you are using ._. instead of \d_\d.

Because I don't have any guarantee that those will always be numbers.
Mark Renouf
Then use `^([\d_]+v._.).+` as you did before. It should work; at least the \d version worked here. Otherwise, try `^([\d_]+v[^_]_[^_]).+`
+3  A: 

By default, String.matches() behaves as if the pattern were wrapped in ^ and $, so the whole string has to match. Something like this should work:

public class PatternTest {
  public static void main(String[] args) {
    System.out.println("117_117_0009v0_172_5738_5740".matches("^([0-9_]+v._.).*$"));
  }
}

returns:

true

Match content:

117_117_0009v0_1

This is the code I used to extract the match:

Pattern p = Pattern.compile("^([0-9_]+v._.).*$");
String str = "117_117_0009v0_172_5738_5740";

Matcher m = p.matcher(str);
if (m.matches()) {
  System.out.println(m.group(1));
}
npinti
OK, dumb question, but how did you access the match content? I ended up with the same pattern you have there, which does return true for the code you show. But I've tried .group(), .group(0), etc., and all throw an IllegalStateException("No match found"). Yet groupCount() returns 1?!
Mark Renouf
Regular expression groups start with 1 not 0, I know this can be confusing until you get the hang of it :)
npinti
I know that part. Group 0 should be the entire matched section. Group 1 and higher are the matched text of any capturing groups in the pattern. The docs say that calling group(n) is valid for any n <= groupCount(). I finally figured out that you have to make at least one call to .find() before trying to access groups. Very confusing API. I guess I've just never fully used it before. It's possible I've only ever used .matches(), though I could've sworn it was just .matcher(....).group().
Mark Renouf
I have added the code I used to extract the match. You should always check whether there are any matches first; in my opinion that is more elegant than catching exceptions.
npinti
+1  A: 

If you want to check whether a string starts with a certain pattern, you should use the Matcher.lookingAt() method:

Pattern pattern = Pattern.compile("([0-9_]+v._.)");
Matcher matcher = pattern.matcher("117_117_0009v0_172_5738_5740");
if (matcher.lookingAt()) {
  int groupCount = matcher.groupCount();
  for (int i = 0; i <= groupCount; i++) {
     System.out.println(i + " : " + matcher.group(i));
  }
}

Javadoc:

boolean java.util.regex.Matcher.lookingAt()

Attempts to match the input sequence, starting at the beginning of the region, against the pattern. Like the matches method, this method always starts at the beginning of the region; unlike that method, it does not require that the entire region be matched. If the match succeeds then more information can be obtained via the start, end, and group methods.
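To make the difference between the three match methods concrete, here is a small sketch. The class MatchModesDemo, its helper methods, and the "x"-prefixed input are assumptions added for illustration; the pattern and the main input come from the question.

```java
import java.util.regex.Pattern;

public class MatchModesDemo {
    // Same pattern as in the question, without anchors or capturing groups.
    static final Pattern PREFIX = Pattern.compile("[0-9_]+v._.");

    // matches(): the entire input must match the pattern.
    static boolean wholeInput(String s) { return PREFIX.matcher(s).matches(); }

    // lookingAt(): the match must start at the beginning, but may end early.
    static boolean atStart(String s) { return PREFIX.matcher(s).lookingAt(); }

    // find(): the match may start anywhere in the input.
    static boolean anywhere(String s) { return PREFIX.matcher(s).find(); }

    public static void main(String[] args) {
        String full = "117_117_0009v0_172_5738_5740";
        String shifted = "x" + full; // hypothetical input with a leading non-matching character

        System.out.println(wholeInput(full));   // false: trailing text is left over
        System.out.println(atStart(full));      // true
        System.out.println(anywhere(full));     // true

        System.out.println(atStart(shifted));   // false: 'x' cannot start a match
        System.out.println(anywhere(shifted));  // true: the match starts at index 1
    }
}
```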

Vitalii Fedorenko
Thanks, this is also helpful and closer to what I wanted to do. Most of my trouble came from it not being clear that a call to something like lookingAt(), matches(), or find() must precede any attempt to access group(n). The exception message is also not very helpful in communicating this fact. Add to this that groupCount() always returns one before you've actually applied the pattern, as mentioned above, implying that you should be able to access that group.
Mark Renouf