I'm trying to find all the occurrences of "Arrows" in text, so in

"<----=====><==->>"

the arrows are:

"<----", "=====>", "<==", "->", ">"

This works:

    String[] patterns = {"<=*", "<-*", "=*>", "-*>"};
    for (String p : patterns) {
      Matcher A = Pattern.compile(p).matcher(s);
      while (A.find()) {
        System.out.println(A.group());
      }
    }

but this doesn't:

    String p = "<=*|<-*|=*>|-*>";
    Matcher A = Pattern.compile(p).matcher(s);
    while (A.find()) {
      System.out.println(A.group());
    }

No idea why. It often reports "<" instead of "<====" or similar.

What is wrong?

A: 

For <======= you need <=+ as the regex. <=* matches zero or more ='s, so it also succeeds on a bare <; against <---- it stops right after the < and the <-* alternative is never tried. The same goes for the other cases you have. You should read up a bit on regexes. This book is FANTASTIC: Mastering Regular Expressions

ennuikiller
He said that he considers > an arrow, so using * and not + is just what he needs.
Jens Schauder
read the other answers, and now I understand what you mean (I think)
Jens Schauder
A: 

Your regex pattern handles most of your example, "<----=====><==->>", correctly; only the leading "<----" is reported as "<":

    String p = "<=*|<-*|=*>|-*>";
    Matcher A = Pattern.compile(p).matcher(s);
    while (A.find()) {
      System.out.println(A.group());
    }

However, it is broken for other inputs pointed out in the answers: "<-" yields "<", whereas "<=" yields "<=" as it should, because the "<=*" alternative is tried first and already succeeds on a bare "<".

Sean A.O. Harney
My code snippet above is a response to the OP's second one. I was only commenting on how it handles his supplied test case; it does not give the right result for other supplied inputs.
Sean A.O. Harney
+6  A: 

Solution

The following program is one possible solution to the question:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class A {
  public static void main( String args[] ) {
    String p = "<=+|<-+|=+>|-+>|<|>";
    Matcher m = Pattern.compile(p).matcher(args[0]);
    while (m.find()) {
      System.out.println(m.group());
    }
  }
}

Run #1:

$ java A "<----=====><<---<==->>==>"
<----
=====>
<
<---
<==
->
>
==>

Run #2:

$ java A "<----=====><=><---<==->>==>"
<----
=====>
<=
>
<---
<==
->
>
==>

Explanation

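An asterisk (*) matches zero or more of the preceding character. A plus (+) matches one or more of the preceding character. Thus <-* can match a bare <, whereas <-+ requires at least <- and also matches any longer version (such as <--------). The bare < and > alternatives are listed last so that they apply only when no longer arrow matches at that position.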
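To see the two quantifiers in isolation, here is a minimal sketch. The class and method names are made up for illustration; it only checks whether a whole input string is matched by a given regex.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
  // Returns true when the regex matches the entire input string.
  static boolean matchesWhole(String regex, String input) {
    return Pattern.compile(regex).matcher(input).matches();
  }

  public static void main(String[] args) {
    System.out.println(matchesWhole("<-*", "<"));    // true: * accepts zero dashes
    System.out.println(matchesWhole("<-+", "<"));    // false: + needs at least one dash
    System.out.println(matchesWhole("<-+", "<---")); // true: one or more dashes
  }
}
```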

Dave Jarvis
Can you explain why ulver's pattern wasn't working?
akf
I got that, but I was under the impression that * was greedy by default, so =* should match as many ='s as possible. They worked in the pattern array example.
akf
Have a look at Kevin Peterson's explanation.
Jens Schauder
It's more accurate to say that regex **quantifiers** are greedy (by default, that is). Alternation isn't; it checks each alternative in the order they're written and goes with the first one that works.
Alan Moore
It works. All the tests I had pass. I wonder if there is a cleaner way to say "make alternation greedy", but oh well. Thanks for the correction.
ulver
+5  A: 

When you match "<=*|<-*|=*>|-*>" against the string "<---", the first alternative, "<=*", matches, because * allows zero occurrences. Java's quantifiers are greedy, but alternation is not: the matcher does not look for a longer match among the remaining alternatives; it takes the first alternative that matches.
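A small sketch of that behaviour, using only the first two alternatives so the effect of their order is easy to see. The class and helper names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
  // Collects every match that find() reports, in order.
  static List<String> findAll(String regex, String input) {
    List<String> found = new ArrayList<>();
    Matcher m = Pattern.compile(regex).matcher(input);
    while (m.find()) {
      found.add(m.group());
    }
    return found;
  }

  public static void main(String[] args) {
    // "<=*" is tried first and succeeds with zero '=' characters.
    System.out.println(findAll("<=*|<-*", "<---")); // [<]
    // With the order swapped, "<-*" gets the first chance and takes the dashes.
    System.out.println(findAll("<-*|<=*", "<---")); // [<---]
    // The swap only moves the problem to the other kind of arrow.
    System.out.println(findAll("<-*|<=*", "<===")); // [<]
  }
}
```

Reordering the alternatives therefore does not fix the pattern by itself; each alternative has to be unable to succeed on a shorter arrow than the one actually present.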

Kevin Peterson
+1  A: 

Your first solution will match everything that you are looking for because you send each pattern into matcher one at a time and they are then given the opportunity to work on the target string individually.

Your second attempt does not work the same way because you are passing in a single pattern with multiple expressions OR'ed together, and the alternatives are tried in order, leftmost first. If an alternative matches, no matter how minimal the match, find() reports it and the search continues from the end of that match.
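As a rough check of the difference between the two approaches, the sketch below runs the question's four patterns one at a time over the example string. The class and method names are made up for illustration. Because every pattern scans the whole string on its own, the five expected arrows all appear, but so do some shorter matches, and the results come out grouped by pattern rather than by position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerPatternScan {
  // Runs each pattern over the whole input independently and
  // appends its matches to one list, pattern by pattern.
  static List<String> scanEach(String[] patterns, String input) {
    List<String> found = new ArrayList<>();
    for (String p : patterns) {
      Matcher m = Pattern.compile(p).matcher(input);
      while (m.find()) {
        found.add(m.group());
      }
    }
    return found;
  }

  public static void main(String[] args) {
    String[] patterns = {"<=*", "<-*", "=*>", "-*>"};
    // prints [<, <==, <----, <, =====>, >, >, >, ->, >]
    System.out.println(scanEach(patterns, "<----=====><==->>"));
  }
}
```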

See Thangalin's response for a solution that will make the second work like the first.

akf