How can I write the regexp to match multiple words in random order?

For example, let's assume the following lines:

Dave Imma Car Pom Dive
Dive Dome Dare
Imma Car Ryan
Pyro Dave Imma Dive
Lunar Happy Dave

I want to find the lines that contain "Dave", "Imma" and "Dive" in any order, so I expect the 1st and 4th lines to match. Is this possible?

A: 
// line is assumed to hold one line of the input
if (line.matches("(Dave|Imma|Dive) (Dave|Imma|Dive) (Dave|Imma|Dive)")
        && line.contains("Dave") && line.contains("Imma") && line.contains("Dive"))
{
    // this will work in 90% of cases.
}

I don't think it's possible to do this exactly, though. Sorry.

Coronatus
A: 
import java.util.ArrayList;
import java.util.List;

String[] lines = fullData.split("\n");
String[] names = {"Dave", "Imma", "Dive"};
List<Integer> matches = new ArrayList<>();

nextLine:
for (int i = 0; i < lines.length; i++) {
    for (String name : names) {
        // If any of the names in the list isn't found
        // then this line isn't a match, so skip to the next line
        if (!lines[i].contains(name)) {
            continue nextLine;
        }
    }
    // If we made it this far, all of the names were found
    matches.add(i + 1); // store 1-based line numbers
}
// matches now contains [1, 4]

If you don't need to know where the matches are, it can be simplified to:

String[] lines = fullData.split("\n");
String[] names = {"Dave", "Imma", "Dive"};

nextLine:
for (String line : lines) {
    for (String name : names) {
        // If any of the names in the list isn't found
        // then this line isn't a match, so skip to the next line
        if (!line.contains(name)) {
            continue nextLine;
        }
    }
    // If we made it this far, all of the names were found

    // Do something
}
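
The same check can probably be written more compactly with streams. This is only a sketch, not part of the original answer: the class name NameFilter and the method linesContainingAll are made up for illustration, and it keeps the same substring test (contains) as the loops above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NameFilter {
    // Returns the lines that contain every name as a substring, in any order.
    // (Hypothetical helper, named here for illustration only.)
    static List<String> linesContainingAll(String fullData, String... names) {
        return Arrays.stream(fullData.split("\n"))
                .filter(line -> Arrays.stream(names).allMatch(line::contains))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String fullData = "Dave Imma Car Pom Dive\n"
                + "Dive Dome Dare\n"
                + "Imma Car Ryan\n"
                + "Pyro Dave Imma Dive\n"
                + "Lunar Happy Dave";
        // Prints the 1st and 4th sample lines from the question.
        linesContainingAll(fullData, "Dave", "Imma", "Dive")
                .forEach(System.out::println);
    }
}
```

allMatch stops at the first missing name, so it does the same job as jumping to the next line in the loop version.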
Brendan Long
+1  A: 

If you insist on doing this with regex, you can use lookahead:

s.matches("(?=.*Dave)(?=.*Imma)(?=.*Dive).*")

Regex is not the most efficient way of doing this, though.
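
To show how the lookaheads behave on the sample lines, here is a small sketch. The class name LookaheadDemo and the method hasAllThree are assumptions for illustration; the pattern is the one above, just precompiled.

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    // Each lookahead scans ahead from the start of the line for one name
    // without consuming input, so the order of the names does not matter.
    static final Pattern ALL_THREE =
            Pattern.compile("(?=.*Dave)(?=.*Imma)(?=.*Dive).*");

    // Hypothetical helper: true if the whole line satisfies all three lookaheads.
    static boolean hasAllThree(String line) {
        return ALL_THREE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasAllThree("Pyro Dave Imma Dive")); // true
        System.out.println(hasAllThree("Dive Dome Dare"));      // false
    }
}
```

Note that this is a substring test: a line such as DaveImmaDive would also match.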

polygenelubricants
A: 

Should the following lines match?

Dave Imma Dave
Dave Imma Dive Imma

I'm guessing the first one shouldn't because it doesn't contain all three names, but are duplicates okay? If not, this regex does the trick:

^(?:\b(?:(?!(?:Dave|Imma|Dive)\b)\w+[ \t]+)*(?:Dave()|Imma()|Dive())[ \t]*){3}$\1\2\3

I use the word "trick" advisedly. :) This proves that a regex can do the job, but I wouldn't expect to see this regex in any serious application. You'd be much better off writing a method for this purpose.

(By the way, if duplicates are allowed, just remove the $.)
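
As for writing a method instead, one possible shape is sketched below. It is not a translation of the regex; it only follows the requirements as I understand them (every name present as a whole word, duplicates optionally rejected). The names WordMatcher, matchesNames and allowDuplicates are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

class WordMatcher {
    // True if every name occurs in the line as a whole, whitespace-separated word.
    // When allowDuplicates is false, each name must occur exactly once.
    // (Hypothetical helper; other words on the line are ignored.)
    static boolean matchesNames(String line, List<String> names, boolean allowDuplicates) {
        List<String> words = Arrays.asList(line.trim().split("\\s+"));
        for (String name : names) {
            long count = words.stream().filter(name::equals).count();
            if (count == 0 || (!allowDuplicates && count > 1)) {
                return false;
            }
        }
        return true;
    }
}
```

With names set to Dave, Imma and Dive, "Dave Imma Dive Imma" is rejected when allowDuplicates is false and accepted when it is true, while "Dave Imma Dave" is rejected either way because Dive is missing.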

EDIT: Another question: should the names be matched only in the form of complete words? In other words, should these lines match?

DaveCar PomDive Imma
DaveImmaDive

So far, the only other answer that enforces both uniqueness and complete words is Coronatus's, and it fails to match lines with extra words, like these:

Dave Imma Car Pom Dive
Pyro Dave Imma Dive
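
If only the complete-word requirement matters, the lookahead approach from the other answer could be adapted by wrapping each name in \b. This is a sketch under that assumption; it does not enforce uniqueness, and the names WholeWordLookahead and hasAllThreeWords are invented for the example.

```java
import java.util.regex.Pattern;

class WholeWordLookahead {
    // \b on both sides of each name keeps "DaveCar" or "PomDive" from counting.
    static final Pattern WHOLE_WORDS = Pattern.compile(
            "(?=.*\\bDave\\b)(?=.*\\bImma\\b)(?=.*\\bDive\\b).*");

    // Hypothetical helper: true if all three names appear as complete words.
    static boolean hasAllThreeWords(String line) {
        return WHOLE_WORDS.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasAllThreeWords("Dave Imma Car Pom Dive")); // true
        System.out.println(hasAllThreeWords("DaveCar PomDive Imma"));   // false
        System.out.println(hasAllThreeWords("DaveImmaDive"));           // false
    }
}
```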
Alan Moore
+1  A: 

On *nix, you can use awk.

If the names appear in that order:

awk '/Dave.*Imma.*Dive/' file

If they can appear in any order:

awk '/Dave/ && /Imma/ && /Dive/' file
ghostdog74