I want to reduce the number of patterns I have to write by using a regex that picks up any or all of the pattern when it appears in a string.

Is this possible with Regex?

E.g. Pattern is: "the cat sat on the mat"

I would like the pattern to match the following strings:
"the"
"the cat"
"the cat sat"
...
"the cat sat on the mat"

But it should not match on the following string because although some words match, they are split by a non matching word: "the dog sat"

+1  A: 

If you know the match always begins at the first character, it would be much faster to match the characters directly in a loop. I don't think Regex will do it anyway.
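A rough sketch of that idea in Python, comparing whole words rather than single characters. That choice, and the name `is_leading_part`, are assumptions for illustration, not part of the original suggestion:

```python
def is_leading_part(candidate: str, pattern: str) -> bool:
    """Return True if candidate is made of the first 1..n words of pattern."""
    # split() also collapses repeated whitespace (an assumption about the input).
    candidate_words = candidate.split()
    pattern_words = pattern.split()
    if not candidate_words or len(candidate_words) > len(pattern_words):
        return False
    # Compare word by word and stop at the first mismatch.
    for got, expected in zip(candidate_words, pattern_words):
        if got != expected:
            return False
    return True


PHRASE = "the cat sat on the mat"
for text in ("the", "the cat sat", "the dog sat"):
    print(repr(text), is_leading_part(text, PHRASE))
```

This only handles literal words; if the pattern needs alternatives or wildcards, a regex is the better fit.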

Ray Hidayat
+1  A: 

It could be fairly complicated:

(?ms)the(?=(\s+cat)|[\r\n]+)(?:\s+cat(?=(\s+sat)|[\r\n]+))?(?:\s+sat(?=(\s+on)|[\r\n]+))?(?:\s+on(?=(\s+the)|[\r\n]+))?(?:\s+the(?=(\s+mat)|[\r\n]+))?(?:\s+mat)?[\r\n]+

Meaning:

  • I want "the" only if followed by "cat" or end of line
  • then I want "cat" (optional) only if followed by "sat" or end of line
  • and so on
  • followed by an end of line (which ensures that a partial "the cat walk..." does not match)

It matches the first three lines below and rejects the last two:

the cat sat on the mat
the cat
the cat sat
the cat sat aa on the mat (nothing is matched)
the dog sat (nothing is matched there either)


On second thought, Tomalak's answer is simpler (once fixed, that is, ended with a '$').
I keep mine as a wiki post.

VonC
Thanks for the tip, factored in! :-) If no post-condition can be defined, your regex is still the only way to do it. Switching to wiki mode might have been a bit premature. ;-)
Tomalak
+6  A: 

This:

the( cat( sat( on( the( mat)?)?)?)?)?

would answer your question. Remove "optional group" parens "(...)?" for parts that are not optional, add additional groups for things that must match together.

the                       // complete match
the cat                   // complete match
the cat sat               // complete match
the cat sat on            // complete match
the cat sat on the        // complete match
the cat sat on the mat    // complete match
the dog sat on the mat    // two partial matches ("the")

You might want to add some pre-condition, like a start of line anchor, to prevent the expression from matching the second "the" in the last line:

^the( cat( sat( on( the( mat)?)?)?)?)?

EDIT: If you add a post-condition, like the end-of-line anchor, matching will be prevented entirely on the last example, that is, the last example won't match at all:

the( cat( sat( on( the( mat)?)?)?)?)?$

Credits for the tip go to VonC. Thanks!

The post-condition may of course be something else you expect to follow the match.

Alternatively, you can remove the last question mark:

the( cat( sat( on( the( mat)?)?)?)?)

Be aware though: This would make a single "the" a non-match, so the first line will also not match.
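For longer phrases, the nesting can be generated instead of typed by hand. A minimal Python sketch, assuming the phrase is a list of literal words separated by single spaces; `build_prefix_regex` is a hypothetical helper name, and it uses non-capturing groups `(?:...)` plus both anchors, which goes slightly beyond the expressions above:

```python
import re


def build_prefix_regex(words):
    """Build ^w1(?: w2(?: w3)?)?$ from a non-empty list of literal words."""
    nested = ""
    # Work from the last word inwards so each word wraps the ones after it.
    for word in reversed(words[1:]):
        nested = "(?: " + re.escape(word) + nested + ")?"
    # Anchoring both ends rejects strings like "the dog sat on the mat".
    return re.compile("^" + re.escape(words[0]) + nested + "$")


regex = build_prefix_regex("the cat sat on the mat".split())
print(regex.pattern)
for text in ("the", "the cat sat on", "the dog sat on the mat"):
    print(repr(text), bool(regex.search(text)))
```

Drop the `^` or the `$` if the phrase may appear inside a longer string, keeping in mind the partial matches described above.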

Tomalak
-1: this would partially match in "the dog sat on the mat", whereas the user explicitly required NO match...
VonC
Ok... I cancel my -1;) If you add a '$' at the end of your regexp, it works :)
VonC
I asked him if he requires a non-match; I don't think he was too explicit about it. But if he does, your solution is, though complex, the only way to do it.
Tomalak
A: 

Hi,

Perhaps it would be easier and more logical to think about the problem a little differently.

Instead of matching the pattern against the string, how about using the string as the pattern and looking for it in the pattern?

For example, where

string = "the cat sat on" pattern = "the cat sat on the mat"

string is always a substring of pattern, so it is simply a case of doing a regex match.

If that makes sense ;-)
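Roughly, in Python, that could look like the sketch below. The function name `found_at_start` is made up for illustration, and the `(?= |$)` lookahead is an added assumption so that a half-typed word such as "the ca" is rejected:

```python
import re


def found_at_start(candidate: str, full_text: str) -> bool:
    """Use candidate as the regex and look for it at the start of full_text."""
    if not candidate:
        return False
    # re.escape treats the candidate as literal text; the lookahead requires
    # the candidate to stop at a space or at the end of full_text.
    probe = re.compile(re.escape(candidate) + r"(?= |$)")
    # re.match only tries position 0, so "cat sat" alone is not accepted.
    return probe.match(full_text) is not None


FULL = "the cat sat on the mat"
for text in ("the cat sat on", "the dog sat", "the ca"):
    print(repr(text), found_at_start(text, FULL))
```

This only works while the full pattern is a plain literal string.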

DEzra
And what if the pattern is "the (cat|dog|bird|fly) (sat|stood) on the mat"?
Tomalak