In Python regex, how would I match against a large string of text and flag if any one of several regex values matches? I have tried this with "|" (or) statements and I have tried making a regex list. Neither worked for me. Here is an example of what I am trying to do with the or:

I think my "or" gets commented out

patterns=re.compile(r'[\btext String1\b] | [\bText String2\b]')   

if(patterns.search(MyTextFile)):
     print ("YAY one of your text patterns is in this file")

The above code always reports a match regardless of whether the string appears, and if I change it around a bit I get matches on the first regex but it never checks the second. I believe this is because the raw prefix ("r") is commenting out my or statement, but how would I get around this?

I also tried to get around this by taking out the raw prefix and doubling the backslashes on my \b for escaping, but that didn't work either :(

patterns=re.compile(\\btext String1\\b | \\bText String2\\b)   

if(patterns.search(MyTextFile)):
     print ("YAY one of your text patterns is in this file")

I then tried two separate raw strings joined with the or, and the interpreter complains about unsupported str operands:

patterns=re.compile(r'\btext String1\b' | r'\bText String2\b')   

if(patterns.search(MyTextFile)):
     print ("YAY one of your text patterns is in this file")
+3  A: 
patterns=re.compile(r'(\btext String1\b)|(\bText String2\b)')   

You want a group (optionally capturing), not a character class. Technically, you don't need a group here:

patterns=re.compile(r'\btext String1\b|\bText String2\b')   

will also work (without any capture).
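
As a minimal sketch of the alternation in use, assuming two short sample strings in place of the contents of MyTextFile (the names sample_hit and sample_miss are made up for illustration):

```python
import re

# Alternation without groups; note there are no spaces around the "|",
# because whitespace inside a pattern is matched literally.
patterns = re.compile(r'\btext String1\b|\bText String2\b')

# Hypothetical sample inputs standing in for the contents of MyTextFile.
sample_hit = "some header\nhere is Text String2 in the middle\n"
sample_miss = "nothing relevant in this file\n"

found_hit = patterns.search(sample_hit) is not None
found_miss = patterns.search(sample_miss) is not None

if found_hit:
    print("YAY one of your text patterns is in this file")
```

search() returns None when neither alternative is found, so the if only fires for text that really contains one of the phrases.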

The way you had it, it checked for either one of the characters between the first square brackets, or one of those between the second pair. You may find a regex tutorial helpful.
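
To see why the bracketed version matched almost everything, here is a small sketch that runs the pattern from the question against text containing neither phrase (the input string is a made-up example):

```python
import re

# The pattern from the question: two character classes with literal
# spaces around the "|".
original = re.compile(r'[\btext String1\b] | [\bText String2\b]')

# Hypothetical input that contains neither phrase.
unrelated = "nothing to see here"

m = original.search(unrelated)
# The first alternative only needs one character from the class
# followed by a space, so this unrelated text still matches.
matched_text = m.group(0) if m else None
print(repr(matched_text))
```

Any text where one of those letters sits next to a space will match, which is why the check seemed to succeed no matter what the file contained.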

It should be clear where the "unsupported str operands" error comes from. You can't OR strings, and you have to remember that the | is evaluated by Python before the argument even gets to compile.
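
A rough sketch of both points: the first part reproduces the error, and the second shows one possible way to build the pattern from a list by joining the pieces inside a single string. The phrases list and the use of re.escape are assumptions for illustration, not something from the question.

```python
import re

# Applying | to two str objects fails before re.compile is ever called.
try:
    re.compile(r'\btext String1\b' | r'\bText String2\b')
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Hypothetical list of literal phrases to look for.
phrases = ["text String1", "Text String2"]

# Escape each phrase, wrap it in word boundaries, and join with "|"
# so the alternation lives inside one pattern string.
combined = "|".join(r"\b" + re.escape(p) + r"\b" for p in phrases)
patterns = re.compile(combined)

found = patterns.search("a line with text String1 in it") is not None
```

re.escape is only appropriate here because the list holds literal phrases; if the list items are themselves regexes, they would be joined unescaped.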

Matthew Flaschen
I suspect that you also should remove the whitespace around the `|` since it is significant in the regex, and I doubt the OP is aware of that. Also, you don't need a group at all here, neither capturing nor non-capturing.
Tim Pietzcker
@Tim, yeah I noticed shortly after I posted.
Matthew Flaschen
Sweeeet that worked!!!! I could have sworn I tried this: patterns=re.compile(r'(\btext String1\b)|(\bText String2\b)') THANKS!!! Does anyone have a suggestion for a really good regex tutorial/book etc.? I am slowly learning as I work on projects.
eyes0cket
Excellent free tutorial at http://www.regular-expressions.info - Book by the same author "Regular Expressions Cookbook" http://oreilly.com/catalog/9780596520694/
Tim Pietzcker
cool I will read those up!!
eyes0cket
A: 

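This part [\btext String1\b] is a character class, so it matches a single character: one of the characters in "text String1" (including the space), or a backspace, since inside brackets \b means backspace rather than "word separator". So that matches almost any non-empty text, I think.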

DiggyF
A: 

In a RE pattern, square brackets [ ] indicate a "character class" (depending on what's inside them, "any one of these characters" or "any character except one of these", the latter indicated by a caret ^ as the first character after the opening [). This is what you're expressing and it has absolutely nothing to do with what you want -- just remove the brackets and you should be fine;-).

Alex Martelli
thanks for the info!!! I saw the [] in some example online which looked almost exactly like what I was trying to do, so I blindly incorporated it LOL. Figured it was just a separator between expressions
eyes0cket