ansaurus

Question

python regex question

Answer 1

+3 A:

Can you try:

if 'hello' in longtext:
    ...  # exact, case-sensitive substring match

or

if 'HELLO' in longtext.upper():
    ...  # case-insensitive: longtext is upper-cased before the test

to match hello/Hello/HELLO.
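Here is the same check as a reusable function. This is a minimal sketch that assumes Python 3, and contains_ci is a hypothetical name. It uses str.casefold() instead of upper(). For ASCII text the result is identical, and casefold also handles some non-ASCII case rules. Like the `in` test above, it is a plain substring check, so "hello" is also found inside "Othello".

```python
def contains_ci(text, word):
    """Return True if word occurs anywhere in text, ignoring case.

    This is a plain substring test, not a whole-word test:
    'hello' is also found inside 'Othello'.
    """
    return word.casefold() in text.casefold()


print(contains_ci("Say HeLLo to everyone", "hello"))  # True
print(contains_ci("Say hi to everyone", "hello"))     # False
```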

eumiro 2010-10-18 18:40:36

or hELLo or HElLO or .... ;)

KevinDTimm 2010-10-18 18:54:55

... hElLo or hellO or...

Santiago Lezica 2010-10-18 19:02:52

Answer 2

+2 A:

If you only need to check whether 'hello' (or any other fixed string) occurs in a string, you could also do:

if 'hello' in stringToMatch:
    ...  # Match found, do something

To find several different strings at once, you could also use findall:

>>> import re
>>> toMatch = 'e3e3e3eeehellloqweweemeeeeefe'
>>> regex = re.compile("hello|me", re.IGNORECASE)
>>> print(regex.findall(toMatch))  # 'helllo' has three l's, so only 'me' matches
['me']
>>> toMatch = 'e3e3e3eeehelloqweweemeeeeefe'
>>> print(regex.findall(toMatch))
['hello', 'me']
>>> toMatch = 'e3e3e3eeeHelLoqweweemeeeeefe'
>>> print(regex.findall(toMatch))
['HelLo', 'me']

pyfunc 2010-10-18 18:41:23

That works; however, I still need the regex functionality of returning a group of matches, because sometimes the words in the string are uppercase and sometimes lowercase.

Joe 2010-10-18 18:44:24

@Joe: In that case you could use a regex with the | (alternation) operator. See my edited reply.

pyfunc 2010-10-18 18:58:02

Answer 3

+3 A:

>>> import re
>>> words = ('hello', r'good\-bye', 'red', 'blue')
>>> pattern = re.compile('(' + '|'.join(words) + ')', re.IGNORECASE)
>>> sentence = 'SAY HeLLo TO reD, good-bye to Blue.'
>>> print(pattern.findall(sentence))
['HeLLo', 'reD', 'good-bye', 'Blue']
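Escaping each word by hand (as with good\-bye above) becomes error-prone once the word list comes from elsewhere. This is a minimal sketch that assumes Python 3 and lets re.escape do the escaping. The function name build_alternation is hypothetical. It also sorts the words longest first, because regex alternation takes the first alternative that matches rather than the longest one.

```python
import re


def build_alternation(words):
    """Compile a case-insensitive pattern that matches any of the literal words."""
    # Try longer words first so that, e.g., 'good-bye' wins over a shorter 'good'.
    ordered = sorted(words, key=len, reverse=True)
    # re.escape makes characters such as '.', '+' or '?' match literally.
    escaped = [re.escape(w) for w in ordered]
    return re.compile("(" + "|".join(escaped) + ")", re.IGNORECASE)


pattern = build_alternation(["hello", "good-bye", "red", "blue", "c++"])
print(pattern.findall("SAY HeLLo TO reD, good-bye to Blue. I like C++."))
# ['HeLLo', 'reD', 'good-bye', 'Blue', 'C++']
```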

Steven Rumbalski 2010-10-18 19:52:56

+1 Good answer. However, I think it's also important to point out word-boundary conditions/options available.

pst 2010-10-18 21:32:50

Answer 4

+2 A:

You say you want to search for WORDS. What is your definition of a "word"? If you are looking for "meet", do you really want to match the "meet" in "meeting"? If not, you might like to try something like this:

>>> import re
>>> query = ("meet", "lot")
>>> text = "I'll meet a lot of friends including Charlotte at the town meeting"
>>> regex = r"\b(" + "|".join(query) + r")\b"
>>> re.findall(regex, text, re.IGNORECASE)
['meet', 'lot']
>>>

The \b at each end forces it to match only at word boundaries, using re's definition of a "word": "isn't" isn't one word, it's two words separated by an apostrophe. If you don't like that, look at the nltk package.
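Below is a minimal sketch that combines the \b anchors with re.escape. It assumes Python 3, and find_whole_words is a hypothetical name. The last call demonstrates the apostrophe behaviour described above. Because ' is not a word character, "isn" counts as a whole word inside "isn't".

```python
import re


def find_whole_words(words, text):
    """Return case-insensitive matches of any of `words` that stand as whole words."""
    alternation = "|".join(re.escape(w) for w in words)
    # \b asserts a boundary between a word character (letter, digit, underscore)
    # and a non-word character or the start/end of the string.
    regex = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    return regex.findall(text)


text = "I'll meet a lot of friends including Charlotte at the town meeting"
print(find_whole_words(["meet", "lot"], text))      # ['meet', 'lot']
print(find_whole_words(["isn"], "It isn't here"))   # ['isn']
```

Note that \b only behaves this way when each search term begins and ends with a word character. For a term like c++, the trailing \b would require a word character immediately after the last +.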

John Machin 2010-10-18 21:29:40
