tags:

views:

73

answers:

6

In a program I'm making in python and I want all words formatted like __word__ to stand out. How could I search for words like these using a regex?

+1  A: 

Take a squizz here: http://docs.python.org/library/re.html

That should show you syntax and examples from which you can build a check for word(s) pre- and post-pended with 2 underscores.

glasnt
A: 
Daniel
this sounds too greedy
Matt Joiner
`__(.+?)__` perhaps
Adam Bernier
Daniel - on `hello __world__ lets eat __pizza__`, your regex will capture `__world__ lets eat __pizza__`.
Kobi
+4  A: 

Perhaps something like

\b__(\S+)__\b

>>> import re
>>> re.findall(r"\b__(\S+)__\b","Here __is__ a __test__ sentence")
['is', 'test']    
>>> re.findall(r"\b__(\S+)__\b","__Here__ is a test __sentence__")
['Here', 'sentence']
>>> re.findall(r"\b__(\S+)__\b","__Here's__ a test __sentence__")
["Here's", 'sentence']

or you can put tags around the word like this

>>> print re.sub(r"\b(__)(\S+)(__)\b",r"<b>\2<\\b>","__Here__ is a test __sentence__")
<b>Here<\b> is a test <b>sentence<\b>

If you need more fine grained control over the legal word characters it's best to be explicit

\b__([a-zA-Z0-9_':])__\b  ### count "'" and ":" as part of words

>>> re.findall(r"\b__([a-zA-Z0-9_']+)__\b","__Here's__ a test __sentence:__")
["Here's"]
>>> re.findall(r"\b__([a-zA-Z0-9_':]+)__\b","__Here's__ a test __sentence:__")
["Here's", 'sentence:']
gnibbler
This one worked perfectly for my needs.
Bob Dylan
`\S` will match any non space character (including symbols), so `.__+__.` will be matched.
Amarghosh
@Amarghosh, the OP didn't specify what "word" means, so I interpreted it as a string of non whitespace characters. Of course you could use `\w` instead of `\S`, but then words like "__Here's__" would be broken
gnibbler
A: 

This will give you a list with all such words

>>> import re
>>> m = re.findall("(__\w+__)", "What __word__ you search __for__")
>>> print m
['__word__', '__for__']
kepkin
A: 
\b(__\w+__)\b

\b word boundary
\w+ one or more word characters - [a-zA-Z0-9_]

Amarghosh
A: 

simple string functions. no regex

>>> mystring="blah __word__ blah __word2__"
>>> for item in mystring.split():
...     if item.startswith("__") and item.endswith("__"):
...        print item
...
__word__
__word2__
ghostdog74