In a program I'm writing in Python, I want all words formatted like __word__
to stand out. How could I search for words like these using a regex?
views: 73
answers: 6
+1
A:
Take a squizz here: http://docs.python.org/library/re.html
That should show you syntax and examples from which you can build a check for word(s) pre- and post-pended with 2 underscores.
glasnt
2010-02-16 04:31:02
A:
Daniel
2010-02-16 04:37:25
this sounds too greedy
Matt Joiner
2010-02-16 04:39:49
`__(.+?)__` perhaps
Adam Bernier
2010-02-16 05:05:22
Daniel - on `hello __world__ lets eat __pizza__`, your regex will capture `__world__ lets eat __pizza__`.
Kobi
2010-02-16 05:35:12
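To make the greedy-versus-lazy point in these comments concrete, here is a minimal sketch. The greedy pattern `__(.+)__` is only an assumed stand-in, since the regex from Daniel's answer is not shown above; the lazy pattern is the one Adam Bernier suggests.

```python
import re

text = "hello __world__ lets eat __pizza__"

# Assumed stand-in for a greedy pattern (the original answer's regex is not shown).
greedy = re.findall(r"__(.+)__", text)

# Lazy quantifier, as suggested in the comments: stop at the nearest closing "__".
lazy = re.findall(r"__(.+?)__", text)

print(greedy)
print(lazy)
```

The greedy version runs from the first `__` to the last one in the string, while the lazy version yields one match per marked word.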
+4
A:
Perhaps something like
\b__(\S+)__\b
>>> import re
>>> re.findall(r"\b__(\S+)__\b","Here __is__ a __test__ sentence")
['is', 'test']
>>> re.findall(r"\b__(\S+)__\b","__Here__ is a test __sentence__")
['Here', 'sentence']
>>> re.findall(r"\b__(\S+)__\b","__Here's__ a test __sentence__")
["Here's", 'sentence']
or you can put tags around the word like this
>>> print(re.sub(r"\b(__)(\S+)(__)\b", r"<b>\2</b>", "__Here__ is a test __sentence__"))
<b>Here</b> is a test <b>sentence</b>
If you need more fine-grained control over the legal word characters, it's best to be explicit
\b__([a-zA-Z0-9_':]+)__\b ### count "'" and ":" as part of words
>>> re.findall(r"\b__([a-zA-Z0-9_']+)__\b","__Here's__ a test __sentence:__")
["Here's"]
>>> re.findall(r"\b__([a-zA-Z0-9_':]+)__\b","__Here's__ a test __sentence:__")
["Here's", 'sentence:']
gnibbler
2010-02-16 04:37:34
`\S` will match any non-space character (including symbols), so `.__+__.` will be matched.
Amarghosh
2010-02-16 04:43:03
@Amarghosh, the OP didn't specify what "word" means, so I interpreted it as a string of non-whitespace characters. Of course you could use `\w` instead of `\S`, but then words like "__Here's__" would be broken.
gnibbler
2010-02-16 04:46:48
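The trade-off discussed in these two comments can be checked directly. This is a small sketch using the patterns already mentioned in the thread; the variable names are just illustrative.

```python
import re

text = "__Here's__ a test __sentence__"

# \w excludes the apostrophe, so "__Here's__" is not matched at all.
with_w = re.findall(r"\b__(\w+)__\b", text)

# \S accepts any non-whitespace character, including punctuation.
with_S = re.findall(r"\b__(\S+)__\b", text)

# The same \S pattern also accepts a lone symbol between the markers.
symbols = re.findall(r"\b__(\S+)__\b", ".__+__.")

print(with_w)
print(with_S)
print(symbols)
```

Which behaviour is right depends on what counts as a "word" in the program, which the question leaves open.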
A:
This will give you a list with all such words
>>> import re
>>> m = re.findall(r"(__\w+__)", "What __word__ you search __for__")
>>> print(m)
['__word__', '__for__']
kepkin
2010-02-16 04:40:19
A:
\b(__\w+__)\b
\b - word boundary
\w+ - one or more word characters - [a-zA-Z0-9_]
Amarghosh
2010-02-16 04:40:36
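If the program needs to know where each marked word sits (for example, to highlight it in place), `re.finditer` reports positions along with the text. A minimal sketch using the pattern from this answer; the names `MARKED` and `hits` are made up for illustration.

```python
import re

# Pattern from the answer above; compiling it once is just a convenience.
MARKED = re.compile(r"\b(__\w+__)\b")

text = "What __word__ you search __for__"

# Collect (start, end, matched_text) for each marked word.
hits = [(m.start(), m.end(), m.group(1)) for m in MARKED.finditer(text)]

for start, end, word in hits:
    print(start, end, word)
```

Each tuple gives the slice `text[start:end]` that covers the marked word, underscores included.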
A:
Simple string functions, no regex:
>>> mystring = "blah __word__ blah __word2__"
>>> for item in mystring.split():
...     if item.startswith("__") and item.endswith("__"):
...         print(item)
...
__word__
__word2__
ghostdog74
2010-02-16 04:41:12
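The same idea can be wrapped in a function. This is only a sketch: `find_marked_words` is a hypothetical helper name, and the length check is an added assumption so that tokens made up solely of underscores are skipped.

```python
def find_marked_words(text):
    """Return whitespace-separated tokens wrapped in double underscores."""
    marked = []
    for token in text.split():
        # Require at least one character between the markers,
        # so bare "__" or "____" tokens are skipped.
        if len(token) > 4 and token.startswith("__") and token.endswith("__"):
            marked.append(token)
    return marked


print(find_marked_words("blah __word__ blah __word2__"))
print(find_marked_words("trailing __comma__, is missed"))
```

One limitation of splitting on whitespace is visible in the second call: punctuation attached to a token (such as the comma after `__comma__`) prevents the `endswith` check from succeeding.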