I have the string "one two 9three 52eight four" and I only want to get "one two four", because "9three" starts with "9" and "52eight" starts with "52".

I tried:

"(?!\d)\w+"

but it still matches "three" and "eight", which I don't want.

+1  A: 

Works fine for me:

import re

l = "one two 9three 52eight four".split()
c = re.compile(r"(?!\d)\w+")

# match() only tries the start of each word, so words with a leading digit are dropped
m = [w for w in l if c.match(w)]
print(m)

Prints:

['one', 'two', 'four']
miles82
Strange, check this out http://tinyurl.com/2ctzevm
pocoa
@pocoa, because he split the string into words first and then checks each word with `re.match`, which only matches at the start of the string. That's why 9three and 52eight are not matched.
S.Mark
@S.Mark I don't have Python installed on this public computer, so I'm relying on online tools. Sorry @miles82!
pocoa
+2  A: 

That's because \w also matches digits. What you need to do is:

>>> s = "one two 9three 52eight four"
>>> import re
>>> re.findall(r'\b[a-z]+\b', s, re.I)
['one', 'two', 'four']

Also, what you're using, (?!...), is called a negative look-ahead, while you probably meant a negative look-behind, (?<!...), which would of course still fail because of the above-mentioned issue.
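
To make the difference concrete, here is a small sketch that runs the asker's look-ahead pattern and a look-behind variant through `re.findall`. The variable names `lookahead` and `lookbehind` are mine, not from the question, and the look-behind pattern is only my guess at what such an attempt would look like:

```python
import re

s = "one two 9three 52eight four"

# Unanchored look-ahead: the engine rejects the position at the digit,
# then simply starts a new match at the next letter.
lookahead = re.findall(r'(?!\d)\w+', s)

# Look-behind: the character before "9" is a space, so the check passes
# and \w+ consumes the leading digits as well.
lookbehind = re.findall(r'(?<!\d)\w+', s)

print(lookahead)   # ['one', 'two', 'three', 'eight', 'four']
print(lookbehind)  # ['one', 'two', '9three', '52eight', 'four']
```

Neither variant drops the unwanted words, because nothing ties the match to the start of a word.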

ETA: then you just need a single word boundary:

>>> re.findall(r'\b(?!\d)\w+', s)
['one', 'two', 'four']
SilentGhost
Thanks. Sorry, I didn't provide enough information. I don't want a match if the word starts with a number, but "four8" is okay.
pocoa
Thanks, the second example works too.
pocoa
+3  A: 

Try

\b[a-zA-Z]\w*
S.Mark
Thanks. This one is working.
pocoa
This is definitely the right answer, +1. It includes both lowercase and uppercase characters.
c0mrade
@c0mrade: and which answer doesn't?
SilentGhost
Note: `\w` includes the underscore. If underscores are not wanted, `[a-zA-Z0-9]*` should be used instead of `\w*`.
S.Mark
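
A quick sketch of what that note means in practice. The test string and the variable names are made up for illustration; only the two patterns come from this thread:

```python
import re

# Hypothetical input containing an identifier with an underscore
s = "one four8 snake_case"

# \w* accepts letters, digits and the underscore
with_underscore = re.findall(r'\b[a-zA-Z]\w*', s)

# [a-zA-Z0-9]* stops at the underscore, so "snake_case" is cut short
without_underscore = re.findall(r'\b[a-zA-Z][a-zA-Z0-9]*', s)

print(with_underscore)     # ['one', 'four8', 'snake_case']
print(without_underscore)  # ['one', 'four8', 'snake']
```

Note that "case" is not picked up separately, since there is no word boundary between "_" and "c".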
SilentGhost's answer also handles both cases: it uses the `re.I` (ignore case) flag.
S.Mark
Thanks for all the comments. S.Mark was first, so I marked that one as the answer.
pocoa
@SilentGhost Yours. Maybe you do it with re.I; I didn't know that part. I was just reading the regex, not the language-specific bits. Ah, I see now that it's tagged python. Sorry, my bad.
c0mrade
A: 

A regexp might be overkill.

In [3]: [word for word in eg.split(' ') if not word[0].isdigit()]
Out[3]: ['one', 'two', 'four']
Reagle
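
For anyone wanting to run this outside an IPython session, here is a minimal sketch, assuming `eg` holds a string like the one from the question. Calling `split()` with no argument is my substitution for `split(' ')`: with repeated spaces, `split(' ')` yields empty strings and `word[0]` would then raise an IndexError.

```python
# Assumed input; the double space and "four8" are added for illustration
eg = "one two  9three 52eight four8"

# split() with no argument collapses runs of whitespace
words = [word for word in eg.split() if not word[0].isdigit()]

print(words)  # ['one', 'two', 'four8']
```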