I have a list of words built from different HTML pages. Instead of writing rule after rule to strip out different elements, I am trying to go through the list and say if it's not a full word with only alpha characters, just move on. This is not working.

for w in words:
     if re.search('\b[a-zA-Z]\b', w) == None:
          continue

I am horrible with regular expressions (if you can't already tell!), so I could use some help. How would I write it so it checks each w to make sure it only has a-zA-Z in it?

+3  A: 

You're almost there. You just have to tell your search to match an entire string of 1 or more characters.

import re

for w in words:
    # Skip anything that is not made up entirely of ASCII letters.
    if re.search(r'^[a-zA-Z]+$', w) is None:
        continue
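For context on why the pattern in the question never matched: in an ordinary Python string literal, '\b' is the backspace character rather than a word boundary, and [a-zA-Z] without a + only matches a single letter. The following is a small sketch of that difference; the sample words are made up for illustration and are not from the question.

```python
import re

# In a normal string literal, '\b' is the backspace character (0x08).
assert '\b' == '\x08'
# In a raw string, r'\b' stays as backslash + 'b', which re reads as a word boundary.
assert r'\b' == '\\b'

original = '\b[a-zA-Z]\b'   # the pattern from the question (backspace, letter, backspace)
anchored = r'^[a-zA-Z]+$'   # the anchored pattern from this answer

# Map each hypothetical sample word to (matched by original, matched by anchored).
results = {
    w: (re.search(original, w) is not None,
        re.search(anchored, w) is not None)
    for w in ['hello', 'a', 'abc123']
}
print(results)
```

Writing patterns as raw strings (r'...') is the usual way to avoid this kind of surprise.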

Another solution (for this specific case at least) would be to use isalpha():

for w in words:
    if not w.isalpha():
        continue
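The two approaches are not quite interchangeable, which may matter for words scraped from HTML. As a rough sketch, assuming Python 3 (where re.fullmatch is available and str.isalpha() is Unicode-aware), the snippet below filters a made-up word list both ways. The names words, ALPHA_ONLY, ascii_words and alpha_words are placeholders chosen for this example.

```python
import re

# Hypothetical sample data standing in for the scraped word list.
words = ["hello", "World", "abc123", "<div>", "don't", "café", ""]

# Compile once; fullmatch() succeeds only if the whole string fits the pattern,
# so no ^ or $ anchors are needed.
ALPHA_ONLY = re.compile(r"[a-zA-Z]+")

# Keeps only words made up entirely of ASCII letters.
ascii_words = [w for w in words if ALPHA_ONLY.fullmatch(w)]

# str.isalpha() also accepts non-ASCII letters such as "é".
alpha_words = [w for w in words if w.isalpha()]

print(ascii_words)
print(alpha_words)
```

If accented words should be kept, isalpha() is probably the simpler choice; if strictly a-zA-Z is wanted, the regular expression is the stricter filter.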
WoLpH
There's a typo in your second solution, but I agree that it's better suited for the presented problem. There's no need to use a regular expression here.
Andrew
@Andrew: thank you, I have fixed the typo.
WoLpH
Thank you. Both work wonderfully. I guess it's about time I pick up a Python book. So many little things I need to learn.
Hallik