ansaurus

Question

python regex help: unknown information to skip

Answer 1

+1 A:

You need to use the | operator and findall:

>>> import re
>>> regex = re.compile(r"(regex\d+|optregex\d+)")
>>> regex.findall(string)
[u'regex1', u'regex2', u'regex3', u'regex4', u'regex5', u'optregex1', u'optregex2', u'optregex3', u'optregex4']

A piece of advice: there are several GUI tools that let you experiment with regular expressions and actually help you write them. For Python, I'm quite fond of Kodos.

Paolo Tedesco 2009-08-17 20:30:25

Thanks, I'll have to play with findall. I'm not sure it solves my problem, but I may find a solution faster that way.

2009-08-17 21:09:27

Answer 2

+4 A:

If all the data is in that format I'd go with split instead. I think it will be faster.


text = "regex1 regex2 regex3 unwanted-regex regex4 random number of tokens regex5 optregex1 optregex2 more unknown unwanted junk separated with white spaces optregex3 optregex4 etc"
parts = text.split()  # each whitespace-separated token is now an element of the list
for index, item in enumerate(parts):
    if index == 3:
        continue  # skip "unwanted-regex"
    # do what you want with the information here
    print(item)

Geo 2009-08-17 20:34:14

Yeah, that was my initial approach, but some of my fields include white space. That said, I may just have to go that route. Thanks.

2009-08-17 21:03:26

You can use the string method `join` to merge some of your data back together.

Geo 2009-08-17 21:15:19
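
A minimal sketch of the split-then-join idea, assuming you already know which token positions belong to one field. The sample string and the slice indices here are made up for illustration; they are not from the original data.

```python
# Hypothetical input: the third field ("multi word field") contains spaces.
text = "regex1 regex2 multi word field regex3"

parts = text.split()
# parts is now ['regex1', 'regex2', 'multi', 'word', 'field', 'regex3']

# Assumption: tokens at positions 2, 3 and 4 form a single field,
# so glue them back together with a space.
merged = parts[:2] + [" ".join(parts[2:5])] + parts[5:]

print(merged)
```

This only works when the position or count of the multi-word tokens is known in advance; if it is not, a regex is probably the better tool.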

+1 This is not a regex problem.

hughdbrown 2009-08-17 21:26:45

Answer 3

A:

If all of your targets consist of things like "foo1", "bar22", etc. (in other words, a sequence of letters followed by a sequence of digits) and everything else (sequences of digits, "words" without numeric suffixes, etc.) is "junk", then the following seems to be sufficient:

re.findall(r'[A-Za-z]+\d+', targetstr)

(We can't use just r'\w+\d+' because \w matches digits and _ (underscores) as well as letters).
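
To make the difference concrete, here is a small comparison of the two patterns. The sample string is invented for illustration only.

```python
import re

# Hypothetical input mixing wanted tokens with junk.
target = "foo1 bar22 12345 junk a_b9 word 007"

# Letters followed by digits only.
letters_then_digits = re.findall(r'[A-Za-z]+\d+', target)

# \w also matches digits and underscores, so pure numbers slip through.
word_then_digits = re.findall(r'\w+\d+', target)

print(letters_then_digits)  # ['foo1', 'bar22', 'b9']
print(word_then_digits)     # ['foo1', 'bar22', '12345', 'a_b9', '007']
```

Note that the stricter pattern still picks up `b9` out of `a_b9`, because `findall` happily matches in the middle of a token. If that matters for your data, anchoring with `\b` on both sides is one way to tighten it.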

If you're looking for a limited number of key patterns, or some of the junk might match "foo123" ... then you'll obviously have to be more specific.

Jim Dennis 2009-08-18 00:39:12
