I tested this regex out in RegexBuddy

,[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?

and it seems to be able to do what I need it to do - capture a piece of data that looks like one of the following:

,POWDER,RO,ML,8/19/2002
,POWDER,RO,,,
,POWDER,RO,,8/19/2002
,POWDER,RO,ML,,

When I use it in a Python raw string:

r",[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"

It misses the first part of the match, and my resulting matches look like: RO,ML,8/19/2002, or RO,ML, or just RO,

The first token is a word that is stored in all caps and may contain spaces (and possibly punctuation, which I will need to handle shortly). If I remove the \s from the character class, it still doesn't capture the one-word names that it should. Did I miss something obvious?

+3  A: 

The first part of your regex doesn't have capturing parentheses around it. Try the regex:

,([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?
 #^^ This was [A-Z\s]+?; needs to be ([A-Z\s]+?)

which would be this in python:

r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"

Example from the interpreter:

>>> import re
>>> r = re.compile(r",[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")
>>> r.match(",POWDER,RO,ML,8/19/2002").groups()
('RO', 'ML', '8/19/2002')
>>> r = re.compile(r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")
>>> r.match(",POWDER,RO,ML,8/19/2002").groups()
('POWDER', 'RO', 'ML', '8/19/2002')
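
The difference can also be seen by comparing what each call returns. The sketch below assumes the question's results came from groups() or findall(), which return only the captured groups; the question does not say which call was used. The names FIELDS, uncaptured, captured and samples are made up for this illustration.

```python
import re

# Shared tail of the pattern from the question.
FIELDS = r",(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"

uncaptured = re.compile(r",[A-Z\s]+?" + FIELDS)   # original: first field not captured
captured = re.compile(r",([A-Z\s]+?)" + FIELDS)   # fixed: first field captured

samples = [
    ",POWDER,RO,ML,8/19/2002",
    ",POWDER,RO,,,",
    ",POWDER,RO,,8/19/2002",
    ",POWDER,RO,ML,,",
]

for line in samples:
    m = uncaptured.match(line)
    # group(0) is the whole matched text, so POWDER is matched even
    # though it is missing from groups() and findall().
    print(m.group(0), m.groups(), uncaptured.findall(line))
    # With the extra parentheses the first field shows up as group 1.
    # Optional groups that did not take part are None in groups()
    # and '' in findall().
    print(captured.match(line).groups(), captured.findall(line))
```

So the original pattern does match the first field; it just never reports it as a group.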
eldarerathis
A: 

I'm not into Python, but you just forgot to use parentheses to indicate that you want to capture that part:

,([A-Z\s]+)?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})? should do what you want
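
Note that writing the group as ([A-Z\s]+)? makes the whole first field optional, which is slightly different from the lazy ([A-Z\s]+?) used elsewhere in this thread. A minimal sketch of the difference in Python, assuming an empty first field is possible in the data (the question does not say so); the names optional_first and required_first are hypothetical:

```python
import re

TAIL = r",(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"

optional_first = re.compile(r",([A-Z\s]+)?" + TAIL)   # group may be absent
required_first = re.compile(r",([A-Z\s]+?)" + TAIL)   # at least one character

line = ",,RO,ML,8/19/2002"   # made-up input with an empty first field

m = optional_first.match(line)
print(m.groups())                   # first group is None when the field is empty
print(required_first.match(line))   # None: the required version rejects this line

# For a normal line both versions capture the same thing.
normal = ",POWDER,RO,ML,8/19/2002"
print(optional_first.match(normal).groups())
print(required_first.match(normal).groups())
```

Which behaviour is wanted depends on whether rows with an empty first field should be accepted.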

balu
+6  A: 

Yes. You did not capture the first group.

r",([A-Z\s]+),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
#  ^        ^ 

BTW, it seems that you are parsing a CSV file with regex. In Python, there is already a csv module.
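
A rough sketch of the csv approach, assuming each record sits on its own line in exactly the form shown in the question (in the real file these may be fragments of longer rows, in which case the indices below would need adjusting). The variable names are made up for this example.

```python
import csv
import io

# The four sample records from the question, one per line.
data = """,POWDER,RO,ML,8/19/2002
,POWDER,RO,,,
,POWDER,RO,,8/19/2002
,POWDER,RO,ML,,
"""

records = []
for row in csv.reader(io.StringIO(data)):
    # row[0] is the empty text before the leading comma;
    # missing fields come back as empty strings, not None.
    name, form, unit, date = row[1:5]
    records.append((name, form, unit, date))

for record in records:
    print(record)
```

Unlike the regex, csv.reader does not validate the field values (for example that the second field is one of LA, RO, MU, FE, AV, CA), so that check would still have to be done separately.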

KennyTM
Thanks, I'll check out the csv module to see if I can leverage it for what I need to do. Sadly, this is just using Python to prove to myself that my script will work; I'll need to actually implement it in Java or Groovy so no one at work freaks out.
jonny
A: 

Yes, you missed the grouping parentheses:

>>> s = ",POWDER,RO,ML,8/19/2002"
>>> pat = r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
>>> re.match(pat, s).groups()
('POWDER', 'RO', 'ML', '8/19/2002')
Lie Ryan