Hi, I'm writing a script to collect $ values from a transaction-log type of file:

import re
import sys

for line in sys.stdin:
    match = re.match(r'\b \$ (\d+) \b', line)
    if match is not None:
        for value in match.groups():
            print value

Right now I'm just trying to print those values. It would match a line containing $12323, but not when there are other things in the line. From what I read it should work, but it looks like I could be missing something.

+3  A: 

By having a space between \$ and (\d+), the regex expects a space in your string between them. Is there such a space?

Eli Bendersky
Are spaces only ignored in multiline strings?
Augie
@lanzelloth: spaces are only ignored if you use the `re.VERBOSE` (or `re.X`) modifier. Multiline has nothing to do with it.
Alan Moore
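
To illustrate the point about `re.VERBOSE`, here is a minimal sketch. The sample line is made up for illustration; the only thing that changes between the two calls is the flag.

```python
import re

line = "paid $12323 today"  # hypothetical sample line

# Without re.VERBOSE the space in the pattern is a literal space,
# so "$12323" (no space after the dollar sign) does not match.
plain = re.search(r"\$ (\d+)", line)

# With re.VERBOSE, unescaped whitespace in the pattern is ignored,
# so the same pattern behaves like \$(\d+).
verbose = re.search(r"\$ (\d+)", line, re.VERBOSE)

print(plain)             # None
print(verbose.group(1))  # 12323
```
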
+6  A: 

re.match:

If zero or more characters at the beginning of string match this regular expression, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

What you are looking for is either re.search or re.findall:

#!/usr/bin/env python

import re
s = 'aug 12, 2010 abc $123'

print re.findall(r'\$(\d+)', s)
# => ['123']

print re.search(r'\$(\d+)', s).group()
# => $123

print re.search(r'\$(\d+)', s).group(1)
# => 123
The MYYN
or re.finditer :)
bronzebeard
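
A rough sketch of the `re.finditer` variant, assuming the goal is to collect every amount from every line. The list of lines is a hypothetical stand-in for `sys.stdin`, and the names `amount_re` and `amounts` are only for illustration.

```python
import re

# Hypothetical log lines, standing in for sys.stdin.
lines = [
    "aug 12, 2010 abc $123",
    "refund $40 and fee $2",
    "no amounts here",
]

amount_re = re.compile(r"\$(\d+)")

amounts = []
for line in lines:
    # finditer yields one match object per occurrence, anywhere in the line.
    for m in amount_re.finditer(line):
        amounts.append(int(m.group(1)))

print(amounts)  # [123, 40, 2]
```

Compared with `re.findall`, each item here is a match object, so positions (`m.start()`, `m.end()`) are available if you need them.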
A: 

Others have already pointed out some shortcomings of your regex (especially the mandatory spaces and re.match vs. re.search).

There is another thing, though: \b word anchors match between alphanumeric and non-alphanumeric characters. In other words, \b \$ will fail (even when doing a search instead of a match operation) unless the string has some alphanumeric characters before the space.

Example (admittedly contrived) to work with your regex:

>>> import re
>>> test = [" $1 ", "a $1 amount", "exactly $1 - no less"]
>>> for string in test:
...     print(re.search(r"\b \$\d+ \b", string))
...
None
<_sre.SRE_Match object at 0x0000000001DD4370>
None
Tim Pietzcker
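
If the `\b ... \b` in the question was only meant to make sure the amount stands alone as a whitespace-delimited token, one possible alternative (an assumption about the intent, not something stated in the question) is to use lookarounds instead of `\b`. A sketch, with the three test strings from the example plus two made-up negative cases:

```python
import re

# (?<!\S) : not preceded by a non-space character (start of string is fine)
# (?!\S)  : not followed by a non-space character (end of string is fine)
token_re = re.compile(r"(?<!\S)\$(\d+)(?!\S)")

tests = [" $1 ", "a $1 amount", "exactly $1 - no less", "US$1", "$1x"]
results = [token_re.findall(s) for s in tests]

print(results)  # [['1'], ['1'], ['1'], [], []]
```
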
+1  A: 

I am not entirely clear on what you would accept, but from the statement

a line containing $12323 but not when there are other things in the line

I would take it that

'aug 12, 2010 abc $123'

is not supposed to match, as it has other text before the amount.

If you want to match an amount at the end of the line, here is the customary anti-regexp answer (even though I am not against using them in easy cases):

loglines = ['aug 12, 2010 abc $123', " $1 ", "a $1 amount", "exactly $1 - no less"]

# match $amount at end of line without other text after
for line in loglines:
    if '$' in line:
        _,_, amount = line.rpartition('$')
        try:
            amount = float(amount)
        except ValueError:
            # not a number after the last '$' (e.g. trailing text): skip
            pass
        else:
            print "$%.2f" % amount
Tony Veijalainen
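
For comparison, the same "amount at end of line" rule could also be written as an anchored regex. This is only a sketch, under the assumption that amounts are plain digits with an optional decimal part; `end_amount_re` and `formatted` are illustrative names.

```python
import re

loglines = ['aug 12, 2010 abc $123', " $1 ", "a $1 amount", "exactly $1 - no less"]

# \$               literal dollar sign
# (\d+(?:\.\d+)?)  digits with an optional decimal part
# \s*$             only whitespace may follow, up to the end of the line
end_amount_re = re.compile(r"\$(\d+(?:\.\d+)?)\s*$")

formatted = []
for line in loglines:
    m = end_amount_re.search(line)
    if m:
        formatted.append("$%.2f" % float(m.group(1)))

print(formatted)  # ['$123.00', '$1.00']
```
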