
There are multiple posts on here that capture a value, but I'm only looking to check whether the value matches. Put more generally, I'm trying to understand the difference between checking a value and "capturing" a value. Here is a post that explains some of a money regex, but I don't understand it at all.

In the current case, the value would be one of the following acceptable money formats:

.50
50
50.00
50.0
$5000.00
$.50

I don't want commas (people should know that's ridiculous).

The things I'm having trouble with are:

  1. Allowing for a $ at the start of the value (but still optional)
  2. Allowing for only one decimal point (but not allowing it at the end)
  3. Understanding how it works internally
  4. Also understanding how to get a normalized version (only digits and the optional decimal point) out of it that strips the dollar sign.

My current regex (which obviously doesn't work right) is:

# I'm checking the Boolean of the following:
re.compile(r'^[\$][\d\.]$').search(value)

(Note: I'm working in Python)
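To see why the pattern above does not work, it helps to run it on a few inputs. A minimal sketch; the sample values are illustrative, partly taken from the list of formats:

```python
import re

# The pattern from the question, unchanged.
current = re.compile(r'^[\$][\d\.]$')

# [\$] is a character class containing only "$", so the dollar sign is
# mandatory; [\d\.] matches exactly ONE digit or dot. Anchored by ^ and $,
# the whole pattern therefore accepts only two-character strings.
values = ['$5', '$.', '50', '$50', '$5000.00', '.50']
results = {v: bool(current.search(v)) for v in values}
for v in values:
    print(repr(v), results[v])
```

Only `'$5'` and `'$.'` match; every realistic amount is rejected, because nothing in the pattern repeats.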

+3  A: 

I believe the following regex will meet your needs:

/^\$?(\d*(\.\d\d?)?|\d+)$/

It allows for an optional '$'. It allows for an optional decimal, but requires at least one but not more than two digits after the decimal if the decimal is present.

Edit: The outer parentheses will capture the whole numeric value for you.

Aaron
In Python, `.group(0)` (or just `.group()`) returns the entire matched string, and `.group(1)`, `.group(2)`, and so on return the parenthesized groups, numbered by their opening parenthesis; `.groups()` returns all of them as a tuple. `.group(2)` should return just the decimal point and the digits after it, if those are present. For your purposes, `.group(1)` should always give you the entire number.
Aaron
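Python's `re` module numbers groups this way: group 0 is the whole match, and the parenthesized groups count from 1 in order of their opening parenthesis. A quick sketch with Aaron's pattern, rewritten from `/.../` literal syntax into a Python raw string (the input `'$50.25'` is just an illustration):

```python
import re

# Aaron's pattern, written as a Python raw string instead of /.../.
pattern = re.compile(r'^\$?(\d*(\.\d\d?)?|\d+)$')

m = pattern.match('$50.25')
print(m.group(0))  # entire match, dollar sign included: $50.25
print(m.group(1))  # outer parentheses, the number: 50.25
print(m.group(2))  # inner parentheses, the decimal part: .25
print(m.groups())  # all numbered groups: ('50.25', '.25')
```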
Your pattern matches both the empty string and a solitary dollar sign. With regular expressions, remember that the `*` and `?` quantifiers *always* succeed.
Greg Bacon
@gbacon: +1, good suggestion for extra test cases. I added these to my tests too.
Mark Byers
@gbacon: You are absolutely right. I have corrected it. Thanks!
Aaron
@Aaron: It still seems to match the empty string.
Mark Byers
@Mark Byers: Could you explain how it still matches the empty string? I'm not seeing it, but I could easily be missing something.
Aaron
@Aaron: Well, if you want to verify it, you could test it using gbacon's test bed or mine. The reason is that your regex offers two choices of how to match: either `/^\$?(\d*(\.\d\d?)?)$/` or `/^\$?(\d+)$/`. The second of these doesn't match the empty string, but the first does, because all of its elements are optional. To prevent the empty string from matching, at least one of the elements in the first half of your (X|Y) also needs to be non-optional.
Mark Byers
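Mark's explanation can be checked directly. The sketch below compares Aaron's pattern with one way of making an element of the first alternative non-optional, namely requiring the decimal point there; the `fixed` pattern is an illustrative assumption, not one posted in this thread:

```python
import re

# Aaron's pattern: the first alternative, \d*(\.\d\d?)?, is entirely optional.
aaron = re.compile(r'^\$?(\d*(\.\d\d?)?|\d+)$')

# Hypothetical fix: require the decimal point in the first alternative;
# whole numbers are still handled by the \d+ alternative.
fixed = re.compile(r'^\$?(\d*\.\d\d?|\d+)$')

for s in ['', '$', '.50', '50', '$50.0']:
    print(repr(s), bool(aaron.match(s)), bool(fixed.match(s)))
```

Both patterns accept the three real amounts, but only Aaron's also accepts `''` and `'$'`.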
+2  A: 

Also understanding how to get a normalized version (only digits and the optional decimal point) out of it that strips the dollar sign.

This is also known as "capturing" the value ;)

Working off Aaron's base example:

/^\$?(\d+(?:\.\d{1,2})?)$/

Then the amount (without the dollar sign) will be in capture group 1.
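The difference between checking and capturing, sketched with this pattern in Python (the helper names `is_money` and `capture_amount` are hypothetical, chosen for illustration):

```python
import re

amount_re = re.compile(r'^\$?(\d+(?:\.\d{1,2})?)$')

def is_money(value):
    # "Checking": only ask whether the whole string matches.
    return amount_re.match(value) is not None

def capture_amount(value):
    # "Capturing": pull out the text matched by group 1 (no dollar sign).
    m = amount_re.match(value)
    return m.group(1) if m else None

print(is_money('$5000.00'), capture_amount('$5000.00'))  # True 5000.00
print(is_money('$.50'), capture_amount('$.50'))          # False None
```

Note that `\d+` requires at least one digit before the decimal point, so `$.50` is rejected by this version.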

Anon.
It may be useful to note that `(?:...)` is a non-capturing group.
Joel Potter
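A non-capturing group still groups the `?` quantifier, but it adds no entry to the match's groups. A small comparison (the input `'$50.25'` is illustrative):

```python
import re

capturing = re.compile(r'^\$?(\d+(\.\d{1,2})?)$')
non_capturing = re.compile(r'^\$?(\d+(?:\.\d{1,2})?)$')

print(capturing.match('$50.25').groups())      # ('50.25', '.25')
print(non_capturing.match('$50.25').groups())  # ('50.25',)
```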
You caught me! I forgot to add in that last part.
Aaron
You can edit your post if you'd like. BTW, I edited my question because I failed to mention that I need to be able to accept $.50 and .50 as well.
orokusaki
In that case, replace the `\d+` with `\d*`.
Anon.
Changing `\d+` to `\d*` causes it to match the empty string and `$`. This is probably not what you want.
Mark Byers
With regex, it's often better to do 90% of the work with a simple expression and manually check for corner cases afterwards than to try to craft one that catches *everything*.
Anon.
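A minimal sketch of that approach, assuming the relaxed `\d*` variant discussed above: the regex does most of the work, and a plain check afterwards rejects the one corner case it lets through, an empty amount. The function name `parse_amount` is hypothetical:

```python
import re

# Anon's pattern with \d+ relaxed to \d*, so ".50" and "$.50" are accepted.
# On its own it also matches "" and "$", because every element is optional.
SIMPLE = re.compile(r'^\$?(\d*(?:\.\d{1,2})?)$')

def parse_amount(text):
    """Return the amount without the dollar sign, or None if invalid."""
    m = SIMPLE.match(text)
    if m is None:
        return None
    amount = m.group(1)
    if not amount:  # corner case: "" or "$" matched with nothing captured
        return None
    return amount

for s in ['.50', '$.50', '50.00', '$5000', '', '$', '5.000']:
    print(repr(s), '->', parse_amount(s))
```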
+4  A: 

Here's a regex you can use:

regex = re.compile(r'^\$?(\d*(\d\.?|\.\d{1,2}))$')

Here's a test-bed I used to test it. I've included all your tests, plus some of my own. I've also included some negative tests, as making sure that it doesn't match when it shouldn't is just as important as making sure that it does match when it should.

tests = [
    ('.50', True),
    ('50', True),
    ('50.00', True),
    ('50.0', True),
    ('$5000', True),
    ('$.50', True),
    ('$5.', True),
    ('$5.000', False),
    ('5000$', False),
    ('$5.00$', False),
    ('$-5.00', False),
    ('$5,00', False),
    ('', False),
    ('$', False),
    ('.', False),
]

import re
regex = re.compile(r'^\$?(\d*(\d\.?|\.\d{1,2}))$')
for test, expected in tests:
    result = regex.match(test)
    is_match = result is not None
    print(test + '\t' + ('OK' if is_match == expected else 'Fail'))

To get the value without the $, you can use the captured group:

result = regex.match('$5000')
if result is not None:
    print(result.group(1))  # prints 5000 (no dollar sign)
Mark Byers
Thanks, +1 on yours too.
orokusaki
I changed my answer to allow $5. instead of rejecting it, based on your comment on gbacon's answer.
Mark Byers
+4  A: 

Assuming you want to allow $5. but not 5., the following will accept your language:

money = re.compile('|'.join([
  r'^\$?(\d*\.\d{1,2})$',  # e.g., $.50, .50, $1.50, $.5, .5
  r'^\$?(\d+)$',           # e.g., $500, $5, 500, 5
  r'^\$(\d+\.?)$',         # e.g., $5.
]))

Important pieces to understand:

  • ^ and $ match only at the beginning and end of the input string, respectively.
  • \. matches a literal dot
  • \$ matches a literal dollar sign
    • \$? matches a dollar sign or nothing (i.e., an optional dollar sign)
  • \d matches any single digit (0-9)
    • \d* matches runs of zero or more digits
    • \d+ matches runs of one or more digits
    • \d{1,2} matches any single digit or a run of two digits
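Each piece can be tried in isolation. A small sketch, wrapping every piece in `^...$` so it must account for the whole test string; the helper `whole` and the sample strings are illustrative:

```python
import re

def whole(pattern, text):
    # Wrap the piece in ^...$ so it must account for the entire string.
    return re.match('^' + pattern + '$', text) is not None

checks = [
    (r'\$?', ''),         # optional dollar sign: nothing is fine
    (r'\$?', '$'),        # ...and so is a single "$"
    (r'\d*', ''),         # zero or more digits: an empty run is allowed
    (r'\d+', ''),         # one or more digits: an empty run is rejected
    (r'\d{1,2}', '7'),    # one digit
    (r'\d{1,2}', '75'),   # two digits
    (r'\d{1,2}', '750'),  # three digits: too many
    (r'\.', '.'),         # escaped dot: a literal "."
    (r'\.', 'x'),         # ...and nothing else
    (r'.', 'x'),          # unescaped dot: any single character
]
results = [whole(p, t) for p, t in checks]
for (p, t), ok in zip(checks, results):
    print(p, repr(t), ok)
```

The last two rows show why the dot must be escaped: unescaped, it would accept any character in place of the decimal point.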

The parenthesized subpatterns are capture groups: all text in the input matched by the subexpression in a capture group will be available in matchobj.group(index). The dollar sign won't be captured because it's outside the parentheses.

Because Python doesn't support multiple capture groups with the same name (!!!), we must search through matchobj.groups() for the one that isn't None. This also means you have to be careful, when modifying the pattern, to use (?:...) for every group except the amount.
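One alternative to scanning groups(), offered as a suggestion rather than part of the original answer: since only the alternative that matched contributes its group, `matchobj.lastindex` (the number of the last group that matched) points straight at the amount. The function name `amount` is hypothetical:

```python
import re

# The pattern from this answer: three alternatives, one capture group each.
money = re.compile('|'.join([
    r'^\$?(\d*\.\d{1,2})$',  # e.g., $.50, .50, $1.50, $.5, .5
    r'^\$?(\d+)$',           # e.g., $500, $5, 500, 5
    r'^\$(\d+\.?)$',         # e.g., $5.
]))

def amount(text):
    """Return the amount without the dollar sign, or None if no match."""
    m = money.match(text)
    if m is None:
        return None
    # Only the alternative that matched contributes a group, and
    # lastindex is the number of the last group that matched.
    return m.group(m.lastindex)

for s in ['.50', '$5000', '$5.', '5.', '$.5']:
    print(repr(s), amount(s))
```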

Tweaking Mark's nice test harness, we get

for test, expected in tests:
    result = money.match(test)
    is_match = result is not None
    if is_match == expected:
        status = 'OK'
        if result:
            amt = [x for x in result.groups() if x is not None].pop()
            status += ' (%s)' % amt
    else:
        status = 'Fail'
    print(test + '\t' + status)

Output:

.50     OK (.50)
50      OK (50)
50.00   OK (50.00)
50.0    OK (50.0)
$5000   OK (5000)
$.50    OK (.50)
$5.     OK (5.)
5.      OK
$5.000  OK
5000$   OK
$5.00$  OK
$-5.00  OK
$5,00   OK
        OK
$       OK
.       OK
.5      OK (.5)
Greg Bacon
+1 for actually testing your solution.
Mark Byers
How do I make $5. match? (Technically, I can allow this value.)
orokusaki
You did state in your question that you didn't want to allow a period at the end.
Anon.
Oh, oops. I didn't even realize that. Sorry.
orokusaki
@orokusaki See update.
Greg Bacon