I need to match an arbitrary number of figures in a web page.

I need to be able to match all of the following formats:

100 $
99$
$99
$ 8
$.99
$ .8
$ 99.8
.99$
.99 $
9.2 $
1.2$

And the equivalent using commas:

444,333
22,333
1,222
11,111,111
333,333,333,333.01132

Or spaces:

444 333
22 333
1 222
11 111 111
333 333 333 333.01132

This is a really hard one for me. I'm used to playing with regexps, but I have completely failed to write something bulletproof. Usually http://www.regexlib.com has the solution, but not for this one.

I can't think of any way other than a regexp, since this is a plain-text search/replace.

+5  A: 

Why write one regexp when you can write several and apply them in turn?

I'm assuming (?) that you can iterate line by line. Why not try your comma-savvy regexp, then your space-savvy regexp, and so on? If one matches, don't bother trying the rest: store the result and move on to the next line.
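A minimal JavaScript sketch of the try-several-patterns idea, assuming the text has already been split into lines (or text nodes) that each hold one figure. The three patterns and the `matchFirst` helper are hypothetical illustrations, not code from the answer; each pattern is kept narrow (strict three-digit groups) so it stays readable on its own.

```javascript
// Hypothetical patterns, one per format family; tried in order, first match wins.
const patterns = [
  // comma-grouped: 1,222 or 333,333,333,333.01132
  /^\d{1,3}(?:,\d{3})+(?:\.\d+)?$/,
  // space-grouped: 1 222 or 333 333 333 333.01132
  /^\d{1,3}(?: \d{3})+(?:\.\d+)?$/,
  // currency, $ before or after: $99, 99$, $ .8, .99 $
  /^(?:\$\s*)?(?:\d+(?:\.\d*)?|\.\d+)(?:\s*\$)?$/,
];

// Returns the matched figure for one line, or null if no pattern matches.
function matchFirst(line) {
  const s = line.trim();
  for (const re of patterns) {
    const m = s.match(re);
    if (m) return m[0]; // stop at the first pattern that matches
  }
  return null;
}

const lines = ["100 $", "$ .8", "11,111,111", "1 222", "hello"];
for (const line of lines) {
  console.log(JSON.stringify(line), "->", matchFirst(line));
}
```

Supporting a new format means appending one more pattern to the array rather than editing a single large expression.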

Brian Agnew
This would certainly be MUCH more readable than any single regex. +1
Triptych
+1, but not handy: parsing text nodes in JS already takes thousands of function calls to lambdas, with hundreds of closures...
e-satis
A: 

What about doing this in two steps:

First, replace all spaces with ''.

Then, if the number formatting is always the same, replace commas with ''.

After that, it's pretty easy, no?
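A minimal JavaScript sketch of this two-step approach, assuming each figure has already been isolated as its own string: stripping whitespace from a whole page of text would merge neighbouring numbers and words. `normalize`, `simple` and `parseFigure` are hypothetical names. As the answer says, dropping commas is only safe when the formatting is consistent (for example, no locale that uses the comma as the decimal separator).

```javascript
// Step 1: drop whitespace (\s in JS also covers non-breaking spaces).
// Step 2: drop commas. Assumes `candidate` is one already-isolated figure.
function normalize(candidate) {
  return candidate.replace(/\s/g, "").replace(/,/g, "");
}

// After normalization only one shape is left: optional $, digits with an
// optional decimal part (or a bare .digits), optional trailing $.
const simple = /^\$?(?:\d+(?:\.\d*)?|\.\d+)\$?$/;

// Returns the numeric value, or null if the candidate is not a figure.
function parseFigure(candidate) {
  const s = normalize(candidate);
  if (!simple.test(s)) return null;
  return parseFloat(s.replace(/\$/g, ""));
}

console.log(parseFigure("333 333 333 333.01132"));
console.log(parseFigure("$ .8"));
console.log(parseFigure("11,111,111"));
```

Because normalization collapses every format into one shape, the remaining pattern is short; the trade-off is that grouping is no longer checked, so 12,34,5 is accepted as 12345.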

mkoryak
+3  A: 

Here's a regular expression that will match all the number formats you've provided:

^(?:\$\s*)?(?:(?:\d{0,3}(?:[, ]\d{0,3})*[, ])+\d{3}|\d+)(?:\.\d*)?(?:\s*\$)?$

To break it down:

  • ^(?:\$\s*)? will look for an optional $ at the start of the string, followed by any amount of whitespace
  • (?:(?:\d{0,3}(?:[, ]\d{0,3})*[, ])+\d{3}|\d+) will match either a number broken into groups separated by a comma or space ((?:\d{0,3}(?:[, ]\d{0,3})*[, ])+\d{3}) or a plain run of digits (\d+), so 123,456,789, 123 456 789 and 123456789 are all matched. The final group must be exactly three digits, so improperly grouped numbers such as 123,45,6789 are rejected; the middle groups only need 0 to 3 digits, though, so something like 12,3,456 still gets through
  • (?:\.\d*)? will match an optional decimal point followed by any number of digits
  • (?:\s*\$)?$ will match an optional $ at the end of the string, preceded by any amount of whitespace.
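A quick JavaScript check of the posted pattern against every example from the question, assuming Node.js or a browser console. With the \d+ fix from the comments, the integer part is now mandatory, so the leading-decimal examples no longer match. The `variant` below is a hypothetical tweak, not part of the original answer: it adds \.\d+ as an alternative for the whole numeric part, which restores those examples without letting the empty string back in.

```javascript
// The answer's pattern, as posted (with the \d+ fix from the comments).
const posted = /^(?:\$\s*)?(?:(?:\d{0,3}(?:[, ]\d{0,3})*[, ])+\d{3}|\d+)(?:\.\d*)?(?:\s*\$)?$/;

// Hypothetical variant: also accepts a bare fractional part (.99) as the whole
// number, so leading-decimal figures match while "" is still rejected.
const variant = /^(?:\$\s*)?(?:(?:(?:\d{0,3}(?:[, ]\d{0,3})*[, ])+\d{3}|\d+)(?:\.\d*)?|\.\d+)(?:\s*\$)?$/;

// Every example from the question.
const examples = [
  "100 $", "99$", "$99", "$ 8", "$.99", "$ .8", "$ 99.8", ".99$", ".99 $", "9.2 $", "1.2$",
  "444,333", "22,333", "1,222", "11,111,111", "333,333,333,333.01132",
  "444 333", "22 333", "1 222", "11 111 111", "333 333 333 333.01132",
];

// Examples that a given pattern fails to match.
const failing = (re) => examples.filter((s) => !re.test(s));

console.log("posted misses:", failing(posted));
console.log("variant misses:", failing(variant));
console.log("empty string:", posted.test(""), variant.test(""));
```

Run as-is, the posted pattern reports four misses ($.99, $ .8, .99$ and .99 $) and the variant reports none; both still reject the empty string.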
Daniel Vandersluis
+1 for pure regex geek cool!! :)
Shadi Almosri
Yes, but it will also match the empty string, as well as any number of digits without the dollar sign.
Nikolai Ruhe
... and even $0$
Nikolai Ruhe
@Nikolai oops, that should have been \d+, not \d*; I'll fix that. And I didn't see anything in the requirements saying the dollar sign was required?
Daniel Vandersluis
No, the requirements are unclear. But if it's only about matching all examples, .* will do.
Nikolai Ruhe
Nice answer, thanks.
e-satis