I have a string called 'raw'. I am trying to parse it in Ruby in the following way:

raw = "HbA1C ranging 8.0—10.0%"
raw.scan /\d*\.?\d+[ ]*(-+|\342\200\224)[ ]*\d*\.?\d+/

The output from the above is []. I think it should be: ["8.0—10.0"].

Does anyone have any insight into what is wrong with the above regex statement?

Note: \342\200\224 is the octal-escaped UTF-8 byte sequence for the em-dash (U+2014).
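As a quick sanity check of that byte sequence, here is a minimal sketch that prints the UTF-8 bytes of U+2014 in octal. It assumes Ruby 1.9 or later (for the "\u2014" escape), and the constant names are only illustrative:

```ruby
# Assumes Ruby 1.9+ for the "\u2014" escape; EM_DASH and OCTAL_BYTES are illustrative names.
EM_DASH = "\u2014"

# Render each UTF-8 byte of the em-dash in base 8.
OCTAL_BYTES = EM_DASH.bytes.map { |b| b.to_s(8) }

p OCTAL_BYTES
# => ["342", "200", "224"]
```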

The piece that is not working is: (-+|\342\200\224)

I think it should be equivalent to saying, match on 1 or more - OR match on the string \342\200\224.

Any help would be greatly appreciated!

A: 
raw = "HbA1C ranging 8.0—10.0%"
raw.scan(/\d+\.\d+.+\d+\.\d+/)
#=> ["8.0\342\200\22410.0"]
fl00r
+1  A: 

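The original regex works for me (Ruby 1.8.7); it just needs the group to be non-capturing so that scan returns the entire match. When a pattern contains a capture group, String#scan returns the captured groups rather than the whole match. Alternatively, leave the regex unchanged and use String#[] or String#match instead of String#scan.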

raw = "HbA1C ranging 8.0—10.0%"
raw.scan /\d*\.?\d+[ ]*(?:-+|\342\200\224)[ ]*\d*\.?\d+/
# => ["8.0—10.0"]
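To make the difference concrete, here is a rough sketch comparing the capturing and non-capturing forms, plus the String#[] and String#match alternatives. It assumes Ruby 1.9 or later, so the em-dash is written as \u2014 rather than the octal bytes used above; the constant names are made up for the example:

```ruby
# Assumes Ruby 1.9+ (em-dash written as \u2014); constant names are illustrative.
RAW = "HbA1C ranging 8.0\u201410.0%"

# Same pattern twice: once with a capturing group, once with (?: ... ).
CAPTURING     = /\d*\.?\d+[ ]*(-+|\u2014)[ ]*\d*\.?\d+/
NON_CAPTURING = /\d*\.?\d+[ ]*(?:-+|\u2014)[ ]*\d*\.?\d+/

# With a capture group, scan yields one array of captures per match,
# so only the dash comes back.
p RAW.scan(CAPTURING)

# Without a capture group, scan yields the whole matched text.
p RAW.scan(NON_CAPTURING)

# String#[] and String#match return the whole match even with the group in place.
p RAW[CAPTURING]
p RAW.match(CAPTURING)[0]
```

If the dash itself is needed as well, the capturing form still exposes it as the first capture of the String#match result.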

For testing and building regular expressions in Ruby, there's a fantastic tool at http://rubular.com that makes it a lot easier. http://rubular.com/r/b1318BBimb has the edited regex with a few test cases to check that it works against them.

Caius