tags:

views:

36

answers:

1

How, if at all, can I use a regex to match a string that contains a variable number of matches?

The strings I want to parse look like:

'Every 15th of the month'
'Every 21st and 28th of the month'
'Every 21st, 22nd and 28th of the month'

ad infinitum...

I want to be able to capture the ordinal numbers (15th, 21st, etc.).

The language I'm using is Ruby for what it's worth.

Thanks, Alex

+2  A: 

You can capture them into an array with scan, which will match all occurrences of your regex:

irb(main):001:0> s = 'every 15th of the month'
=> "every 15th of the month"
irb(main):003:0> s2 = 'every 21st and 28th of the month'
=> "every 21st and 28th of the month"
irb(main):004:0> s3 = 'every 21st, 22nd, and 28th of the month'
=> "every 21st, 22nd, and 28th of the month"
irb(main):006:0> myarray = s3.scan(/(\d{1,2}(?:st|nd|rd|th))/)
=> [["21st"], ["22nd"], ["28th"]]
irb(main):007:0> myarray = s2.scan(/(\d{1,2}(?:st|nd|rd|th))/)
=> [["21st"], ["28th"]]
irb(main):008:0> myarray = s.scan(/(\d{1,2}(?:st|nd|rd|th))/)
=> [["15th"]]
irb(main):009:0>

Then of course you can access each match using the typical myarray[index] notation (or loop through all of them, etc).
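
A small sketch of what that access can look like. The variable names here (`nested`, `flat`, `days`) are just for illustration. Note that `scan` returns an array of arrays when the regex has a capture group; dropping the outer group gives a flat array of strings, which is usually easier to work with:

```ruby
s3 = 'every 21st, 22nd, and 28th of the month'

# With a capture group, each match is wrapped in its own array.
nested = s3.scan(/(\d{1,2}(?:st|nd|rd|th))/)
first  = nested[0][0]

# Without a capture group, scan returns the matched strings directly.
flat = s3.scan(/\d{1,2}(?:st|nd|rd|th)/)

# String#to_i reads the leading digits, so "21st" becomes 21.
days = flat.map(&:to_i)

flat.each { |ordinal| puts ordinal }
```

Calling `nested.flatten` gets you the same flat array if you would rather keep the capture group.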

Edit: Based on your comments, this is how I would do this:

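require 'active_support/inflector' # needed outside Rails; assumes the activesupport gem is installed

ORDINALS = (1..31).map { |n| ActiveSupport::Inflector.ordinalize(n) }
DAY_OF_MONTH_REGEX = /(#{ORDINALS.join('|')})/i
myarray = string.scan(DAY_OF_MONTH_REGEX)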

This really only gets tripped up by ordinal numbers that appear in other phrases. Trying to get more restrictive than that will probably be pretty ugly, since you have to cover a bunch of different cases. I might be able to come up with something, but it probably wouldn't be worth it. If you want to parse the string with really fine-grained control and a variable amount of text to match, then this probably just isn't a job for regex, to be honest. It's hard to be certain without knowing what format the lines are in, whether this is coming from a file with other similar lines, whether you have any control over the format/contents of the strings, etc.
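
For what it's worth, here is one way the stricter version could be sketched: validate the whole line first, then `scan` for the ordinals. This is only a sketch under some assumptions: the helper `ordinal`, the constants `ORD`, `ORDINAL_RE` and `STRICT_RE`, and the method `parse_days` are all made-up names, the ordinals are built by hand so the snippet runs without ActiveSupport, and the accepted wording ("Every A of the month", "Every A and B of the month", "Every A, B and C of the month", with an optional comma before "and") is my guess at the format from the examples in the question.

```ruby
# Hypothetical helper: builds "1st", "2nd", ... without ActiveSupport.
def ordinal(n)
  suffix =
    if (11..13).cover?(n % 100)
      'th'
    else
      { 1 => 'st', 2 => 'nd', 3 => 'rd' }.fetch(n % 10, 'th')
    end
  "#{n}#{suffix}"
end

# Alternation of the 31 valid ordinals, so "22st" or "3th" can never match.
ORD = (1..31).map { |n| ordinal(n) }.join('|')

# Matches one valid ordinal as a whole word.
ORDINAL_RE = /\b(?:#{ORD})\b/i

# Matches the whole line: one ordinal, or a comma-separated list ending in "and <ordinal>".
STRICT_RE = /\A\s*Every\s+(?:#{ORD})(?:(?:,\s*(?:#{ORD}))*,?\s+and\s+(?:#{ORD}))?\s+of\s+the\s+month\s*\z/i

# Returns the ordinals from a well-formed line, or [] if the line is not well-formed.
def parse_days(line)
  return [] unless STRICT_RE.match?(line) # Regexp#match? needs Ruby 2.4+
  line.scan(ORDINAL_RE)
end

p parse_days('Every 21st, 22nd and 28th of the month')
p parse_days('Every 22st of the month')
```

The first call prints `["21st", "22nd", "28th"]` and the second prints `[]`. Splitting the job into "is the line valid?" and "pull out the pieces" keeps each regex readable, at the cost of matching the string twice.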

eldarerathis
I'm using this:
Alex Baranosky
`ORDINALS = (1..31).map { |n| ActiveSupport::Inflector::ordinalize n }; DAY_OF_MONTH_REGEX = /^\s*Every (#{ORDINALS.join('|')}) of the month/i`
Alex Baranosky
Your idea is nice, but the only issue I have is that it makes the regex less strict (i.e. some bad strings like 22st or 3th can get through).
Alex Baranosky
Ideally I'd want to ensure that only precisely worded strings can produce a match, and then collect an array of those valid matches. I have an idea that might be an improvement:
Alex Baranosky
Then just use your regex... `myarray = str.scan(DAY_OF_MONTH_REGEX)` will match every instance of your pattern within the string and store any captures into `myarray`. Or are you looking for a better regex?
eldarerathis
`DAY_OF_MONTH_REGEX = /^\s*Every (#{ORDINALS.join('|')}|and| |#{ORDINALS_W_COMMAS.join('|')}) of the month/i` ... where `ORDINALS_W_COMMAS` is all the ordinals with a comma appended to them.
Alex Baranosky
@eldarerathis: if I use the original regex it won't match more than one day
Alex Baranosky
@Alex: It should match more than once with `scan`, since `scan` doesn't stop after the first match. It's effectively like Perl's `/g` (global) operator. How were you matching with your original one?
eldarerathis
@Alex: Oh, I see why it would only match once. In this case you're probably better off not anchoring. I would just make a regex for the part you actually want, probably something like `DAY_OF_MONTH_REGEX = /(#{ORDINALS.join('|')})/i`.
eldarerathis
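
A quick sketch of the anchoring point, in case it helps anyone reading later. `SAMPLE_ORDINALS` is a shortened, made-up list (the thread builds all 31 with ActiveSupport), and the other names are only for illustration:

```ruby
# Hypothetical short list so the snippet runs without ActiveSupport.
SAMPLE_ORDINALS = %w[15th 21st 22nd 28th]

# Anchored: the pattern has to cover the whole phrase and allows exactly one ordinal.
ANCHORED = /^\s*Every (#{SAMPLE_ORDINALS.join('|')}) of the month/i

# Unanchored: the pattern covers only the part we want to collect.
UNANCHORED = /(#{SAMPLE_ORDINALS.join('|')})/i

one  = 'Every 15th of the month'
many = 'Every 21st and 28th of the month'

anchored_one    = one.scan(ANCHORED)     # one match: the phrase fits the pattern exactly
anchored_many   = many.scan(ANCHORED)    # no matches: " and 28th" breaks the fixed wording
unanchored_many = many.scan(UNANCHORED)  # every ordinal in the string

p anchored_one, anchored_many, unanchored_many
```

So `scan` does keep looking after the first match, but an anchored, whole-phrase pattern only has one place it can ever match.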
I ended up using `def self.can_parse?(s) s.matches?(/^Every /i) and s.matches?(/ of the month/i) end` to loosely determine whether this line can be parsed. Then used....... ->
Alex Baranosky