ansaurus

Question

Capture variable amount of matches with regular expressions?

Answer 1

+2 A:

You can capture them into an array with scan, which will match all occurrences of your regex:

irb(main):001:0> s = 'every 15th of the month'
=> "every 15th of the month"
irb(main):003:0> s2 = 'every 21st and 28th of the month'
=> "every 21st and 28th of the month"
irb(main):004:0> s3 = 'every 21st, 22nd, and 28th of the month'
=> "every 21st, 22nd, and 28th of the month"
irb(main):006:0> myarray = s3.scan(/(\d{1,2}(?:st|nd|rd|th))/)
=> [["21st"], ["22nd"], ["28th"]]
irb(main):007:0> myarray = s2.scan(/(\d{1,2}(?:st|nd|rd|th))/)
=> [["21st"], ["28th"]]
irb(main):008:0> myarray = s.scan(/(\d{1,2}(?:st|nd|rd|th))/)
=> [["15th"]]
irb(main):009:0>

Then of course you can access each match using the typical myarray[index] notation (or loop through all of them, etc).
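As a small sketch of what that looks like in practice (the variable names `nested`, `flat`, `first_day` and `day_numbers` are just for illustration): because the regex above contains a capture group, `scan` returns one sub-array per match. If you drop the outer group, or call `flatten`, you should get a plain array of strings that is easier to index and loop over.

```ruby
s3 = 'every 21st, 22nd, and 28th of the month'

# With a capture group, scan returns one sub-array per match.
nested = s3.scan(/(\d{1,2}(?:st|nd|rd|th))/)

# Without a capture group, scan returns the matched strings directly.
flat = s3.scan(/\d{1,2}(?:st|nd|rd|th)/)

# Index into the result, or loop/map over it.
first_day   = flat[0]
day_numbers = flat.map { |day| day.to_i } # String#to_i reads the leading digits

puts nested.flatten == flat
puts first_day
puts day_numbers.inspect
```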

Edit: Based on your comments, this is how I would do this:

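require 'active_support/inflector'
# "1st", "2nd", ..., "31st"
ORDINALS = (1..31).map { |n| ActiveSupport::Inflector.ordinalize(n) }
DAY_OF_MONTH_REGEX = /(#{ORDINALS.join('|')})/i
myarray = string.scan(DAY_OF_MONTH_REGEX) # string is the line you want to parse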

This really only gets tripped up by ordinal numbers that appear in other phrases of the same string. Trying to get more restrictive than that will probably be pretty ugly, since you have to cover a bunch of different cases. I might be able to come up with something, but it probably wouldn't be worth it. If you want to parse the string with really fine-grained control and a variable amount of text to match, then this probably just isn't a job for regex, to be honest. It's hard to be certain without knowing what format the lines are in, whether this is coming from a file with other similar lines, whether you have any control over the format/contents of the strings, etc.
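If you do want the stricter behaviour, one possible compromise is a two-step approach: check the whole line against an anchored pattern first, then `scan` for the days only if it passes. The following is a rough sketch, not a definitive implementation. The `ordinal` helper is a hypothetical stand-in for ActiveSupport's `ordinalize` so the snippet runs without Rails, and `DAY`, `STRICT_LINE` and `days_of_month` are made-up names. It also assumes the only accepted separators are commas and a final "and".

```ruby
# Hypothetical helper standing in for ActiveSupport's ordinalize.
def ordinal(n)
  suffix =
    if (11..13).cover?(n % 100)
      'th'
    else
      { 1 => 'st', 2 => 'nd', 3 => 'rd' }.fetch(n % 10, 'th')
    end
  "#{n}#{suffix}"
end

# "1st|2nd|...|31st"
ORDINAL_SOURCE = (1..31).map { |n| ordinal(n) }.join('|')

# One valid day; the word boundaries keep "1st" from matching inside "41st".
DAY = /\b(?:#{ORDINAL_SOURCE})\b/i

# Whole-line check: "every <day>[, <day>]...[[,] and <day>] of the month".
STRICT_LINE = /\A\s*every\s+#{DAY}(?:\s*,\s*#{DAY})*(?:\s*,?\s*and\s+#{DAY})?\s+of\s+the\s+month\s*\z/i

# Returns the days as strings, or an empty array if the line is not well formed.
def days_of_month(line)
  return [] unless STRICT_LINE.match?(line)
  line.scan(DAY)
end

puts days_of_month('every 21st, 22nd, and 28th of the month').inspect
puts days_of_month('every 22st of the month').inspect
```

Since only the real ordinals for 1 through 31 appear in the alternation, strings like 22st or 3th should fail the whole-line check and produce no matches at all.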

eldarerathis 2010-10-08 03:29:21

I'm using this:

Alex Baranosky 2010-10-08 03:55:43

ORDINALS = (1..31).map { |n| ActiveSupport::Inflector::ordinalize n }
DAY_OF_MONTH_REGEX = /^\s*Every (#{ORDINALS.join('|')}) of the month/i

Alex Baranosky 2010-10-08 03:56:30

Your idea is nice, but the only issue I have is that it makes the regex less strict (e.g. some bad strings like 22st or 3th can get through).

Alex Baranosky 2010-10-08 03:57:38

Ideally I'd want to ensure that only precisely worded strings could capture a match, and then collect an array of those valid matches. I have an idea that might be an improvement:

Alex Baranosky 2010-10-08 04:04:23

Then just use your regex... `myarray = str.scan(DAY_OF_MONTH_REGEX)` will match every instance of your pattern within the string and store any captures into `myarray`. Or are you looking for a better regex?

eldarerathis 2010-10-08 04:05:50

DAY_OF_MONTH_REGEX = /^\s*Every (#{ORDINALS.join('|')}|and| |#{ORDINALS_W_COMMAS.join('|')}) of the month/i ... where ORDINALS_W_COMMAS is all the ordinals with a comma appended to them.

Alex Baranosky 2010-10-08 04:05:50

@eldarerathis: if I use the original regex it won't match more than one day

Alex Baranosky 2010-10-08 04:08:17

@Alex: It should match more than once with `scan`, since `scan` doesn't stop after the first match. It's effectively like Perl's `/g` (global) operator. How were you matching with your original one?

eldarerathis 2010-10-08 04:11:19

@Alex: Oh, I see why it would only match once. In this case you're probably better off not anchoring. I would just make a regex for the part you actually want, probably something like `DAY_OF_MONTH_REGEX = /(#{ORDINALS.join('|')})/i`.

eldarerathis 2010-10-08 04:15:10
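To illustrate the point about anchoring, here is a minimal sketch. `ORDINAL_PATTERN` is a simplified stand-in for the `ORDINALS` alternation so the snippet runs without ActiveSupport, and `ANCHORED`/`UNANCHORED` are illustrative names. The anchored pattern describes the whole phrase around a single day, so `scan` has at most one place to match it; the unanchored pattern describes only the day itself, so `scan` can find it repeatedly, while `match` still stops at the first hit.

```ruby
# Simplified stand-in for the ORDINALS alternation from the answer.
ORDINAL_PATTERN = /\d{1,2}(?:st|nd|rd|th)/

ANCHORED   = /^\s*Every (#{ORDINAL_PATTERN}) of the month/i
UNANCHORED = /(#{ORDINAL_PATTERN})/i

single = 'Every 15th of the month'
multi  = 'Every 21st and 28th of the month'

# The anchored regex covers the whole phrase, so it can match at most once,
# and not at all when more than one day is listed.
anchored_single = single.scan(ANCHORED)
anchored_multi  = multi.scan(ANCHORED)

# The unanchored regex matches each day on its own; scan collects every one.
all_days = multi.scan(UNANCHORED)

# match, unlike scan, returns only the first occurrence.
first_only = multi.match(UNANCHORED)[1]

puts anchored_single.inspect
puts anchored_multi.inspect
puts all_days.inspect
puts first_only
```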

I ended up using: `def self.can_parse?(s); s.matches?(/^Every /i) and s.matches?(/ of the month/i); end`, to loosely determine whether the line can be parsed. Then used....... ->

Alex Baranosky 2010-10-08 04:27:33

Alex Baranosky 2010-10-08 04:28:00
