I'm dealing with a text (a string) containing many four-digit year numbers. I'm trying to divide the text into segments, each of which begins and ends with a year number (it doesn't matter whether the year number itself is included in the segment). Basically, the year numbers just act as a signal for the code to 'cut'.

Any ideas how I can do that? How do I identify a four-digit number?

Thanks a million!

+2  A: 
>> 'ab2010cd'.scan(/\D(\d{4})\D/)   # 4 digit numbers match
=> [["2010"]]
>> 'ab201cd'.scan(/\D(\d{4})\D/)    # <4 digit numbers don't match
=> []
>> 'ab20101cd'.scan(/\D(\d{4})\D/)  # >4 digit numbers don't match
=> []

In Ruby 1.9 you can use lookbehind/lookahead assertions to do the split; because the year is in a capture group, it is kept as its own element of the result:

>> 'ab2010cd'.split(/(?<=\D)(\d{4})(?=\D)/)
=> ["ab", "2010", "cd"]
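
One caveat with `\D` on both sides is that a year at the very start or end of the string has no neighbouring character to match, so it is skipped. A minimal sketch of a variant that uses negative lookarounds instead, assuming the years themselves can be dropped (`YEAR` and `segments_between_years` are illustrative names, not part of the answer above):

```ruby
# Sketch only: YEAR and segments_between_years are hypothetical names.
# (?<!\d) and (?!\d) assert "no digit on this side", which also holds
# at the start and end of the string, unlike \D.
YEAR = /(?<!\d)\d{4}(?!\d)/

def segments_between_years(text)
  # No capture group, so split drops the year numbers themselves.
  # reject removes the empty piece produced when the text starts with a year.
  text.split(YEAR).reject(&:empty?)
end

p segments_between_years('2010 first 1999 second 20101 still second 2005')
# => [" first ", " second 20101 still second "]
```

The five-digit run `20101` is left alone because every four-digit window inside it has a digit on one side.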
gnibbler
Very artistic regex :)
Skilldrick
Thanks~ it works well. Do you know how I can restrict the search to four-digit numbers of the form 19XX or 20XX only?
es9999
+1  A: 
ruby-1.9.2-preview1 > "abc1234tgnh".match(/\d{4}/)
 => #<MatchData "1234"> 
Jed Schneider
A: 

Given the string

s = 'abcd 1234 efghijk 56789 nope 0987 blah blah 2010 hmmm'

Should there be 2 or 3 matches (given the "2010 hmmm" substring does not end with a year)? I'm going to assume you want to match that (if not, remove the |\Z from the regex).

s.scan(/\b\d{4}\b.+?(?=\b\d{4}\b|\Z)/)
# => ["1234 efghijk 56789 nope ", "0987 blah blah ", "2010 hmmm"]

But, as you say you don't care about keeping the numbers:

s.scan(/(?<=\d{4}).+?(?=\b\d{4}\b|\Z)/)
# => [" efghijk 56789 nope ", " blah blah ", " hmmm"]
glenn jackman
Thanks~ it works well. Do you know how I can restrict the search to four-digit numbers of the form 19XX or 20XX only?
es9999
@es9999, simple enough. Just replace `\d{4}` with `(?:19|20)\d\d`.
glenn jackman
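
For illustration, the `(?:19|20)\d\d` substitution could look like the sketch below. The constant name `CENTURY_YEAR` and the sample string are made up for this example; the word boundaries follow the `\b\d{4}\b` form used in the answer.

```ruby
# Sketch only: CENTURY_YEAR is a hypothetical name, and the sample
# string is invented for this example.
CENTURY_YEAR = /\b(?:19|20)\d\d\b/

s = 'abcd 1234 efghijk 1987 nope 0987 blah blah 2010 hmmm'

# Only 19XX and 20XX count as years; 1234 and 0987 are ignored.
years = s.scan(CENTURY_YEAR)
p years
# => ["1987", "2010"]

# The same pattern used as the cut signal.
segments = s.split(CENTURY_YEAR)
p segments
# => ["abcd 1234 efghijk ", " nope 0987 blah blah ", " hmmm"]
```

Note that `\b` treats letters as word characters, so a year glued to letters (as in `'ab2010cd'`) would not match this pattern; swapping `\b` for digit-only lookarounds is one way around that.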