+2  Q: 

ruby regex .scan

I'm using Ruby's scan() method to find text in a particular format. I then output it into a string separated by commas. The text I'm trying to find would look like this:

AB_ABCD_123456

Here's what I've come up with so far to find the above. It works fine:

codes = text.scan(/.._...._[0-9][0-9][0-9][0-9][0-9][0-9]/)
puts codes.uniq.sort.join(', ')

Now I need a regex that will find the above with or without a two-letter country designation at the end. For example, I would like to be able to find all three of the below:

AB_ABCD_123456
AB_ABCD_123456UK
AB_ABCD_123456DE

I know I could use two or three different scans to achieve my result, but I'm wondering if there's a way to get all three with one regex.

+3  A: 
/.._...._[0-9][0-9][0-9][0-9][0-9][0-9](?:[A-Z][A-Z])?/

You can also use {} to make the regex shorter:

/.{2}_.{4}_[0-9]{6}(?:[A-Z]{2})?/

Explanation: ? makes the preceding pattern optional. () groups expressions together (so Ruby knows the ? applies to both letters). The ?: after the opening ( makes the group non-capturing (capturing groups would change the values yielded by scan).
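
As a minimal sketch of how the shortened pattern fits into the original scan/uniq/sort/join pipeline (the sample text and the variable names `pattern`, `codes` and `result` are made up for illustration; the question does not show the real input):

```ruby
# Hypothetical sample input; the real text is not shown in the question.
text = "Order AB_ABCD_123456 shipped, then AB_ABCD_123456UK " \
       "and XY_WXYZ_654321DE; AB_ABCD_123456 again."

# The optional group is non-capturing, so scan returns whole matches as strings.
pattern = /.{2}_.{4}_[0-9]{6}(?:[A-Z]{2})?/

codes  = text.scan(pattern)          # every match, duplicates included
result = codes.uniq.sort.join(', ')  # de-duplicate, sort, comma-separate

puts result
# prints: AB_ABCD_123456, AB_ABCD_123456UK, XY_WXYZ_654321DE
```

Note that scan returns a new array and leaves `text` untouched, so the result has to be kept in a variable before calling uniq on it.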

sepp2k
works well, thank you, and the shortcuts will help.
michaelmichael
+1  A: 
 /.._...._\d{6}([A-Z]{2})?/
Avdi
If you don't make the group non-capturing, scan will only yield the country codes (or nil for the strings that didn't include one), not the entire string that was matched.
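
A small sketch of the difference, using a made-up two-code input (the variable names are only for illustration):

```ruby
# Hypothetical sample input with one plain code and one with a country suffix.
text = "AB_ABCD_123456 AB_ABCD_123456UK"

# With a capturing group, scan yields one array of group values per match.
capturing = text.scan(/.._...._\d{6}([A-Z]{2})?/)

# With a non-capturing group, scan yields the full matched strings.
non_capturing = text.scan(/.._...._\d{6}(?:[A-Z]{2})?/)

p capturing      # => [[nil], ["UK"]]
p non_capturing  # => ["AB_ABCD_123456", "AB_ABCD_123456UK"]
```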
sepp2k
+1  A: 

Why not just use split?

"AB_ABCD_123456".split(/_/).join(',')

Handles the cases you listed without modification.

ezpz
AFAIK, the OP is trying to find a list of these codes ... not work with just one.
The Wicked Flea
Yes; I saw the example and jumped past the details - a terrible habit. Sorry for the confusion.
ezpz
A: 

Try this:

text.scan(/\w{2}_\w{4}_\d{6}\w{0,2}/)
# matches AB_ABCD_123456UK or ab_abcd_123456uk and so on...

or

text.scan(/[A-Z]{2}_[A-Z]{4}_\d{6}[A-Z]{0,2}/) 
# tighter: matches only uppercase codes such as AB_ABCD_123456UK,
# and not mixed- or lowercase ones such as ab_aBCd_123456UK or ab_abcd_123456uk
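
To make the difference between the two patterns concrete, here is a rough sketch on a made-up input (the sample strings and variable names are assumptions, not from the question). It also shows one caveat: {0,2} accepts a single trailing letter as well, whereas an optional two-letter group such as (?:[A-Z]{2})? does not.

```ruby
# Hypothetical sample input mixing uppercase and lowercase codes.
text = "AB_ABCD_123456UK ab_abcd_123456uk CD_EFGH_654321"

loose = text.scan(/\w{2}_\w{4}_\d{6}\w{0,2}/)
tight = text.scan(/[A-Z]{2}_[A-Z]{4}_\d{6}[A-Z]{0,2}/)

p loose  # => ["AB_ABCD_123456UK", "ab_abcd_123456uk", "CD_EFGH_654321"]
p tight  # => ["AB_ABCD_123456UK", "CD_EFGH_654321"]

# Caveat: {0,2} also allows exactly one trailing letter.
one_letter = "AB_ABCD_123456U"
with_range = one_letter[/[A-Z]{2}_[A-Z]{4}_\d{6}[A-Z]{0,2}/]
with_group = one_letter[/[A-Z]{2}_[A-Z]{4}_\d{6}(?:[A-Z]{2})?/]

p with_range  # => "AB_ABCD_123456U"
p with_group  # => "AB_ABCD_123456"
```
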

If you want to learn more about regular expressions, see these URLs:

http://stackoverflow.com/questions/1234741/ruby-gsub-modifiers

http://ruby-doc.org/docs/ruby-doc-bundle/Manual/man-1.4/syntax.html#regexp

vulcan_hacker
I like that second regex example. Thanks for the links. I've gone through them, though not as thoroughly as I should. Real-life problems help my understanding a lot.
michaelmichael