Hi guys,

I'm using Ruby's regular expressions to process text such as:

${1:aaa|bbbb}
${233:aaa | bbbb | ccc  ccccc }
${34: aaa | bbbb | cccccccc     |d}
${343:   aaa   |   bbbb   |       cccccccc     |dddddd   ddddddddd}
${3443:a aa|bbbb|cccccccc|d}
${353:aa a| b b b b | c c c c c c c c      |        dddddd}

I want to get the trimmed text between the pipes. For example, for the first line above I want aaa and bbbb; for the second line I want aaa, bbbb and ccc ccccc. I have written a regular expression and a bit of Ruby code to test it:

array = "${33:aaa|bbbb|cccccccc}".scan(/\$\{\s*(\d+)\s*:(\s*[^\|]+\s*)(?:\|(\s*[^\|]+\s*))+\}/)
puts array

My problem is that the (?:\|(\s*[^\|]+\s*))+ part doesn't create multiple groups. I don't know how to solve this, because the number of items on each line varies. Can anyone help? Many thanks.

+1  A: 

Why don't you split your string?

str = "${233:aaa | bbbb | ccc  ccccc }"
str.split(/\d+|\$|\{|\}|:|\|/).reject { |v| v.empty? }.map { |v| v.strip }.join(', ')
#=> "aaa, bbbb, ccc  ccccc"
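One caveat with this split, shown as a small sketch: because \d+ is one of the delimiters, any digits inside an item are consumed as well. The sample string below is made up for illustration, and the anchored alternative is only one possible workaround, not part of the answer above.

```ruby
# Made-up input whose items contain digits.
str = "${7:v2 | beta 3}"

# The split from the answer also treats digits inside items as delimiters.
naive = str.split(/\d+|\$|\{|\}|:|\|/).reject(&:empty?).map(&:strip)

# Alternative: capture everything between the first ':' and the last '}',
# then split on the pipes only.
anchored = str[/:(.*)\}/, 1].split('|').map(&:strip)

p naive     # ["v", "", "beta"]
p anchored  # ["v2", "beta 3"]
```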
fl00r
+1  A: 

Instead of trying to do everything at once, divide and conquer:

DATA.each do |line|
    line =~ /:(.+)\}/
    items = $1.strip.split( /\s* \| \s*/x )
    p items
end

__END__
${1:aaa|bbbb}
${233:aaa | bbbb | ccc  ccccc }
${34: aaa | bbbb | cccccccc     |d}
${343:   aaa   |   bbbb   |       cccccccc     |dddddd   ddddddddd}
${3443:a aa|bbbb|cccccccc|d}
${353:aa a| b b b b | c c c c c c c c      |        dddddd}

If you want to do it with a single regex, you can use scan, but this seems more difficult to grok:

DATA.each do |line|
    items = line.scan( /[:|] ([^|}]+) /x ).flatten.map { |i| i.strip }
    p items
end
FM
@FM, this is an interesting script. Can you point me to where I can find more information about `DATA`, `__END__` and other Ruby sorcery like this?
macek
@smotchkkiss `DATA` and `__END__` reflect Ruby's Perl heritage. Here's a link: http://ruby-doc.org/docs/keywords/1.9/files/keywords_rb.html#M000003.
FM
+1  A: 

This might help you

Script

a = [
  '${1:aaa|bbbb}',
  '${233:aaa | bbbb | ccc  ccccc }',
  '${34: aaa | bbbb | cccccccc     |d}',
  '${343:   aaa   |   bbbb   |       cccccccc     |dddddd   ddddddddd}',
  '${3443:a aa|bbbb|cccccccc|d}',
  '${353:aa a| b b b b | c c c c c c c c      |        dddddd}'
]

a.each do |input|
  puts input
  input.scan(/[:|]([^|}]+)/).flatten.each do |s|
    puts s.strip # trim leading and trailing whitespace
  end
end

Output

${1:aaa|bbbb}
aaa
bbbb
${233:aaa | bbbb | ccc  ccccc }
aaa
bbbb
ccc  ccccc
${34: aaa | bbbb | cccccccc     |d}
aaa
bbbb
cccccccc
d
${343:   aaa   |   bbbb   |       cccccccc     |dddddd   ddddddddd}
aaa
bbbb
cccccccc
dddddd   ddddddddd
${3443:a aa|bbbb|cccccccc|d}
a aa
bbbb
cccccccc
d
${353:aa a| b b b b | c c c c c c c c      |        dddddd}
aa a
b b b b
c c c c c c c c
dddddd
macek
+1  A: 

When you repeat a capturing group in a regular expression, the capturing group only stores the text matched by its last iteration. If you need to capture multiple iterations, you'll need to use more than one regex. (.NET is a notable exception to this. Its CaptureCollection provides the matches of all iterations of a capturing group.)
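
A quick way to see this in Ruby, as a minimal sketch: the pattern below is a simplified version of the one in the question (whitespace handling dropped), and the variable names are only illustrative.

```ruby
line = "${33:aaa|bbbb|cccccccc}"

# Group 3 sits inside a repeated non-capturing group, so it is matched
# twice ("bbbb", then "cccccccc") but keeps only the final iteration.
m = line.match(/\$\{(\d+):([^|}]+)(?:\|([^|}]+))+\}/)
last_only = m.captures

p last_only  # ["33", "aaa", "cccccccc"]
```

Note that "bbbb" is matched but never reported, which is the behaviour the question ran into.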

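In your case, you could do a search-and-replace to replace ^\$\{\d+: with nothing, and likewise remove the closing \}$. That strips off the ${, the number and the colon at the start of your string, and the brace at the end. Then call split using the regex \s*\|\s* to split the string into the elements delimited by vertical bars, and strip any whitespace left on the first and last elements.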
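
A rough Ruby sketch of this two-step approach follows. It assumes each line has the form ${number:items}, so the ${ prefix and the closing } are removed along with the number and colon. The helper name pipe_items is hypothetical and not from any library.

```ruby
# Hypothetical helper: strip the "${N:" wrapper and the closing "}",
# then split the remainder on pipes with optional surrounding whitespace.
def pipe_items(line)
  inner = line.strip
              .sub(/\A\$\{\s*\d+\s*:\s*/, '')  # leading "${N:" plus spaces
              .sub(/\s*\}\z/, '')               # trailing spaces plus "}"
  inner.split(/\s*\|\s*/)
end

p pipe_items('${1:aaa|bbbb}')
p pipe_items('${233:aaa | bbbb | ccc  ccccc }')
p pipe_items('${34: aaa | bbbb | cccccccc     |d}')
```

Whitespace inside an item (such as the two spaces in "ccc  ccccc") is left untouched; only the whitespace around the pipes and the wrapper is dropped.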

Jan Goyvaerts