I'm using Ruby and I'm trying to find a way to grab the text between the {start_grab_entries} and {end_grab_entries} markers, like so:

{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}

Something like so:

$1 => "i want to grab
       the text that
       you see here in
       the middle"

So far, I tried this as my regular expression:

\{start_grab_entries}(.|\n)*\{end_grab_entries}

However, $1 comes back blank. Do you know how I can correctly grab the block of text between the tags?

+3  A: 

There is a better way to allow the dot to match newlines (/m modifier):

regexp = /\{start_grab_entries\}(.*?)\{end_grab_entries\}/m

Also, make the * lazy by appending a ?, or you might match too much if more than one such section occurs in your input.
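
To illustrate why the lazy quantifier matters, here is a small sketch. The two-section input is made up for the example and is not from the question; the regexes are the greedy and lazy variants of the one above.

```ruby
# Hypothetical input with two marked sections (an assumption for illustration).
text = <<~TEXT
  {start_grab_entries}
  first
  {end_grab_entries}
  noise
  {start_grab_entries}
  second
  {end_grab_entries}
TEXT

greedy = /\{start_grab_entries\}(.*)\{end_grab_entries\}/m
lazy   = /\{start_grab_entries\}(.*?)\{end_grab_entries\}/m

# Greedy: one match that runs from the first start marker to the LAST end marker.
greedy_capture = text[greedy, 1]

# Lazy: each match stops at the nearest end marker, so scan finds both sections.
lazy_captures = text.scan(lazy).flatten.map(&:strip)

puts greedy_capture.include?("noise") # true
p lazy_captures                       # ["first", "second"]
```

The strip call only trims the newlines that sit right after the start marker and right before the end marker.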

That said, the reason why you got a blank match is that you repeated the capturing group itself; therefore you only caught the last repetition (in this case, a \n).

It would have "worked" if you had put the capturing group outside of the repetition:

\{start_grab_entries\}((?:.|\n)*)\{end_grab_entries\}

but, as said above, there is a better way to do that.
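
A quick way to see the difference between the two group placements is to run both against the same string. This is only a sketch; the shortened sample text and the variable names are assumptions made for the example.

```ruby
# Shortened sample text (an assumption for illustration).
text = "{start_grab_entries}\ni want to grab\nthe text that\n{end_grab_entries}"

# Capturing group is itself repeated: only the last repetition is kept.
repeated_group = /\{start_grab_entries\}(.|\n)*\{end_grab_entries\}/

# Capturing group wraps the repetition: the whole run is kept.
wrapped_group = /\{start_grab_entries\}((?:.|\n)*)\{end_grab_entries\}/

last_repetition = text[repeated_group, 1]
whole_block     = text[wrapped_group, 1]

p last_repetition # "\n"
p whole_block     # "\ni want to grab\nthe text that\n"
```

The first result is the single newline that precedes the end marker, which is why $1 looked blank.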

Tim Pietzcker
Extremely appreciate the great response, Tim. That was great. Thank you so much! :)
sjsc
+1  A: 
string=<<EOF
blah
{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}
blah
EOF

puts string.scan(/\{start_grab_entries\}(.*?)\{end_grab_entries\}/m)
Thank you so much :) Really appreciate it!
sjsc
A: 

I'm adding this because we often read data from a file or data stream where the range of lines we want is not all in memory at once. "Slurping" a file is discouraged if the data could exceed the available memory, which happens easily in production corporate environments. This is how we'd grab the lines between boundary markers as the file is being scanned. It doesn't rely on regex; instead it uses Ruby's "flip-flop" .. operator:

#!/usr/bin/ruby

lines = []
DATA.each_line do |line|
  lines << line if (line['{start_grab_entries}'] .. line['{end_grab_entries}'])
end

puts lines          # << lines with boundary markers
puts
puts lines[1 .. -2] # << lines without boundary markers

__END__
this is not captured

{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}

this is not captured either

Output of this code would look like:

{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}

i want to grab
the text that
you see here in
the middle
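
The same idea can be wrapped in a method that takes any IO-like object and drops the marker lines inside the loop instead of slicing the array afterwards. This is a rough sketch, not a drop-in implementation: grab_between, START_MARKER, and END_MARKER are hypothetical names, and a StringIO stands in for a real file.

```ruby
require "stringio"

# Hypothetical constant names; the marker strings come from the question.
START_MARKER = "{start_grab_entries}"
END_MARKER   = "{end_grab_entries}"

# Reads the IO one line at a time, so the whole input is never held in memory.
def grab_between(io)
  grabbed = []
  io.each_line do |line|
    # Flip-flop: true from the start-marker line through the end-marker line.
    if (line.include?(START_MARKER) .. line.include?(END_MARKER))
      # Skip the boundary lines themselves.
      grabbed << line.chomp unless line.include?(START_MARKER) || line.include?(END_MARKER)
    end
  end
  grabbed
end

# StringIO is used here only to keep the example self-contained.
sample = StringIO.new(<<~DATA)
  this is not captured
  {start_grab_entries}
  i want to grab
  the text that
  you see here in
  the middle
  {end_grab_entries}
  this is not captured either
DATA

result = grab_between(sample)
puts result
```

For a real file, something like File.open(path) { |f| grab_between(f) } should work the same way, since File also responds to each_line.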
Greg