ansaurus

Question

Answer 1

+3 A:

Is there a quick way in Ruby to strip the (smallest amount of) text between two markers?

str = "lala BEGIN_MARKER \nlu\nlu\n END_MARKER foo BEGIN_MARKER bar END_MARKER baz"
str.gsub(/BEGIN_MARKER.*?END_MARKER/m, "")
#=> "lala  foo  baz"

sepp2k 2010-02-24 17:59:11

D'oh! *thumps head* Of course, thanks!

JP 2010-02-24 18:25:47
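To see why both the lazy quantifier .*? and the /m flag matter in sepp2k's pattern, here is a small comparison sketch on the same sample string. The variable names are illustrative only. In Ruby, . does not match a newline unless /m is given, and a greedy .* runs to the last END_MARKER it can find.

```ruby
str = "lala BEGIN_MARKER \nlu\nlu\n END_MARKER foo BEGIN_MARKER bar END_MARKER baz"

# Lazy and multiline (the answer's pattern): each BEGIN..END pair is removed separately.
lazy = str.gsub(/BEGIN_MARKER.*?END_MARKER/m, "")

# Greedy: .* runs to the LAST END_MARKER, so " foo " between the blocks is lost too.
greedy = str.gsub(/BEGIN_MARKER.*END_MARKER/m, "")

# Without /m, . cannot cross "\n", so only the single-line block is removed.
single_line = str.gsub(/BEGIN_MARKER.*?END_MARKER/, "")

p lazy         # => "lala  foo  baz"
p greedy       # => "lala  baz"
p single_line  # => "lala BEGIN_MARKER \nlu\nlu\n END_MARKER foo  baz"
```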

Answer 2

+1 A:

gsub could be expensive for longer files if you're reading the whole file into a single string.

If you have to process it in chunks anyway, you might want to use a stateful, line-by-line parser:

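in_block = false
# File.foreach reads one line at a time and closes the file when done
File.foreach(fname) do |line|
  if in_block
    # inside a block: skip lines until END_MARKER has been seen
    in_block = false if line =~ /END_MARKER/
    next
  elsif line =~ /BEGIN_MARKER/
    # a block starts here: skip this line and everything up to END_MARKER
    in_block = true
    next
  end
  # only lines outside any block reach the counter
  count_words(line)
end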

klochner 2010-02-24 18:07:31
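A self-contained sketch of klochner's line-by-line approach follows. It reads from any IO (a StringIO here, so it runs without a file) and substitutes a simple whitespace token count for the answer's count_words helper; the method name count_words_outside_blocks is hypothetical. Like the answer, it assumes each marker sits on its own line.

```ruby
require "stringio"

# Hypothetical helper: total words on lines outside BEGIN_MARKER ... END_MARKER
# blocks. Assumes each marker is on its own line; marker lines are skipped too.
def count_words_outside_blocks(io)
  in_block = false
  total = 0
  io.each_line do |line|
    if in_block
      # still inside a block: leave it once END_MARKER is seen
      in_block = false if line.include?("END_MARKER")
      next
    elsif line.include?("BEGIN_MARKER")
      in_block = true
      next
    end
    # stand-in for count_words: count whitespace-separated tokens
    total += line.split.size
  end
  total
end

text = <<~TEXT
  one two
  BEGIN_MARKER
  skipped words here
  END_MARKER
  three
TEXT

puts count_words_outside_blocks(StringIO.new(text))  # prints 3
```

For a real file, File.open(fname) { |f| count_words_outside_blocks(f) } passes the file handle in the same way and closes it afterwards.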

Answer 3

A:

You should look at String#scan. Assuming your text is in the variable s, something like this should work:

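# gsub removes every inset (sub! only removes the first, and returns nil when nothing matches);
# /m lets .*? span newlines
s_strip_inset = s.gsub(/\\begin_inset.*?\\end_inset/m, "")
word_count = s_strip_inset.scan(/[\w-]+/).size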

ghoppe 2010-02-24 18:13:46
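Wrapped as a method, ghoppe's strip-then-scan idea might look like the sketch below. The \begin_inset / \end_inset markers suggest LyX-style input, which is an assumption; the method name word_count_without_insets and the sample text are hypothetical.

```ruby
# Hypothetical helper: count words after dropping every
# \begin_inset ... \end_inset span, which may cover several lines.
def word_count_without_insets(text)
  # gsub returns a new string even when nothing matches; /m lets .*? cross newlines
  stripped = text.gsub(/\\begin_inset.*?\\end_inset/m, "")
  # a run of word characters and hyphens counts as one word, e.g. "well-known"
  stripped.scan(/[\w-]+/).size
end

lyx = "Some well-known text\n\\begin_inset Note\nhidden words\n\\end_inset\nmore text"
puts word_count_without_insets(lyx)  # prints 5
```

If insets can be nested, the lazy match stops at the first \end_inset it meets, so a depth-counting parser in the style of the line-by-line answer would be needed instead.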


Text manipulation in Ruby
