I have a text file containing a list of regexps that I regularly use to clean HTML files:

list.txt

<p[^>]*>|<p>
<\/?(font|span)[^>]*>|
<\/u>\s*<u>|
<\/u>\s*<i>\s*<u>|<i>

If each line has the form "#{a}|#{b}", what would be the simplest way to read this file and convert it into the following array?

[
  [ /<p[^>]*>/, '<p>' ],
  [ /<\/?(font|span)[^>]*>/, '' ],
  [ /<\/u>\s*<u>/, '' ],
  [ /<\/u>\s*<i>\s*<u>/, '<i>' ]
]
A: 

Assuming that the #{b} part will never contain a |, I get the following:

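File.readlines(filename).collect { |s|
  x = s.rindex('|')  # index of the last '|', which separates pattern from replacement
  # s[0...x] excludes the '|' itself; chomp drops only a trailing newline
  [ Regexp.new(s[0...x]), s[x+1..-1].chomp ]
}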

Otherwise, you'll probably have to replace s.rindex('|') with something more complicated.
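
For illustration, the same last-pipe split can also be sketched with String#rpartition, which avoids the index arithmetic. The helper name parse_rule and the inline sample lines are assumptions made for this example; it still relies on the #{b} part containing no |.

```ruby
# Hypothetical helper: split one rule line at its LAST "|" into [regexp, replacement].
def parse_rule(line)
  # rpartition returns [text before the last "|", "|", text after it]
  pattern, _sep, replacement = line.chomp.rpartition('|')
  [Regexp.new(pattern), replacement]
end

# Two sample lines from list.txt in the question, written inline
# so the sketch runs without the file
lines = [
  "<p[^>]*>|<p>\n",
  "<\\/?(font|span)[^>]*>|\n",
]
rules = lines.map { |l| parse_rule(l) }
```

With a real file, something like File.readlines(filename).map { |l| parse_rule(l) } should give the same shape of result.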

mweerden
+2  A: 

Try the following:

result = File.foreach("list.txt").collect do |line|
  *search, replace = line.strip.split("|", -1)
  [Regexp.new(search.join("|")), replace]
end

Or, if you switch the file to a separator that does not occur in the regexes or replacements (here "!"):

result = File.foreach("list.txt").collect do |line|
  search, replace = line.strip.split("!", -1)
  [Regexp.new(search), replace]
end
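
Once the pairs are loaded, they could be applied to an HTML string roughly as follows. This is only a sketch: the helper name clean_html and the sample input are assumptions, and the rules are written inline rather than read from list.txt so that the example runs on its own.

```ruby
# Rule pairs in the [regexp, replacement] shape asked for in the question
rules = [
  [/<p[^>]*>/, '<p>'],
  [/<\/?(font|span)[^>]*>/, ''],
]

# Hypothetical helper: apply every rule, in order, to an HTML string
def clean_html(html, rules)
  rules.inject(html) do |text, (pattern, replacement)|
    text.gsub(pattern, replacement)
  end
end

cleaned = clean_html('<p class="x"><font size="2">Hi</font></p>', rules)
# cleaned is now "<p>Hi</p>"
```

The rules run in file order, so a later pattern sees the output of the earlier ones.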
molf