Does anybody know how I can convert HTML to plain text with Ruby? What I really need is to convert RedCloth markup to plain text, but either approach would be fine.

I'm not talking about just stripping out the tags (that is all I've done so far). For example, I would like ordered lists to retain their numbers, unordered lists to use an asterisk for bullets, etc.

 def red_cloth_to_plain_text(s)
   s = RedCloth.new(s).to_html
   s = strip_tags(s)
   s = html_unescape(s) # reverse of html_escape
   s = undo_red_cloths_html_codes(s)
   s
 end

Any help would be much appreciated. Maybe I have to attempt a RedCloth to plain text formatter :s

A: 

That may be what you have to do. You're not the first to want this, but I'm guessing it's not part of the library yet because everyone wants their plaintext a little different.
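
As a stopgap short of a full formatter, the list markers can be written into the HTML before the tags are stripped. Below is a minimal sketch, assuming flat (non-nested) lists like the ones RedCloth emits for simple input. `mark_list_items` is a hypothetical helper name, not part of RedCloth or Rails, and the regex approach will not cope with nested lists or malformed markup.

```ruby
# Hypothetical helper, not part of RedCloth or Rails: rewrites flat
# (non-nested) HTML lists so their markers survive tag stripping.
def mark_list_items(html)
  # Ordered lists: replace each opening <li> with "1. ", "2. ", ...
  html = html.gsub(%r{<ol\b[^>]*>(.*?)</ol>}m) do
    n = 0
    Regexp.last_match(1).gsub(/<li\b[^>]*>/) { "#{n += 1}. " }
  end
  # Unordered lists: replace each opening <li> with "* ".
  html.gsub(%r{<ul\b[^>]*>(.*?)</ul>}m) do
    Regexp.last_match(1).gsub(/<li\b[^>]*>/, "* ")
  end
end
```

It would be called between `to_html` and `strip_tags` in the method from the question; the leftover `</li>` tags are removed by the tag stripping that follows.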

rampion
+2  A: 

You need to make a new formatter class.

module RedCloth::Formatters
  module PlainText
    include RedCloth::Formatters::Base
    # ...
  end
end

I won't write your code for you today but this is very easy to do. Read the RedCloth source if you doubt me: it's only 346 lines for the HTML formatter.
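
The main piece of state such a formatter has to carry is which list it is currently inside and how many items it has seen. Here is a standalone sketch of that bookkeeping only. `PlainListRenderer` and its method names are hypothetical and are not RedCloth's formatter hooks; the real hook names should be taken from the HTML formatter source.

```ruby
# Standalone illustration; PlainListRenderer and its method names are
# hypothetical and are not RedCloth's formatter hooks.
class PlainListRenderer
  def initialize
    @stack = [] # one entry per open list: [kind (:ol or :ul), items seen]
    @out = +""
  end

  # Called when a list starts; kind is :ol or :ul.
  def open_list(kind)
    @stack.push([kind, 0])
  end

  # Called when the innermost list ends.
  def close_list
    @stack.pop
  end

  # Emits one item with a number or an asterisk, indented by nesting depth.
  def item(text)
    list = @stack.last
    list[1] += 1
    marker = list[0] == :ol ? "#{list[1]}." : "*"
    indent = "  " * (@stack.size - 1)
    @out << "#{indent}#{marker} #{text}\n"
  end

  def to_s
    @out
  end
end
```

Keeping a stack rather than a single counter means numbering resumes correctly in the outer list after a nested list closes.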

So, once you have your PlainText formatter you patch the class and use it:

module RedCloth
  class TextileDoc
    def to_txt( *rules )
      apply_rules(rules)
      to(RedCloth::Formatters::PlainText)
    end
  end
end

print RedCloth.new(str).to_txt
The Wicked Flea
A: 

Joseph Halter wrote a RedCloth plain formatter:

http://github.com/JosephHalter/redcloth-formatters-plain

Example usage:

RedCloth.new("p. this is *simple* _test_").to_plain

will return:

"this is simple test"
Josiah I.