Hi there,

I'm looking for advice on how to clean submitted HTML in a web app so it can be redisplayed later without stray styles or unclosed tags wrecking the app's layout.

In my app, users submit rich HTML through the YUI Rich Text Editor, which by default runs a few regexps to clean the input, and I'm also calling [filter_MSWord][1] to catch any crap pasted in from Office.

On the back end, I'm running ruby-tidy to sanitize the HTML before it is displayed as comments, but on occasion badly pasted HTML still affects the layout of the app. How can I safeguard against this?

FWIW, here are the sanitizer settings I'm using:

require 'tidy'

module HTMLSanitizer

  def tidy_html(input)
    Tidy.open(:show_warnings => false) do |tidy|
      # don't output body and html tags
      tidy.options.show_body_only = true
      # output xhtml (the original set output_html, which contradicted this comment)
      tidy.options.output_xhtml = true
      # don't write newlines all over the place
      tidy.options.wrap = 0
      # use utf8 to play nice with rails
      tidy.options.char_encoding = 'utf8'
      # the cleaned markup is the block's value, which tidy_html returns
      tidy.clean(input)
    end
  end

end

What else are my options here?

+1  A: 

I personally use the sanitize gem.

require 'sanitize'
# Note the malformed closing tag; the result is still "wow!"
op = Sanitize.clean("<html><body>wow!</body></hhhh>")
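For anyone wondering why a whitelist sanitizer helps with the layout problem in the question, the rough idea is: drop every tag that is not explicitly allowed, discard attributes such as inline styles, and close whatever was left open. Below is a minimal stdlib-only sketch of that idea. It is not how the sanitize gem is implemented; `ALLOWED_TAGS`, `escape_text` and `filter_and_balance` are hypothetical names made up for illustration. It also has known gaps (for example, a `>` inside an attribute value confuses it), so a maintained library is still the safer choice for real user input.

```ruby
# Illustrative sketch only; these names are hypothetical and are not part of
# the sanitize gem or Rails.
ALLOWED_TAGS = %w[p b i em strong ul ol li blockquote].freeze
TEXT_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape text content, leaving existing entities such as &amp; or &#160; alone.
def escape_text(text)
  text.gsub(/&(?!#?\w+;)|[<>"]/, TEXT_ESCAPES)
end

def filter_and_balance(html)
  out = +''
  open_tags = []
  # Splitting on a capture group keeps the tag-like tokens in the result.
  html.split(/(<[^>]*>)/).each do |token|
    tag = token.match(/\A<\s*(\/?)\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>\z/)
    if tag
      name = tag[2].downcase
      next unless ALLOWED_TAGS.include?(name)  # drop the tag, keep its text
      if tag[1] == '/'
        next unless open_tags.include?(name)   # ignore stray closing tags
        # Close everything opened since the matching tag, innermost first.
        loop do
          closed = open_tags.pop
          out << "</#{closed}>"
          break if closed == name
        end
      else
        open_tags.push(name)
        out << "<#{name}>"                     # attributes are discarded
      end
    elsif token.start_with?('<!', '<?')
      next                                     # comments, doctypes, Word conditionals
    else
      out << escape_text(token)
    end
  end
  # Close anything the user left open so it cannot leak into the page layout.
  open_tags.reverse_each { |name| out << "</#{name}>" }
  out
end

puts filter_and_balance('<p style="color:red">Hi <script>alert(1)</script><b>bold</div> text')
# prints: <p>Hi alert(1)<b>bold text</b></p>
```

Here the inline style, the script tag and the stray closing div are gone, and the unclosed b and p are closed at the end, so the fragment cannot swallow the markup that follows it on the page.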
Sinan Taifour
+1  A: 

I use the sanitize helper available from ActionView:

Module ActionView::Helpers::SanitizeHelper

Ben