I'm currently using RubyTidy, the Ruby bindings for HTML Tidy, to make sure the HTML I receive is well-formed. This library is the only thing holding me back from getting a Rails application onto Ruby 1.9. Are there any alternative libraries that will tidy up chunks of HTML on Ruby 1.9?

A: 

I have written a beautifier for HTML. It is written in JavaScript, so you would have to have Ruby run it through a JavaScript interpreter, or call out to some other program that can interpret it.

My beautifier is completely vocabulary-independent, so it will work with any language that looks like XML. Because it is vocabulary-independent, it has no way of knowing that certain elements never have a closing tag, so singleton elements like br and input must always include a closing forward slash in the tag. Otherwise my code will treat your singleton tags as start tags missing their closing pair, which produces flaws in the indentation.
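If the incoming HTML does not already self-close its singleton tags, a small pre-processing step in Ruby could add the slash first. This is only a rough sketch and not part of the beautifier: `self_close_void_elements` and `VOID_ELEMENTS` are hypothetical names, the element list is the usual set of HTML void elements, and a regex like this will misfire on attribute values that contain `<` or `>`.

```ruby
# Hypothetical helper (not from any library): rewrite void elements such as
# <br> or <input ...> so that they always end in " />".
VOID_ELEMENTS = %w[area base br col embed hr img input link meta
                   param source track wbr].freeze

# Matches an opening tag of a void element, with or without a trailing slash.
# The lookahead keeps <col> from matching <colgroup> and similar names.
VOID_TAG = %r{<(#{VOID_ELEMENTS.join('|')})(?=[\s/>])([^<>]*?)\s*/?>}i

def self_close_void_elements(html)
  # $1 is the tag name, $2 the attribute text (possibly empty).
  html.gsub(VOID_TAG) { "<#{$1}#{$2} />" }
end

puts self_close_void_elements('<p>line one<br>line two</p>')
# => <p>line one<br />line two</p>
```

Tags that are already self-closed come out unchanged apart from whitespace normalisation before the slash, so running the helper twice should be harmless.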

Give it a try and see if this can work for you. For more information read the documentation. http://mailmarkup.org/prettydiff/prettydiff.html

I'd prefer a pure Ruby (or C) implementation
Christian
A: 

Currently this library is the only thing holding me back from getting a Rails application on Ruby 1.9.

Watch out: the Ruby Tidy bindings have some nasty memory leaks, which currently make them unusable in long-running processes. (For the record, I'm using http://github.com/ak47/tidy.)

I just had to remove it from a production Rails 2.3 application because it was leaking about 1MB/min.

Xavier
A: 

Here is a nice example of how to make your HTML look better using tidy:

    require 'tidy'
    Tidy.path = '/opt/local/lib/libtidy.dylib' # or wherever your libtidy resides

    # example input; replace with the HTML you actually receive
    my_nasty_html_string = '<p>unclosed <b>paragraph'

    nice_html = ""
    Tidy.open(:show_warnings=>true) do |tidy|
      tidy.options.output_xhtml = true
      tidy.options.wrap = 0
      tidy.options.indent = 'auto'
      tidy.options.indent_attributes = false
      tidy.options.indent_spaces = 4
      tidy.options.vertical_space = false
      tidy.options.char_encoding = 'utf8'
      nice_html = tidy.clean(my_nasty_html_string)
    end

    # remove surrounding whitespace and collapse runs of newlines
    nice_html = nice_html.strip.gsub(/\n+/, "\n")
    puts nice_html

For more tidy options, check out the man page.

thomax
A: 

tidy_ffi (http://github.com/libc/tidy_ffi/blob/master/README.rdoc) works with Ruby 1.9 (latest version).

If you are working on Windows, you need to set the library_path, e.g.:

    require 'tidy_ffi'
    TidyFFI.library_path = 'lib\\tidy\\bin\\tidy.dll'
    tidy = TidyFFI::Tidy.new('test')
    puts tidy.clean

(It uses the same DLL as tidy.) The link above gives more usage examples.
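Since the question is about tidying chunks of HTML rather than whole pages, here is a rough sketch of a wrapper around tidy_ffi. It assumes the gem accepts an options hash as the second argument to `TidyFFI::Tidy.new`, as I understand its README, and uses HTML Tidy's `show_body_only` option so only the cleaned fragment comes back. `tidy_fragment` is a hypothetical name, and falling back to the raw input when the gem or libtidy cannot be loaded is just one possible choice.

```ruby
# Hypothetical wrapper: tidy an HTML fragment with tidy_ffi when it can be
# loaded, otherwise hand the input back untouched.
def tidy_fragment(html)
  require 'tidy_ffi'
  # show_body_only asks Tidy to emit only the contents of <body>,
  # which suits fragments better than a full document skeleton.
  TidyFFI::Tidy.new(html, :show_body_only => true).clean
rescue LoadError
  # The tidy_ffi gem or the libtidy shared library is not available.
  html
end

puts tidy_fragment('<p>unclosed <b>bold')
```

In a real application you would probably rather log or raise in the `rescue` branch than pass untidied HTML through silently.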

surajz