views:

859

answers:

5

Wanted

A command line HTML5 beautifier running under Linux.

Input

Garbled, ugly HTML5 code. Possibly the result of multiple templates. You don't love it, it doesn't love you.

Output

Pure beauty. The code is nicely indented, has enough line breaks, and cares for its whitespace. Rather than viewing it in a web browser, you would like to display the code on your website directly.

Suspects

  • tidy does too much (heck, it alters my doctype!), and it doesn't work well with HTML5. Maybe there is a way to make it cooperate and not alter anything?
  • vim does too little. It only indents. I want the program to add and remove line breaks, and to play with the whitespace inside of tags.

DEAD OR ALIVE!

A: 

IMHO, "Aptana" is very good for HTML code. So is the "Komodo editor" ;-)

joanballester
That's no command line program, either.
blinry
A: 

Notepad++ has an HTML tidying facility that is pretty good.

Jonno_FTW
That's no command line program.
blinry
A: 

You can use this script for vim: http://vim.wikia.com/wiki/Better_indent_support_for_php_with_html

tip via: http://stackoverflow.com/questions/459478/correct-indentation-of-html-and-php-using-vim

ideotop
I wrote in the question that vim does too little for me. It *only* indents.
blinry
+2  A: 

If you use Haml as your nanoc filter, your HTML will automatically be pretty-printed. You can set HTML5 output as an option.

Dan Brendstrup
+2  A: 

I suspect tidy can be made to work with the right command-line parameters.

http://tidy.sourceforge.net/docs/quickref.html

You can specify an arbitrary doctype and add new block, inline, and empty tags, and turn on and off lots of tidy's cleaning options.

Depending on what you want it to "beautify", you can probably get decent results. It probably won't manage some of the more advanced cleanup, such as eliminating or combining spurious elements, for tags it doesn't recognize.
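
As a rough sketch of what that could look like, the options can live in a tidy config file instead of one long command line. The option names are the ones guessed at in the comment below this answer. The file names (tidy-html5.conf, ugly.html, pretty.html) are made up for illustration, and the tag lists are only a starting point that you would need to extend for your own markup. I have not verified how well this works on real HTML5 input.

```shell
#!/bin/sh
# Sketch only: write a tidy config file with "option: value" lines.
# tidy-html5.conf, ugly.html and pretty.html are hypothetical names.
cat > tidy-html5.conf <<'EOF'
tidy-mark: no
indent: yes
indent-spaces: 4
wrap: 0
new-blocklevel-tags: article, header, footer
new-inline-tags: video, audio, canvas, ruby, rt, rp
break-before-br: yes
vertical-space: yes
EOF

# Run tidy only if it is installed and the input file exists.
# tidy exits non-zero on warnings as well as errors, so report the
# status instead of aborting.
if command -v tidy >/dev/null 2>&1 && [ -f ugly.html ]; then
    tidy -config tidy-html5.conf -o pretty.html ugly.html \
        || echo "tidy reported warnings or errors (exit status $?)"
fi
```

Keeping the settings in a file makes it easier to add more new tags over time, and the same file can be reused from a build script.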

Mr. Shiny and New
At a rough guess, how about `tidy -as-xhtml --input-xml --tidy-mark no -indent --indent-spaces 4 -wrap 0 --new-blocklevel-tags article,header,footer --new-inline-tags video,audio,canvas,ruby,rt,rp --doctype "<!DOCTYPE HTML>" --break-before-br yes --sort-attributes alpha --vertical-space yes` (disclaimer: I've not used HTML5, and I've only copied a few new tags from http://www.w3schools.com/html5/html5_reference.asp into the list by guessing which were block/inline, so please adjust as appropriate.)
Stobor
This seems to be the best option. Kudos to Stobor, too!
blinry