tidy

Tidy Converting <Span Style="Font-Style:Bold"> to <Class="C1">.

I am using the PHP 5 Tidy class to format HTML. Everything is fine except when it gets passed a style attribute, which it changes into a class attribute. As I am only formatting the body of a document, not the head, there is no class defined in the head for the attribute to read. I have looked through all the Tidy options but can't ...

JTidy Node.findBody() — How to use?

Hello, I'm trying to do XHTML DOM parsing with JTidy, and it seems to be a rather counterintuitive task. In particular, there's a method to parse HTML: Node Tidy.parse(Reader, Writer). And to get the <body /> of that Node, I assume I should use Node Node.findBody(TagTable). Where should I get an instance of that TagTable? (Constructor...

How to convert all html escaped characters in a tidied xhtml string, so it loads in an XmlDocument?

In a .NET web application I talk to a third-party CMS API which gives back HTML. I need to convert it to well-formed XML, so I use a .NET wrapper around HTML Tidy. This generates a nice DOM, but things go wrong when characters such as &nbsp; are used. I need those to be converted to their code format like &#160; in order for an XmlDocume...
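The conversion described in the excerpt above (named entities such as &nbsp; rewritten as numeric references such as &#160;) can be sketched as follows. This is only an illustration in Python rather than .NET, and it is not taken from the question or its answers; the function name named_to_numeric is hypothetical. It assumes the input uses HTML 4 entity names, which is what Python's standard html.entities.name2codepoint table covers.

```python
import re
from html.entities import name2codepoint

# The five entities that XML itself predefines; these can stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(markup):
    """Replace named HTML entities (e.g. &nbsp;) with numeric references (e.g. &#160;)."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown names untouched.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)

if __name__ == "__main__":
    print(named_to_numeric("<p>a&nbsp;b &amp; c</p>"))
```

Numeric references need no DTD, so an XML parser can load the result without knowing the HTML entity set.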

What is the best way to parse HTML from a Rich Text Editor in Perl?

Is there a Perl module out there that can take bad HTML (such as what is copied from Microsoft Word) and parse it into nicely formatted HTML? I have looked at HTML::Tidy, but it has gotten horrible reviews on CPAN. We have a custom legacy module that's basically a wrapper for the command line version of tidy (which seems to be pretty m...

DOMDocument: Ignore Duplicate Element IDs

I'm putting some page content (which has been run through Tidy, but doesn't need to be if this is a source of problems) into DOMDocument using DOMDocument::loadHTML. It's coming up with various errors 'ID x already defined in Entity, line X'. Is there any way to make either DOMDocument (or Tidy) ignore or strip out duplicate element IDs,...

Where can I get php_tidy.dll for PHP 4.4.2 or 4.4.4?

Hi! I need to get the Tidy extension for PHP 4.4.2 and 4.4.4 (win32). I have tried to find DLLs for these old PHP versions, but with no luck. ...

Delete files from disk that aren't in a Visual Studio project

Can anyone think of a way (perhaps using a PowerShell script or similar) where I can look for *.cs files that are on disk in the folder structure, but aren't included in a project file? This has come about gradually over time with merging in Subversion etc. I'm looking for a way to clean up after myself, basically. :) ...
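One possible approach to the cleanup described in the excerpt above, sketched in Python rather than PowerShell. It assumes a classic project file that lists every source file explicitly as a <Compile Include="..."> item (no wildcards, no linked files outside the folder); the function name files_not_in_project is hypothetical. The sketch only reports candidates and deletes nothing.

```python
import os
import xml.etree.ElementTree as ET

def files_not_in_project(project_path):
    """Return .cs files under the project's folder that the project file does not list."""
    project_dir = os.path.dirname(os.path.abspath(project_path))
    included = set()
    for element in ET.parse(project_path).iter():
        # Match <Compile Include="..."> regardless of the XML namespace prefix.
        if element.tag.split("}")[-1] == "Compile" and element.get("Include"):
            relative = element.get("Include").replace("\\", os.sep)
            full = os.path.normpath(os.path.join(project_dir, relative))
            included.add(os.path.normcase(full))
    orphans = []
    for folder, _dirs, names in os.walk(project_dir):
        for name in names:
            path = os.path.normcase(os.path.normpath(os.path.join(folder, name)))
            if name.lower().endswith(".cs") and path not in included:
                orphans.append(path)
    return sorted(orphans)
```

Reviewing the returned list by hand before removing anything is the safer choice, since generated or linked files may be missing from the project file on purpose.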

xHTML markup checker integrated in Selenium

Hello. Recently I have been thinking about how to improve the quality of our projects by continuously checking the xHTML source on the continuous integration machine. There is a project, JTidy (http://sourceforge.net/projects/jtidy): JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. It can validate the xHTML thr...

Using libtidy for iPhone app

I'm trying to use libtidy for an iPhone app (since the iPhone 2.2 SDK doesn't include NSXMLDocument which has tidy functionality) but I get a linker error saying "library not found for -ltidy" when I build the app. As for other framework/library references, I've added the libtidy.dylib to my list of referenced frameworks and I've added ...

Tool for cleaning up CSS?

Before publishing a site I have a bloat of unused CSS styles. Is there any good tool to detect unused CSS classes and divs? Related questions: "Tool to identify unused css definitions"; "Are there any utilities that will help me refactor CSS" ...

Installing Html Tidy

I'm running Mac OS X with Apache/2.0.59 (Unix) PHP/5.2.5 DAV/2. I've never administered Apache or PHP before, so some things aren't really that obvious to me. I'm trying to get PHP Tidy to run as mentioned here: http://th.php.net/manual/en/tidy.installation.php It says: "In PHP 5 you need only to compile using the --with-tidy opt...

Screenscraping the ugliest HTML you've ever seen in your life

I'm using PHP and libtidy to attempt to screen scrape what might possibly be the most horrendous and malformed use of HTML tables in history. The site closes few table, tr, td, font, or bold tags and consistently nests many different layers of tables within tables. Example snippet: <center> <table border="1" bordercolor="#000000" cells...

FireFox version of tidy

I'm looking to create a binary that takes an html string on stdin, and spits out a well formed xml string representing the DOM. Basically "tidy" but using FireFox. Any ideas where I should hook into the FF source code? ...

PHP "pretty print" HTML (not Tidy)

I'm using the DOM extension in PHP to build some HTML documents, and I want the output to be formatted nicely (with new lines and indentation) so that it's readable. However, from the many tests I've done: "formatOutput = true" doesn't work at all with saveHTML(), only saveXML(). Even if I used saveXML(), it still only works on elements...

Notepad++ HTML Tidy

Is HTML Tidy for Notepad++ broken? None of the commands except Tidy (the first one) work. They don't show any message, even with all text selected. I really need Tidy to work, or is it just a limitation of the newest version of N++, or lack of support? Also, the custom syntax dialog freezes whenever I select a color from the color dialo...

Formatting PHP Code within Vim

I'm currently using Vim as a lightweight IDE. I have the NERDTree, bufexplorer, supertab, and ctags plugins, which do almost everything I want. The only big thing missing for me is automatic code formatting. I'm working with some messy PHP code which has inconsistent indenting and code formatting. Ideally I could highlight the code I want formatted (...

Beautiful Soup and uTidy

I want to pass the results of utidy to Beautiful Soup, ala: page = urllib2.urlopen(url) options = dict(output_xhtml=1,add_xml_decl=0,indent=1,tidy_mark=0) cleaned_html = tidy.parseString(page.read(), **options) soup = BeautifulSoup(cleaned_html) When run, the following error results: Traceback (most recent call last): File "soup.py...

Clean up PHP/HTML pages

Does anybody know of a good tool that cleans up files with PHP and HTML in them? I've used Tidy before, but it doesn't do a good job of leaving the PHP code alone. I know there are various implementations of Tidy, but does any tool reign champion specifically for pages with HTML and PHP? Thanks ...

Tidy gives non-standard HTML

Hi, I use Tidy to clean HTML files and make them compliant with HTML/XHTML. However, the output contains non-standard attribute values like: <table id='abc'>... or <input type='button' /> (look at the single quotes). How can I configure Tidy to give strict XHTML output? Thank you in advance! ...

Beyond Compare 3.0 and XML Tidy, but save Original Format

Hi, I am using Beyond Compare 3.0 with the "XML tidy" and "XML tidied with sorted attributes" plugins. It's great, and while I would like it to show me the "tidied" XML, once I resolve my merge I would like to save it back with the original formatting (not the tidied format). Is there any way of doing this? For example, if I have these t...