I have an ASP.NET web application running with the following config setting:

<xhtmlConformance mode="Legacy"/>

This limits the use of AJAX and hurts compatibility across browsers.

If my understanding is correct, the HTML code of the aspx pages needs to be fixed to comply with XHTML 1.0 Transitional.

There are a lot of pages (~1000). Is there a tool that could speed up this process?

+1  A: 

Are the pages currently in HTML or XHTML? If they are in HTML, leave them. There is no benefit to using XHTML.

Delan Azabani
I doubt that any **legacy** web app would be HTML 5.
Oded
Given that the original question is about legacy HTML and the HTML 5 specification is not finalized yet, I'd be surprised if they have HTML 5 compliant markup.
Rob
They are not in HTML 5, and I need to make them valid XHTML 1.0 (closed tags etc.), as described here: http://en.wikipedia.org/wiki/XHTML#Valid_XHTML_documents and I'm looking for a tool to speed things up, just like CodeIt.Right can do in the code-behind.
GenEric35
Edited: made more general to ask for HTML in general. To recap, if you have HTML, keep it that way.
Delan Azabani
A: 

Tidy is probably the best there is, but I've never been entirely happy with it, and if the code is that bad it is usually a sign that it needs rewriting anyway.

David Dorward
Can Tidy work with aspx files? It wants me to remove the server controls (custom ones) from my aspx files before tidying them.
GenEric35
+1  A: 

HTML 5 supports the use of XML syntax, so moving from legacy HTML to XHTML is going to have some long-term advantages: your pages will be syntactically correct, which could allow for an easier move to HTML 5 or XHTML 2 when they are completed.

As noted by David Dorward in his answer, HTML Tidy might be useful if you have fairly consistent HTML markup, but given the number of pages I would be surprised if that was the case. If you do a Google search for "legacy html to xhtml tool", some other tools turn up that might be worth looking into.

Another thing to take into account is how the pages are written. Most legacy HTML pages don't have a clear separation between markup and styles, so you are likely going to have to pull styling out into CSS files, which automatic tools can be hit or miss with. As such, you might be looking at a situation where the best thing to do is sit down with the site and run the pages through a validator such as the W3C Markup Validation Service, or use a browser plug-in to do the testing while you are browsing through the site. The Html Validator for Firefox is pretty good and this is generally how I test web applications. The downside to this is that if any HTML is generated in the code-behind pages, you may not encounter it during testing.

Rob
Thanks. In this case the CSS is already pulled out of the HTML files; the main problems I'm looking at are tags not properly closed and tags in upper case that should be in lower case.
GenEric35
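For the "tags not properly closed" part, one narrow case that lends itself to a scripted pass is void elements such as `<br>` and `<img>`, which XHTML wants written as `<br />`. The following is only a rough Python sketch, not a tool mentioned in this thread: the function name `close_void_elements` and the element list are assumptions made for illustration. It deliberately skips any tag that contains `<` or `>` inside its attributes (for example inline `<%# ... %>` expressions), and an attribute value containing a literal `>` would still confuse it, so results should be reviewed page by page.

```python
import re

# Elements that have no closing tag in HTML (illustrative, not exhaustive).
VOID = "area|base|br|col|hr|img|input|link|meta|param"

# A void element's opening tag, with or without a trailing slash.
# [^<>] keeps the match inside one tag and makes the pattern skip tags
# whose attributes embed server-side expressions such as <%# ... %>.
UNCLOSED_VOID = re.compile(r"<(" + VOID + r")\b([^<>]*?)\s*/?>", re.IGNORECASE)


def close_void_elements(markup):
    """Rewrite <br>, <br/> and similar tags as <br /> (hypothetical helper)."""
    return UNCLOSED_VOID.sub(
        lambda m: "<" + m.group(1) + m.group(2) + " />", markup
    )


if __name__ == "__main__":
    print(close_void_elements('<p>a<br>b<BR/><img src="x.gif"></p>'))
```

The pass is idempotent, so running it twice over the same file should not add a second slash. It does not touch tag case and does not attempt to close non-void elements such as `<p>` or `<li>`, which need a real parser or manual review.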
"HTML 5 uses XHTML syntax" -- really? I don't think so.
Tomalak
@Tomalak - I went back and read the document that I read that in (http://www.w3.org/TR/2008/WD-html5-diff-20080122/#syntax) and it looks like I was half right in my response since XHTML is XML. HTML5 supports two forms of syntax, one of which is specific to HTML5, and XML syntax. I've updated my answer accordingly.
Rob
@GenEric35 - As for the unclosed tags, a tool that tries to close them could be hit or miss; however, converting tags to lowercase is something that you might be able to do with regular expressions.
Rob
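As a rough illustration of the regular-expression idea for lowercasing tags, here is a minimal Python sketch. The function name `lowercase_tag_names` is hypothetical, and the pattern is an assumption about what the pages look like: it lowercases plain element names only, leaves prefixed server controls such as `<asp:TextBox>` and `<%@ ... %>` directives alone, and does not touch attribute names. It knows nothing about scripts or comments, so a `<` followed by a letter inside inline JavaScript would also be affected.

```python
import re

# Name of an opening or closing tag, e.g. "<TABLE" or "</TD".
# The negative lookahead rejects names that continue with ":" so that
# prefixed server controls like <asp:TextBox> are not matched at all.
# "<%" directives never match because "%" is not a letter.
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)(?![A-Za-z0-9:])")


def lowercase_tag_names(markup):
    """Lowercase unprefixed element names (hypothetical helper)."""
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), markup)


if __name__ == "__main__":
    print(lowercase_tag_names('<TABLE><TR><TD Class="x">Hi</TD></TR></TABLE>'))
```

Attribute names (`Class` in the example) are left as they are; lowercasing those safely is harder with a plain regex because attribute values can contain almost anything.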
@Rob Yes, I found that in this case it's going to be a combination of a few methods: find/replace, regular expressions, and Visual Studio, which can do a bit of formatting in the aspx files.
GenEric35
+2  A: 

The DMS Software Reengineering Toolkit is a tool for automating mass change to large source code bases.

DMS uses language-precise parsers to read source code, and language-precise analyzers to determine meaning or problems. One can use source-to-source program transformations to carry out changes of the form "if you see this, replace it by that", stated in terms of language surface syntax. The patterns are driven by the language grammar and are applied to ASTs generated by DMS parsers.

DMS has language front ends for a variety of languages, including HTML and XHTML.

DMS has been used on "dirty" HTML (even that has a precise definition!) to refactor it. It might be a pretty good match for your fix-1000-pages problem.

Ira Baxter