If you're using a component-based (a.k.a. pull-based) web framework (e.g. Tapestry, Wicket, Struts, et al.), how do you verify that your markup passes W3C validation? Two approaches come to mind:

Crawl the running app

Pro:

  • All the markup required for validation exists on the page.

Cons:

  • Can be very complicated to hit every page and every case.
  • If something is wrong, it may not be obvious which component is causing the problem (especially in large apps).
  • You may be validating the same component over and over again (duplicating effort).
  • It may take a very long time if there are many pages/components.

Crawl the HTML templates offline

Pros:

  • You only need to validate each component once.
  • If you find a problem you will know exactly which component is causing it.

Cons:

Most of the cons I can think of involve losing the context of the components, since you won't have the full markup of a page:

  • You may not know the DOCTYPE for a given component.
  • It may be difficult to know what a component's parent is, which could cause you to miss problems such as an inline element (e.g. <span>) containing a block element (e.g. <form> or <p>).
  • HTML templates in these types of frameworks often contain invalid attributes and special symbols (usually to indicate something to the framework) that will not validate.

So the question is, if you're using a component-based architecture, how do you validate your markup? Are there any recommended techniques or, better yet, tools to do this?

EDIT: I'm a little surprised that there weren't more answers for this. Is it uncommon to validate your markup when using component-based frameworks? Or are there just not many people using them?

+1  A: 

You really want to do most of this kind of validation and testing against the full served document. That ensures that what you're validating is what web browsers actually see.

Depending on the number of URLs you have to validate, a good option is the WDG's batch validation service:

http://htmlhelp.org/tools/validator/batch.html.en

Alternatively, both the WDG and the W3C offer offline validators that you can drive from a script and aggregate the results yourself. A quick Google search will turn up a couple of such scripts, and they aren't difficult to write if you're so inclined.
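
To make that concrete, here is a rough sketch of the kind of aggregation script meant above (my own illustration, not part of the original answer). It assumes HTML Tidy is installed and on the PATH as the offline checker, and that the served pages have already been saved to local files; any validator with a command-line interface could be swapped in.

    #!/usr/bin/env python
    """Run an offline checker over saved pages and aggregate the results.

    A minimal sketch: assumes HTML Tidy is on the PATH, but any validator
    with a command-line interface could be substituted.
    """
    import subprocess
    import sys

    def validate(path):
        # "tidy -e -q" reports errors/warnings without rewriting the file.
        # Exit status: 0 = clean, 1 = warnings only, 2 = errors.
        result = subprocess.run(["tidy", "-e", "-q", path],
                                capture_output=True, text=True)
        return result.returncode, result.stderr.strip()

    def main(paths):
        failures = 0
        for path in paths:
            status, report = validate(path)
            if status == 0:
                print("PASS  " + path)
            else:
                failures += 1
                print("FAIL  " + path)
                for line in report.splitlines():
                    print("      " + line)
        print("%d of %d pages had problems" % (failures, len(paths)))
        return 1 if failures else 0

    if __name__ == "__main__":
        # Pass the saved pages on the command line.
        sys.exit(main(sys.argv[1:]))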

You'll need to generate the list of URLs yourself, either with a crawling script (see the sketch below) or from your database. You can cut down the number of pages you actually have to validate if some pages have dynamic content that your end users cannot "break".
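
A crawling script along those lines doesn't have to be elaborate. The sketch below (again, my own illustration) collects same-host URLs by following links from a start page using only the Python standard library; the start URL is a placeholder for your running app.

    #!/usr/bin/env python
    """Collect same-host URLs by following links from a start page (a rough sketch)."""
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urldefrag, urlparse
    from urllib.request import urlopen

    class LinkParser(HTMLParser):
        def __init__(self, base):
            super().__init__()
            self.base = base
            self.links = set()

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        # Resolve relative links and drop #fragments.
                        self.links.add(urldefrag(urljoin(self.base, value))[0])

    def crawl(start, limit=200):
        host = urlparse(start).netloc
        seen, queue = set(), [start]
        while queue and len(seen) < limit:
            url = queue.pop(0)
            if url in seen or urlparse(url).netloc != host:
                continue
            seen.add(url)
            try:
                with urlopen(url) as response:
                    if "html" not in response.headers.get("Content-Type", ""):
                        continue
                    parser = LinkParser(url)
                    parser.feed(response.read().decode("utf-8", "replace"))
                    queue.extend(parser.links - seen)
            except OSError:
                pass  # Pages that fail to load need attention anyway.
        return sorted(seen)

    if __name__ == "__main__":
        for url in crawl("http://localhost:8080/"):  # placeholder start URL
            print(url)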

wlashell
Do you know if these validators check things in the HTTP header or do they just process markup? For example, would it catch the conflict between the markup having an XHTML1 DOCTYPE and the header containing a Content-Type of text/html? If not, then I could just feed it straight markup.
ntownsend
I believe it will catch the mismatch and throw an error.
wlashell
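
For what it's worth, that particular combination is easy to spot yourself before involving a validator. A small sketch of the idea follows (the URL is a placeholder, and whether the combination is actually an error depends on which XHTML flavour you're serving):

    #!/usr/bin/env python
    """Flag pages that declare an XHTML DOCTYPE but are served with a
    text/html Content-Type (a quick, standalone check)."""
    from urllib.request import urlopen

    def check(url):
        with urlopen(url) as response:
            content_type = response.headers.get("Content-Type", "")
            head = response.read(1024).decode("utf-8", "replace")
        if "DTD XHTML" in head and content_type.startswith("text/html"):
            print("MISMATCH  %s declares XHTML but is served as %s" % (url, content_type))
        else:
            print("OK        %s (%s)" % (url, content_type))

    if __name__ == "__main__":
        check("http://localhost:8080/")  # placeholder URL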