How do you serialize HTML in C#?

I think I know how to use XSD.exe to create C# classes from XML, which can then be used with the XmlSerializer class to serialize and verify the XML document.

Is there a way to do the same sort of thing with an HTML document? I have tried, but xsd.exe reports that the remote name www.w3.org cannot be resolved.

At a minimum, is there a way to use C# to find out whether an HTML file is valid?
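For reference, the XSD.exe + XmlSerializer round trip described above looks roughly like this. The `Note` class is a hypothetical, hand-written stand-in for the kind of class xsd.exe generates from a schema. Note that XmlSerializer only maps elements to properties; on its own it does not validate the document against the XSD.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical class standing in for xsd.exe output.
// XmlSerializer needs a public type with a parameterless constructor.
[XmlRoot("note")]
public class Note
{
    [XmlElement("to")] public string To { get; set; }
    [XmlElement("body")] public string Body { get; set; }
}

public static class NoteXml
{
    // Object -> XML text.
    public static string Serialize(Note note)
    {
        var serializer = new XmlSerializer(typeof(Note));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, note);
            return writer.ToString();
        }
    }

    // XML text -> object. Unknown elements are skipped, not rejected.
    public static Note Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(Note));
        using (var reader = new StringReader(xml))
        {
            return (Note)serializer.Deserialize(reader);
        }
    }
}
```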

A:

The HTMLAgilityPack is an open-source library that parses HTML for you. You can then search and manipulate the structure of the document quite easily.

It's quite forgiving of the HTML you give it, so I'm not sure it's a good way to check whether you have a strictly valid XHTML document. But it should be able to parse anything a modern browser can.
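A minimal sketch of that workflow, assuming the HtmlAgilityPack NuGet package is referenced. `HtmlInspector` and its methods are hypothetical names; `LoadHtml`, `SelectNodes`, `GetAttributeValue` and `ParseErrors` are the library's own members. The parse-error list shows what the parser noticed, which is a hint rather than a strict validity check.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HtmlInspector
{
    // Returns the href of every <a> element that has one.
    public static List<string> GetLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);  // tolerant parse: malformed markup does not throw
        var links = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (var a in anchors)
                links.Add(a.GetAttributeValue("href", ""));
        }
        return links;
    }

    // Problems the parser noticed; it still builds a tree regardless.
    public static List<string> GetParseErrors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var errors = new List<string>();
        foreach (var e in doc.ParseErrors)
            errors.Add($"line {e.Line}, col {e.LinePosition}: {e.Reason}");
        return errors;
    }
}
```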

Kirschstein
A: 

If it's XHTML that you're trying to validate, you can do it like this:

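using System;
using System.Xml;
using System.Xml.Schema;

static void Validate(string filename)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    // Parse the DOCTYPE and validate against the DTD it references
    // (DtdProcessing replaces the obsolete ProhibitDtd = false).
    settings.DtdProcessing = DtdProcessing.Parse;
    settings.ValidationType = ValidationType.DTD;
    settings.ValidationEventHandler +=
        new ValidationEventHandler(ValidationCallBack);
    // Fetches the referenced DTDs over HTTP from www.w3.org.
    settings.XmlResolver = new XmlUrlResolver();

    // Create the XmlReader object; 'using' closes the file afterwards.
    using (XmlReader reader = XmlReader.Create(filename, settings))
    {
        // Parse the file. Validity errors go to ValidationCallBack;
        // markup that isn't well-formed XML throws an XmlException instead.
        while (reader.Read()) { }
    }
}

// Display any validation errors.
private static void ValidationCallBack(object sender, ValidationEventArgs e)
{
    Console.WriteLine("Validation Error: {0}", e.Message);
}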

It will be a bit slow, because it downloads the DTD files from the W3C web site.
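If the download is too slow, or www.w3.org can't be reached (as in the question's error), newer versions of .NET include `System.Xml.Resolvers.XmlPreloadedResolver`, which can serve the XHTML 1.0 DTDs from memory. A sketch under that assumption: it validates a string rather than a file and collects messages instead of printing them. `XhtmlValidator` is a hypothetical name, and this only helps for documents whose DOCTYPE uses the standard XHTML 1.0 identifiers.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

public static class XhtmlValidator
{
    // Returns the DTD validation messages for an XHTML 1.0 document.
    public static List<string> Validate(string xhtml)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD,
            // Resolves the XHTML 1.0 DTDs and their entity files from
            // built-in copies, so nothing is fetched from www.w3.org.
            XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10)
        };
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (var reader = XmlReader.Create(new StringReader(xhtml), settings))
        {
            // Non-well-formed input still throws XmlException here.
            while (reader.Read()) { }
        }
        return errors;
    }
}
```

An empty list means the document is valid against the DTD named in its DOCTYPE.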

ChrisW
A: 

To deserialize/parse HTML, I would also recommend HTMLAgilityPack. To validate the HTML, however, you could try running HTML Tidy. For XHTML, you can also obtain an XSD and validate against that.
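One way to run HTML Tidy from C# is to start it as a process and read its exit code. This sketch assumes a `tidy` executable is installed (on the PATH, or passed as a full path); `TidyRunner` and `TidyResult` are hypothetical names. It relies on Tidy's documented exit codes (0 = no problems, 1 = warnings only, 2 = errors) and on its `-errors` and `-quiet` options.

```csharp
using System.Diagnostics;

public enum TidyResult { Clean, Warnings, Errors }

public static class TidyRunner
{
    // Maps Tidy's exit code: 0 = clean, 1 = warnings only, anything else = errors.
    public static TidyResult FromExitCode(int exitCode)
    {
        switch (exitCode)
        {
            case 0: return TidyResult.Clean;
            case 1: return TidyResult.Warnings;
            default: return TidyResult.Errors;
        }
    }

    // Runs tidy on a file and returns its verdict plus the diagnostic text.
    public static TidyResult Check(string htmlFile, out string report, string tidyPath = "tidy")
    {
        var psi = new ProcessStartInfo
        {
            FileName = tidyPath,
            // -errors: report problems only, no cleaned-up markup; -quiet: no extra chatter.
            Arguments = "-errors -quiet \"" + htmlFile + "\"",
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var process = Process.Start(psi))
        {
            // Tidy writes its warnings and errors to stderr.
            report = process.StandardError.ReadToEnd();
            process.WaitForExit();
            return FromExitCode(process.ExitCode);
        }
    }
}
```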

Jacob