views:

701

answers:

2

I am using HtmlAgilityPack. I create an HtmlDocument and LoadHtml with the following string:

<select id="foo_Bar" name="foo.Bar"><option selected="selected" value="1">One</option><option value="2">Two</option></select>

This does some unexpected things. First, it gives two parser errors, EndTagNotRequired. Second, the select node has 4 children - two for the option tags and two more for the inner text of the option tags. Last, the OuterHtml is like this:

<select id="foo_Bar" name="foo.Bar"><option selected="selected" value="1">One<option value="2">Two</select>

So basically it is deciding for me to drop the closing tags on the options. Let's leave aside for a moment whether it is proper and desirable to do that. I am using HtmlAgilityPack to test HTML generation code, so I don't want it to make any decision for me or give any errors unless the HTML is truly malformed. Is there some way to make it behave how I want? I tried setting some of the options for HtmlDocument, specifically:

 doc.OptionAutoCloseOnEnd = false;
 doc.OptionCheckSyntax = false;
 doc.OptionFixNestedTags = false;

This is not working. If HtmlAgilityPack cannot do what I want, can you recommend something that can?

+1  A: 

The exact same error is reported on the HAP home page's discussion, but it looks like no meaningful fixes have been made to the project in a few years. Not encouraging.

A quick browse of the source suggests the error might be fixable by commenting out line 92 of HtmlNode.cs:

// they sometimes contain, and sometimes they don 't...
ElementsFlags.Add("option", HtmlElementFlag.Empty);

(Actually no, they always contain label text, although a blank string would also be valid text. A careless author might omit the end-tag, but then that's true of any element.)

bobince
Awesome. Thanks a lot. Works great!
Tim Scott