Library for parsing XHTML files with XLINQ

When I realized I needed to create an index for approximately 50 XHTML pages, which may be added/deleted/renamed/moved in the future, I thought "No problem -- I'll write a quick index generator using LINQ to XML, since XHTML definitely counts as XML".

Of course, as soon as I tried running it, I found out that XLINQ chokes on XHTML entities like &nbsp;. I got around it by using the following algorithm:

Read XHTML file into string.
Use a regex search and replace on that string to add a section to the DOCTYPE that defines all relevant entities. (I only care about the "title" attribute in the files I read, and my output file does not use any entities right now, so it just sets them all to blank; I may add the actual values later.)
Parse the result into an XDocument.
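
A minimal sketch of these three steps, written in Python with only the standard library so that it is easy to run. It illustrates the technique and is not the code described above: ElementTree stands in for XDocument, and the entity list and the names ENTITY_NAMES, add_entity_definitions, parse_xhtml and load_xhtml are assumptions made for the example. It also assumes the file is UTF-8 and has a DOCTYPE with no internal subset of its own.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entity list. Each entity is defined as an empty string,
# mirroring the "set them all to blank" choice described above.
ENTITY_NAMES = ["nbsp", "copy", "mdash"]
INTERNAL_SUBSET = " [" + " ".join('<!ENTITY %s "">' % n for n in ENTITY_NAMES) + "]"

# Matches the DOCTYPE declaration up to, but not including, its closing ">".
DOCTYPE_HEAD = re.compile(r"<!DOCTYPE[^>\[]*")


def add_entity_definitions(xhtml):
    """Append an internal subset declaring the entities to the first DOCTYPE."""
    return DOCTYPE_HEAD.sub(lambda m: m.group(0) + INTERNAL_SUBSET, xhtml, count=1)


def parse_xhtml(xhtml):
    """Parse XHTML text whose named entities would otherwise be undefined."""
    return ET.fromstring(add_entity_definitions(xhtml))


def load_xhtml(path):
    """Read the file into a string (assumed UTF-8), patch it, and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_xhtml(f.read())
```

Because every entity is defined as an empty string, a title written as My&nbsp;Page comes back as MyPage. A file with no DOCTYPE is passed through unchanged and would still fail on the first named entity.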

To save a file, I do the opposite:

Save XDocument to a string.
Strip out the entity definitions.
Save to file.
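
The reverse direction can be sketched the same way. This is again a Python illustration under stated assumptions, not the code described above: it starts from the already serialized document text, assumes the internal subset contains no "]" character (true for plain ENTITY lines), and uses the hypothetical names strip_entity_definitions and save_xhtml.

```python
import re

# A DOCTYPE declaration followed by an internal subset in square brackets.
DOCTYPE_WITH_SUBSET = re.compile(r"(<!DOCTYPE[^>\[]*?)\s*\[.*?\]\s*>", re.DOTALL)


def strip_entity_definitions(xml_text):
    """Remove the internal subset from the first DOCTYPE, if it has one."""
    return DOCTYPE_WITH_SUBSET.sub(r"\1>", xml_text, count=1)


def save_xhtml(xml_text, path):
    """Strip the injected definitions, then write the text to a UTF-8 file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(strip_entity_definitions(xml_text))
```

Text whose DOCTYPE has no internal subset is left untouched, so calling save_xhtml on a document that was never patched does no harm.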

My question is: are there any libraries (especially built-in .NET ones) I can use that will read XHTML files into XDocuments? The code I wrote has accomplished its purpose (to generate the current index and to test the rest of the generator program), and I would really prefer not to spend time testing it if someone else has already written and tested the same thing.

Thank y'all so much for your time,
Ria.

Edit: Thank you so much; this works! I still have to do a little string processing when I save the XHTML (I guess the library was not really made for that :)), and I had to fiddle with the source of the Agility Pack slightly to stop it from indiscriminately wrapping the contents of every style attribute in a CDATA section (even when one was already present), but that's the point of open source, right?
