When I realized I needed to create an index for approximately 50 XHTML pages, which may be added/deleted/renamed/moved in the future, I thought "No problem -- I'll write a quick index generator using LINQ to XML, since XHTML definitely counts as XML".

Of course, as soon as I tried running it, I found out that LINQ to XML chokes on XHTML entities like &nbsp;. I got around it by using the following algorithm:

  1. Read the XHTML file into a string.
  2. Use a regex search and replace on that string to add a section to the DOCTYPE that defines all relevant entities. Right now it just defines them all as blank, because I only care about the "title" attribute in the files I read and my output file does not use any entities, but I may add the actual values later.
  3. Parse the result into an XDocument.
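
The three loading steps above could be sketched roughly as follows. This is a minimal sketch, not the code from the question: the class name XhtmlLoader, the method names, and the entity list are hypothetical, and it assumes the file's DOCTYPE does not already contain an internal subset.

```csharp
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class XhtmlLoader
{
    // Hypothetical entity list, all defined as blank as described above.
    // Extend it with whatever entities the pages actually use.
    const string EntityDeclarations =
        "<!ENTITY nbsp \"\"> <!ENTITY copy \"\"> <!ENTITY mdash \"\">";

    // Matches a DOCTYPE declaration that has no internal subset yet,
    // e.g. <!DOCTYPE html PUBLIC "..." "...">
    static readonly Regex DoctypeEnd = new Regex(@"(<!DOCTYPE[^>\[]*)>");

    // Steps 2 and 3: inject the entity definitions, then parse.
    public static XDocument Parse(string xhtml)
    {
        string patched = DoctypeEnd.Replace(
            xhtml, "${1} [" + EntityDeclarations + "]>", 1);
        return XDocument.Parse(patched);
    }

    // Step 1: read the file into a string, then hand it to Parse.
    public static XDocument Load(string path)
    {
        return Parse(File.ReadAllText(path));
    }
}
```

Calling XhtmlLoader.Load("page.html") (a hypothetical file name) should then return an XDocument in which every declared entity has been expanded to an empty string. An entity that is used in a page but missing from the list would still make the parse fail.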

To save a file, I do the opposite:

  1. Save XDocument to a string.
  2. Strip out the entity definitions.
  3. Save to file.
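
The saving steps could look something like the sketch below. Again, the names XhtmlSaver, ToXhtml, and Save are hypothetical, and the regex assumes the injected entity section is the only bracketed block inside the DOCTYPE.

```csharp
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class XhtmlSaver
{
    // Matches the DOCTYPE up to its internal subset "[ ... ]" and the
    // closing ">", capturing everything before the subset in group 1.
    static readonly Regex InternalSubset =
        new Regex(@"(<!DOCTYPE[^>\[]*?)\s*\[[^\]]*\]\s*>");

    // Steps 1 and 2: serialize to a string, then strip the entity definitions.
    public static string ToXhtml(XDocument doc)
    {
        // XDocument.ToString() does not emit the XML declaration.
        string xml = doc.ToString();
        return InternalSubset.Replace(xml, "${1}>", 1);
    }

    // Step 3: write the cleaned-up markup to disk.
    public static void Save(XDocument doc, string path)
    {
        File.WriteAllText(path, ToXhtml(doc));
    }
}
```

Because the entities were expanded to blank values during parsing, the saved file will not contain the original entity references. That matches the situation described above, where the output does not use any entities.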

My question is: are there any libraries (especially built-in .NET ones) that can read XHTML files into XDocuments? The code I wrote has served its purpose (generating the current index and testing the rest of the generator program), but I would really prefer not to spend time testing it if someone else has already written and tested the same thing.

Thank y'all so much for your time,
Ria.

Edit: Thank you so much; this works! I still have to do a little string processing when I save the XHTML (guess the library was not really made for that:)) and I had to fiddle with the source of the Agility Pack slightly to get it to stop indiscriminately sticking a CDATA section around the insides of every style attribute (even when there was already one present), but that's the point of Open Source, right?

+3  A: 

This might be helpful: LINQ & Lambda, Part 3: Html Agility Pack to LINQ to XML Converter

Gonzalo Quero
Unfortunately, this link appears to be broken now.
James Sulak
Nick Martyshchenko