When I realized I needed to create an index for approximately 50 XHTML pages, which may be added/deleted/renamed/moved in the future, I thought "No problem -- I'll write a quick index generator using LINQ to XML, since XHTML definitely counts as XML".
Of course, as soon as I tried running it, I found out that XLINQ chokes on XHTML entities like `&nbsp;`. I got around it by using the following algorithm:
- Read XHTML file into string.
- Use a regex search and replace on that string to add a section to the DOCTYPE that defines all relevant entities. Because I only care about the "title" attribute in the files I read, and my output file does not use any entities right now, it just defines them all as blank, but I may add the actual values later.
- Parse the result into an XDocument.
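For anyone who wants to see the read steps spelled out, here is a minimal sketch of the idea rather than my exact code. The class name `XhtmlReader`, the entity list, and the regex are illustrative assumptions; it also assumes the DOCTYPE has no internal subset yet, and it sets `XmlResolver` to null so the parser does not try to download the external DTD.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class XhtmlReader
{
    // Hypothetical entity list; extend it with whatever entities your pages use.
    private static readonly string[] EntityNames = { "nbsp", "copy", "mdash" };

    // Matches a DOCTYPE declaration that has no internal subset ("[ ... ]") yet.
    private static readonly Regex Doctype = new Regex(@"(<!DOCTYPE\s[^>\[]*)>");

    public static XDocument Parse(string xhtml)
    {
        // Define every entity as an empty string, as described in the steps above.
        string subset = string.Concat(
            EntityNames.Select(name => "<!ENTITY " + name + " \"\">"));

        // Insert the definitions into the first DOCTYPE as an internal subset.
        string patched = Doctype.Replace(
            xhtml, m => m.Groups[1].Value + " [" + subset + "]>", 1);

        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // needed so the entity declarations are read
            XmlResolver = null                   // do not fetch the external DTD
        };

        using (var text = new StringReader(patched))
        using (var reader = XmlReader.Create(text, settings))
        {
            return XDocument.Load(reader);
        }
    }
}
```

Calling `XhtmlReader.Parse(File.ReadAllText(path))` would then give an `XDocument` in which every listed entity has been expanded to nothing. Remember that XHTML elements live in the `http://www.w3.org/1999/xhtml` namespace when querying the result.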
To save a file, I do the opposite:
- Save XDocument to a string.
- Strip out the entity definitions.
- Save to file.
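The save direction can be sketched the same way. Again, this is an illustration under assumptions, not my exact code: the class name `XhtmlWriter` and the regex are made up for the example, and the regex assumes the entity values never contain `]`, which holds when they are all blank.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class XhtmlWriter
{
    // Matches the internal subset, e.g. [<!ENTITY nbsp "">], at the end of the DOCTYPE.
    private static readonly Regex InternalSubset =
        new Regex(@"(<!DOCTYPE[^>\[]*?)\s*\[.*?\]\s*>", RegexOptions.Singleline);

    public static string ToXhtmlString(XDocument doc)
    {
        // ToString() serializes the document, DOCTYPE included,
        // but leaves out the XML declaration.
        string xml = doc.ToString();

        // Strip the entity definitions from the first DOCTYPE only.
        return InternalSubset.Replace(xml, "$1>", 1);
    }

    public static void Save(XDocument doc, string path)
    {
        // File.WriteAllText writes UTF-8 by default.
        File.WriteAllText(path, ToXhtmlString(doc));
    }
}
```

An alternative that avoids the second regex would be to set `doc.DocumentType.InternalSubset = null` before saving, at the cost of modifying the in-memory document.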
My question is: are there any libraries (especially built-in .NET ones) that will read XHTML files into XDocuments? The code I wrote has accomplished its purpose (to generate the current index and to test the rest of the generator program), and I would really prefer not to spend time testing it if someone else has already written and tested the same thing.
Thank y'all so much for your time,
Ria.
Edit: Thank you so much; this works! I still have to do a little string processing when I save the XHTML (guess the library was not really made for that:)) and I had to fiddle with the source of the Agility Pack slightly to get it to stop indiscriminately sticking a CDATA section around the insides of every style attribute (even when there was already one present), but that's the point of Open Source, right?