I'm wondering if this is possible.
I have HTML like this:

```html
<p>
<font face="Georgia">
<b>History</b><br> <br>Two of the polysaccharides used in the manufacture of...</font>
<a title="PubMed" href="http://www.www.gov/pubmed/" target="_blank">
<font face="Georgia">) and this web site for new development by...well as Self Affirmed Medical Food GRAS status.
</font>
</p>
<p>
<font face="Georgia">[READMORE]</font>
</p>
<p><font face="Georgia"><br><strong>Proprietary Composition</strong><br>
<br>The method in which soluble fibres are made into... REST OF ARTICLE...
</p>
```
Yes, it's ugly HTML. It comes from a WYSIWYG editor, so I have little control over it.
What I want to do is search for [READMORE] in the document, remove its parent tags (in this case, the `<font>` and `<p>` tags), and replace them with a "read more" link, while wrapping the REST of the document in a giant `<div class='wrapper'>...rest of article...</div>`.
I'm pretty sure HtmlAgilityPack will get me part of the way there; I'm just trying to figure out where to start.
So far, I think I need something like `htmlDoc.DocumentNode.SelectSingleNode("//p[text()='[READMORE]']")`, but I'm not too familiar with XPath.
In my documents, [READMORE] may or may not be inside a nested `<font>` tag.
Also, in some cases it may not be inside a tag at all, but directly at the document root. In that case I can just do a regular search and replace, which should be straightforward.
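A minimal sketch of the lookup step, assuming HtmlAgilityPack is referenced. The XPath in the question compares the `<p>`'s *own* text children, which in the sample are only whitespace (the marker lives one level down in `<font>`), so it returns `null`. Selecting the text node itself with `//text()[contains(., '[READMORE]')]` finds the marker at any depth, including at the document root. The class and method names (`MarkerLookup`, `FindMarker`, `AncestorNames`) are hypothetical.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class MarkerLookup
{
    // Finds the first text node containing the marker, however deeply it is
    // nested (inside <font>, inside <p>, or directly under the document root).
    public static HtmlNode FindMarker(HtmlDocument doc) =>
        doc.DocumentNode.SelectSingleNode("//text()[contains(., '[READMORE]')]");

    // Lists the names of the node's ancestors, innermost first
    // (e.g. "font", "p", "#document"), showing which wrappers surround the marker.
    public static List<string> AncestorNames(HtmlNode node)
    {
        var names = new List<string>();
        for (var p = node.ParentNode; p != null; p = p.ParentNode)
            names.Add(p.Name);
        return names;
    }
}
```

With the marker wrapped as in the sample, the ancestor list for the found node would start with `font` and then `p`, which are the tags the question wants to remove.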
My ideal solution would look something like this (pseudocode):

```
var node = SelectNodeContaining("[READMORE]");
node.Replace("link here");
node.RestOfDocument().Wrap("<div class='wrapper'>");
```
I know, I'm dreaming... but I hope this makes sense.
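The pseudocode above maps fairly directly onto HtmlAgilityPack node operations. The following is a minimal sketch, assuming HtmlAgilityPack is referenced; `ReadMore.Apply` is a hypothetical name, the link markup is whatever the caller passes in, and the `wrapper` class is taken from the pseudocode. It finds the marker's text node, splits that text node if it also holds other text (the document-root case), climbs through wrappers such as `<font>` and `<p>` whose only text is the marker, replaces the outermost one with the link, and moves every following sibling into one `<div class="wrapper">`.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class ReadMore
{
    private const string Marker = "[READMORE]";

    public static string Apply(string html, string linkHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match the marker's text node wherever it is nested.
        var textNode = doc.DocumentNode.SelectSingleNode(
            "//text()[contains(., '[READMORE]')]") as HtmlTextNode;
        if (textNode == null)
            return html; // no marker: leave the document unchanged

        // If the text node holds more than the marker (e.g. at the document
        // root), split it so the marker gets a text node of its own.
        string text = textNode.Text;
        int idx = text.IndexOf(Marker, StringComparison.Ordinal);
        string before = text.Substring(0, idx);
        string after = text.Substring(idx + Marker.Length);
        if (before.Length > 0)
            textNode.ParentNode.InsertBefore(doc.CreateTextNode(before), textNode);
        if (after.Length > 0)
            textNode.ParentNode.InsertAfter(doc.CreateTextNode(after), textNode);
        textNode.Text = Marker;

        // Climb through wrappers (<font>, <p>, ...) whose only text is the marker.
        HtmlNode block = textNode;
        while (block.ParentNode != null
               && block.ParentNode != doc.DocumentNode
               && block.ParentNode.InnerText.Trim() == Marker)
        {
            block = block.ParentNode;
        }

        // Replace the outermost marker-only wrapper with the link.
        HtmlNode container = block.ParentNode;
        HtmlNode link = HtmlNode.CreateNode(linkHtml);
        container.ReplaceChild(link, block);

        // Move every sibling after the link into a single wrapper <div>.
        // Limitation: if the marker shares a tag with other text, only that
        // tag's remaining content is wrapped; outer levels are not hoisted.
        HtmlNode wrapper = doc.CreateElement("div");
        wrapper.SetAttributeValue("class", "wrapper");
        HtmlNode next = link.NextSibling;
        while (next != null)
        {
            HtmlNode following = next.NextSibling;
            next.Remove();
            wrapper.AppendChild(next);
            next = following;
        }
        container.InsertAfter(wrapper, link);

        return doc.DocumentNode.OuterHtml;
    }
}
```

Usage would be along the lines of `ReadMore.Apply(html, "<a href=\"/article\">Read more</a>")`. Because the root-level case is handled by splitting the text node, a separate string search-and-replace path is not strictly needed, though that may still be simpler if the marker is known to sit at the root.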