I am using HtmlAgilityPack. Is there a one-liner that returns all the inner text of an HTML document, i.e., with all HTML tags and scripts removed?
A:
Like this:
doc.DocumentNode.InnerText
Note that this will also return the text content of <script> and <style> tags.
To fix that, remove those tags before reading InnerText, like this:
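// Requires "using System.Linq;" for ToArray().
// ToArray() takes a snapshot so nodes can be removed while iterating.
foreach (var script in doc.DocumentNode.Descendants("script").ToArray())
    script.Remove();
foreach (var style in doc.DocumentNode.Descendants("style").ToArray())
    style.Remove();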
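Putting the pieces together, here is a minimal sketch of the whole flow, assuming a version of HtmlAgilityPack recent enough to expose Descendants(). The class name HtmlText and the method name Extract are hypothetical, chosen only for illustration; they are not part of the library.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class HtmlText
{
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect the unwanted nodes first (ToArray), then remove them,
        // so the tree is not modified while it is being enumerated.
        var unwanted = doc.DocumentNode
            .Descendants()
            .Where(n => n.Name == "script" || n.Name == "style")
            .ToArray();

        foreach (var node in unwanted)
            node.Remove();

        return doc.DocumentNode.InnerText;
    }
}
```

A call such as HtmlText.Extract(html) then returns the remaining text. InnerText leaves HTML entities such as &amp; encoded; if decoded text is needed, passing the result through HtmlEntity.DeEntitize is one option.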
SLaks
2010-05-06 23:07:28
It seems that DocumentNode does not have a method named Descendants. I get: "'HtmlAgilityPack.HtmlNode' does not contain a definition for 'Descendants'"
Yang
2010-05-06 23:22:18
What version are you using?
SLaks
2010-05-06 23:36:51
HTML Agility Pack v1.3.0.0. Is it too old?
Yang
2010-05-07 01:12:18
Yes; get a newer version.
SLaks
2010-05-07 01:22:35