Hi

I could not find any tutorials on their site. Can I use Html Agility Pack to parse a string?

For example, say I have

string str = "<b>Some code </b>";

Could I use Agility Pack to get rid of the <b> tags? All the examples I have seen so far load whole HTML documents.

A: 

If it's HTML, then yes.

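using HtmlAgilityPack;

string str = "<b>Some code</b>";
// not sure if the wrapper is needed
string html = string.Format("<html><head></head><body>{0}</body></html>", str);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// see an XPath tutorial for how to select elements
// select the 1st <b> element
// note: "b[1]" only matches a <b> that is a direct child of the document node,
// so with the wrapper above it returns null; "//b[1]" searches the whole document
HtmlNode bNode = doc.DocumentNode.SelectSingleNode("b[1]");
string boldText = bNode.InnerText;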
Mika Kolari
OK, then what would I do with it? How would I do some parsing?
chobo2
Hmm, thanks, but I copied and pasted that code into a console app and imported Html Agility Pack, and on the HtmlNode line I get a null reference exception.
chobo2
Maybe it's HtmlNode bNode = doc.DocumentNode.SelectSingleNode("/b[1]");
Mika Kolari
I still get the same error.
chobo2
Try HtmlNode bNode = doc.DocumentNode.SelectSingleNode("//b[1]");
Rohit Agarwal
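Pulling the comments above together, here is a minimal sketch of how the snippet might look once the XPath is fixed. It assumes the HtmlAgilityPack package is referenced; the class and method names (FragmentDemo, FirstBoldText, StripTags) are made up for illustration and are not part of the library. It loads the fragment directly instead of wrapping it in <html><body>, uses "//b" so the element is found wherever it sits, and checks for null because SelectSingleNode returns null when nothing matches.

```csharp
using HtmlAgilityPack;

// Hypothetical helper class for illustration; not part of Html Agility Pack.
public static class FragmentDemo
{
    // Returns the text of the first <b> element anywhere in the fragment,
    // or null if the fragment contains no <b> element.
    public static string FirstBoldText(string fragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragment); // load the fragment as-is, no wrapper document

        // "//b" searches the whole tree, not just direct children of the root.
        HtmlNode bNode = doc.DocumentNode.SelectSingleNode("//b");
        return bNode == null ? null : bNode.InnerText;
    }

    // Returns the text content of the fragment with all tags dropped.
    public static string StripTags(string fragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragment);
        return doc.DocumentNode.InnerText;
    }
}
```

Calling FragmentDemo.StripTags("<b>Some code</b>") should give back just the text between the tags, which is what the question asks for.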
A: 

I don't think this is really the best use of HtmlAgilityPack.

Normally I see people trying to parse large amounts of HTML with regular expressions, and I point them towards HtmlAgilityPack. In this case, though, I think it would be better to use a regex.

Roy Osherove has a blog post describing how you can strip out all the HTML from a snippet.

Even if you did get the correct XPath with Mika Kolari's sample, it would only work for a snippet with a <b> tag in it and would break if the markup changed.
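To make the regex idea concrete, here is a rough sketch of a tag stripper. This is not the code from the blog post mentioned above, just one common pattern; the class name TagStripper is made up for illustration. It removes anything that looks like a tag, whatever the tag name is, so it does not depend on the snippet containing a <b>.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class TagStripper
{
    // Matches '<', then any run of characters other than '>', then '>'.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Removes every substring that looks like a tag and keeps the rest.
    public static string StripTags(string input)
    {
        return TagPattern.Replace(input, string.Empty);
    }
}
```

TagStripper.StripTags("<b>Some code </b>") returns "Some code " (the trailing space inside the tags is kept). A pattern this simple is only reasonable for small, trusted snippets: it will be confused by a '>' inside an attribute value, and it leaves the contents of script or style blocks in place.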

rtpHarry