What would be the best way to search through HTML inside a C# string variable to find a specific word/phrase and mark (or wrap) that word/phrase with a highlight?

Thanks,

Jeff

+1  A: 

A regular expression would be my way. ;)

Eddie Parker
A: 

For searching strings, you'll want to look up regular expressions. As for marking the match, once you have the position of the substring it should be simple enough to use that position to insert something that wraps around the phrase.
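
A minimal sketch of the "find the position, then wrap" idea, assuming the input can be treated as plain text. The class name SubstringHighlighter and the choice of <strong> as the wrapper are illustrative assumptions, not something from the question. Note that it will also match inside tags and attribute values, so on its own it is not safe for arbitrary HTML.

```csharp
using System;
using System.Text;

// Hypothetical helper, named here only for illustration.
public static class SubstringHighlighter
{
    // Wraps every case-insensitive occurrence of phrase in <strong> tags.
    // Treats the input as plain text: tags and attributes are not skipped.
    public static string Wrap(string text, string phrase)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(phrase))
        {
            return text;
        }

        var result = new StringBuilder();
        int start = 0;
        int index;

        while ((index = text.IndexOf(phrase, start, StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            // Copy the text before the match unchanged.
            result.Append(text, start, index - start);
            // Copy the match itself (keeping its original casing) inside the wrapper.
            result.Append("<strong>").Append(text, index, phrase.Length).Append("</strong>");
            start = index + phrase.Length;
        }

        // Copy whatever is left after the last match.
        result.Append(text, start, text.Length - start);
        return result.ToString();
    }
}
```

Usage would look like SubstringHighlighter.Wrap(html, "go").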

Goog
+1  A: 

If the HTML you're using is XHTML compliant, you could load it as an XML document and then use XPath/XSL. Long-winded, but kind of elegant?

An approach I used in the past is to use HTMLTidy to convert messy HTML to XHTML, and then use XSL/XPath for screen scraping content into a database, to create a reverse content management system.

Regular expressions would do it, but they could get complicated once you try to skip over tags, image names, etc., to avoid false positives.
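
A rough sketch of the XML route, using LINQ to XML rather than XSL (my substitution, to keep the example short). It assumes the input is already well-formed XML, for example after a Tidy pass, and contains no undeclared entities such as &nbsp;. The class name XhtmlHighlighter is hypothetical. Because only text nodes are touched, tag names and attribute values such as href cannot produce false positives.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, named here only for illustration.
public static class XhtmlHighlighter
{
    // Wraps each occurrence of phrase found in text nodes in a <strong> element.
    public static string Highlight(string xhtml, string phrase)
    {
        if (string.IsNullOrEmpty(phrase))
        {
            return xhtml;
        }

        XDocument doc = XDocument.Parse(xhtml, LoadOptions.PreserveWhitespace);

        // Snapshot the text nodes first, because the loop modifies the tree.
        List<XText> textNodes = doc.DescendantNodes().OfType<XText>().ToList();

        foreach (XText node in textNodes)
        {
            string value = node.Value;
            // Reuse the parent's namespace so no xmlns="" is emitted.
            XNamespace ns = node.Parent != null ? node.Parent.Name.Namespace : XNamespace.None;
            var pieces = new List<object>();
            int start = 0;
            int index;

            while ((index = value.IndexOf(phrase, start, StringComparison.Ordinal)) >= 0)
            {
                if (index > start)
                {
                    pieces.Add(new XText(value.Substring(start, index - start)));
                }
                pieces.Add(new XElement(ns + "strong", phrase));
                start = index + phrase.Length;
            }

            if (start == 0)
            {
                continue; // no match in this text node
            }

            if (start < value.Length)
            {
                pieces.Add(new XText(value.Substring(start)));
            }

            // Swap the original text node for the text/element sequence.
            node.ReplaceWith(pieces.ToArray());
        }

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

A phrase that is split across elements is still missed, since each text node is searched on its own.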

MrTelly
A: 

In simple cases, regular expressions will do.

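using System.Text.RegularExpressions;
string input = "ttttttgottttttt";
// $0 is the whole match; wrap the pattern in Regex.Escape if the phrase may contain regex metacharacters
string output = Regex.Replace(input, "go", "<strong>$0</strong>");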

will yield: "tttttt<strong>go</strong>ttttttt"

But if by HTML you mean the final rendered text, things get a bit messy. Say you've got this HTML:

<span class="firstLetter">B</span>ook

To highlight the word 'Book', you would need the help of a proper HTML renderer, because the word is split across a tag boundary. To simplify, one can first remove all tags, leaving only the text content, and then do the usual replace. That throws away the original markup, though, so it doesn't feel right.
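
The strip-then-replace shortcut could be sketched like this. The class name StrippedTextHighlighter is made up for the example, and the tag-removal pattern is a crude assumption that will not cope with comments, scripts, or a '>' inside an attribute value.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class StrippedTextHighlighter
{
    // Removes all tags, then wraps each occurrence of phrase in <strong>.
    public static string Highlight(string html, string phrase)
    {
        if (string.IsNullOrEmpty(phrase))
        {
            return html;
        }

        // Crude tag removal: drops everything from '<' up to the next '>'.
        string textOnly = Regex.Replace(html, "<[^>]+>", string.Empty);

        // Escape the phrase so it is matched literally, not as a pattern.
        return Regex.Replace(textOnly, Regex.Escape(phrase), "<strong>$0</strong>");
    }
}
```

For the span example above this finds 'Book', but the firstLetter span is gone from the output, which is exactly the drawback.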

Gorkem Pacaci
A: 

You could look at using Html DOM, an open source project on SourceForge.net. This way you could programmatically manipulate your text instead of relying on regular expressions.

mdresser
It is in alpha status with the last update in 2005, which suggests it is no longer maintained.
Priyank Bolia
+1  A: 

I like using the Html Agility Pack. It is very easy to use, and although there haven't been many updates lately, it is still usable. For example, grabbing all the links:

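using System;
using HtmlAgilityPack;

HtmlWeb client = new HtmlWeb();
HtmlDocument doc = client.Load("http://yoururl.com");
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");

if (links != null)
{
    foreach (HtmlNode link in links)
    {
        Console.WriteLine(link.Attributes["href"].Value);
    }
}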
Zen