
Hi, I'm using the following method to highlight keywords in a given text:

private string HighlightSearchKeyWords(string searchKeyWord, string text)
{
    Regex keywordExp = new Regex(@" ?, ?");
    var pattern = @"\b(" + keywordExp.Replace(Regex.Escape(searchKeyWord), @"|") + @")\b";
    Regex exp = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    return exp.Replace(text, @"<span class=""search-highlight"">$0</span>");
}

Sample Text: "What is .net Programming? Pl suggest few e-books"

Keyword: ".net"

When I search with the keyword ".net", ".net" is not highlighted in the sample text.

When I search with the keyword "e-books", "e-books" is highlighted in the sample text.

What could be the problem? Can anyone please tell me exactly what I need to modify?

+2  A: 

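There is no word boundary before ".net" because \b only matches at a transition between a word character (\w) and a non-word character (\W), and both . and the preceding space fall into the \W category, so there is no boundary between them. By contrast, "e-books" begins and ends with word characters, so \b matches on both sides of it.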
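A minimal sketch to see this for yourself, assuming a small helper that wraps the escaped keyword in \b on both sides. The class and method names (WordBoundaryDemo, MatchesWithWordBoundary) are hypothetical, for illustration only.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: \b needs a \w character on one side and a \W character
// (or the start/end of the input) on the other.
public static class WordBoundaryDemo
{
    // Returns true if the literal keyword, wrapped in \b ... \b, is found in text.
    public static bool MatchesWithWordBoundary(string keyword, string text)
    {
        string pattern = @"\b" + Regex.Escape(keyword) + @"\b";
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}
```

With the question's sample text, MatchesWithWordBoundary(".net", ...) returns false (space then "." is \W to \W), while MatchesWithWordBoundary("e-books", ...) returns true (space then "e" is \W to \w).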

One option is to simply look for "not a word character": rather than explicitly checking for a boundary, check only for the absence of a word character, using a negative lookbehind:

(?<!\w)

You could also check that the preceding character is not a non-whitespace character, like so:

(?<!\S)

This one is a double negative. It might seem more obvious to use (?<=\s) (or (?<=\W) for the previous example), but a positive lookbehind needs an actual character to look at, so it will never match at the very start of the text.
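A minimal sketch contrasting the two forms at the start of the text. StartOfInputDemo and its methods are hypothetical names; the keyword is escaped with Regex.Escape so that "." is matched literally.

```csharp
using System.Text.RegularExpressions;

public static class StartOfInputDemo
{
    // Positive lookbehind/lookahead: an actual whitespace character must precede
    // the keyword, so this can never succeed at position 0 of the input.
    public static bool PositiveLookbehind(string keyword, string text) =>
        Regex.IsMatch(text, @"(?<=\s)" + Regex.Escape(keyword) + @"(?=\s)", RegexOptions.IgnoreCase);

    // Negative lookbehind/lookahead: only requires that no non-whitespace character
    // is adjacent, which is also true at the very start (or end) of the input.
    public static bool NegativeLookbehind(string keyword, string text) =>
        Regex.IsMatch(text, @"(?<!\S)" + Regex.Escape(keyword) + @"(?!\S)", RegexOptions.IgnoreCase);
}
```

For ".net programming fundamentals", PositiveLookbehind(".net", ...) is false and NegativeLookbehind(".net", ...) is true; in the middle of a sentence both succeed.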

As an example of the difference between the two: the first one would match the .NET in C#.NET, whilst the second one would not.
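A small sketch of that C#.NET case, assuming hypothetical helper names. The only difference between the two methods is which character class the lookarounds exclude.

```csharp
using System.Text.RegularExpressions;

public static class LookbehindComparison
{
    // Excludes only word characters: "#" before ".NET" is \W, so this matches in "C#.NET".
    public static bool NotNextToWordChar(string keyword, string text) =>
        Regex.IsMatch(text, @"(?<!\w)" + Regex.Escape(keyword) + @"(?!\w)", RegexOptions.IgnoreCase);

    // Excludes every non-whitespace character: "#" is \S, so this does not match in "C#.NET".
    public static bool NotNextToNonSpace(string keyword, string text) =>
        Regex.IsMatch(text, @"(?<!\S)" + Regex.Escape(keyword) + @"(?!\S)", RegexOptions.IgnoreCase);
}
```

Which one you want depends on whether ".net" inside "C#.NET" should count as a hit.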

Since you're using .NET regex, you fortunately have a fairly complete set of regex functionality, but it's worth pointing out that some other regex implementations don't support negative lookbehind. For those, you would need to use syntax like this:

(?<=\W|^)
(?<=\s|^)

(In all these cases, you also want the equivalent lookahead at the other end: (?!\w), (?!\S), (?=\W|$) and (?=\s|$) respectively.)
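To see why the lookahead matters, here is a minimal sketch using a hypothetical Matches helper that glues a "before" assertion, the escaped keyword and an "after" assertion together. Without a lookahead, ".net" is also found at the start of ".network".

```csharp
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: builds before + escaped keyword + after and tests it.
    public static bool Matches(string before, string keyword, string after, string text) =>
        Regex.IsMatch(text, before + Regex.Escape(keyword) + after, RegexOptions.IgnoreCase);
}
```

Matches(@"(?<!\w)", ".net", "", "Visit .network today") is true, but adding (?!\w) (or using the (?<=\W|^) ... (?=\W|$) pair) makes it false. The alternation form still matches at the start of the text, because ^ covers the case where there is no preceding character.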

So, here's how those four variants would look in your pattern:

var pattern = @"(?<!\w)(" + keywordExp.Replace(Regex.Escape(searchKeyWord), @"|") + @")(?!\w)";
var pattern = @"(?<!\S)(" + keywordExp.Replace(Regex.Escape(searchKeyWord), @"|") + @")(?!\S)";
var pattern = @"(?<=\s|^)(" + keywordExp.Replace(Regex.Escape(searchKeyWord), @"|") + @")(?=\s|$)";
var pattern = @"(?<=\W|^)(" + keywordExp.Replace(Regex.Escape(searchKeyWord), @"|") + @")(?=\W|$)";
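For completeness, a sketch of the whole method using the first variant, (?<!\w) ... (?!\w), written as a static method so it can be called directly. One deliberate change from the question's code, which is an assumption on my part about the intended input format: the comma-separated list is split before escaping, because Regex.Escape turns a space into "\ ", so a list like ".net, e-books" escaped first and then split on " ?, ?" would leave a stray "\ " in front of "e-books".

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SearchHighlighter
{
    public static string HighlightSearchKeyWords(string searchKeyWord, string text)
    {
        // Split the comma-separated list first, trim each entry, drop empties,
        // then escape each keyword so "." and similar characters are literal.
        string[] keywords = searchKeyWord
            .Split(',')
            .Select(k => k.Trim())
            .Where(k => k.Length > 0)
            .Select(k => Regex.Escape(k))
            .ToArray();
        if (keywords.Length == 0)
            return text; // nothing to highlight

        // No word character immediately before or after the keyword. Unlike \b,
        // this works when the keyword starts or ends with a non-word character
        // (the "." in ".net") and at the very start or end of the text.
        string pattern = @"(?<!\w)(" + string.Join("|", keywords) + @")(?!\w)";
        var exp = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return exp.Replace(text, @"<span class=""search-highlight"">$0</span>");
    }
}
```

For example, HighlightSearchKeyWords(".net", ".net programming fundamentals") wraps the leading ".net" in the span, which covers the start-of-text case raised in the comments.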
Peter Boughton
Thanks for your answer. Please suggest what I need to do in this scenario. Also, I need to match the exact word.
Chakri
Hi Chakri, I was just editing to add an actual example - that's done now. Let me know if any more details would be helpful.
Peter Boughton
Hi Peter, I tried the code you suggested. If the keyword .net is at the start of the text, it is not highlighted. Sample text: ".net programming fundamentals". If .net is not at the start, it is highlighted. Please help me.
Chakri
Ah, sorry, I should have considered that! The simple solution is to add `|^`, but I'll update the answer with more details.
Peter Boughton
Ok, hopefully this time the answer is clearer and works in all situations, but let me know if there's anything else I've missed.
Peter Boughton
I'm very thankful to you, Peter. I'm very new to regular expressions, and this helped me a lot. Thanks for your support.
Chakri