I have the text of a PDF document available, and I want to display a snippet of that text for every place the user-entered search phrase appears. Say the search term is "iphone": I want to display about 200 characters around each occurrence of "iphone" in the document. Ideally, each snippet should also start at the beginning of a sentence.

What's the most efficient way to do this using .NET/C#?

+1  A: 

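A simple IndexOf loop (assumes search is non-empty):

// Needs: using System;
int index = str.IndexOf(search);
while (index != -1)
{
    // Substring takes (start, length), so clamp the window to the string bounds.
    int start = Math.Max(0, index - 100);
    int end = Math.Min(str.Length, index + search.Length + 100);
    Console.WriteLine(str.Substring(start, end - start));
    // Resume after this hit; searching from index again would loop forever.
    index = str.IndexOf(search, index + search.Length);
}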
tster
+1  A: 

You could do this with a regular expression:

\s.{0,100}iphone.{0,100}\s

This says "match a whitespace character, up to 100 characters, the word 'iphone', and up to another 100 characters and finally a whitespace character." Looking for whitespace at either end makes sure you get whole words.

You would use it like this (note that you can Trim() the whitespace from the matches):

var regex = new Regex(@"\s.{0,100}iphone.{0,100}\s",
                      RegexOptions.IgnoreCase | RegexOptions.Compiled);

for (Match m = regex.Match(inputString); m.Success; m = m.NextMatch())
    Console.WriteLine(m.Value.Trim());

Instead of matching the whitespace and then trimming it, you could put the snippet in a capturing group and read only that group's value.
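A minimal sketch of the group-based variant, in case it helps. The SnippetFinder class, the FindSnippets method, the radius parameter and the "snippet" group name are all made up for illustration. It also goes beyond the pattern above in three ways, which are assumptions on my part: the search term is escaped, the start and end of the text are accepted in place of whitespace, and RegexOptions.Singleline lets the context run across line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SnippetFinder
{
    // Hypothetical helper: returns up to `radius` characters of context
    // on each side of every match of `term`, without surrounding whitespace.
    public static List<string> FindSnippets(string text, string term, int radius = 100)
    {
        // Regex.Escape keeps characters such as '.' or '+' in the term literal.
        // (?:^|\s) and (?:\s|$) accept the start/end of the text as well as
        // whitespace, so hits near either edge are not lost.
        string pattern = @"(?:^|\s)(?<snippet>.{0," + radius + "}"
                       + Regex.Escape(term)
                       + ".{0," + radius + @"})(?:\s|$)";

        // Singleline makes '.' match newlines too (an assumption: extracted
        // PDF text usually contains line breaks inside sentences).
        var regex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);

        var snippets = new List<string>();
        foreach (Match m in regex.Matches(text))
        {
            // Only the named group is read, so no Trim() is needed.
            snippets.Add(m.Groups["snippet"].Value);
        }
        return snippets;
    }
}
```

Call it as SnippetFinder.FindSnippets(inputString, "iphone"). Note that regex matches never overlap, so two occurrences that sit within the same 100-character window come back as one snippet rather than two.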

GraemeF
I actually prefer this answer to my own
tster