When you search for something on Stack Overflow, it shows the portion of the question description that best matches your search terms and then highlights the matching words.
I wonder what the best way is to do this manually in C#, that is, without the help of a full-text search engine.
The main problem: how do I select the best portion of the text quickly?
What I have done so far:
- I obtain the indexes of the spaces in the text. This tells me where the words begin, so that I can start my substring tests from them.
- From each of those indexes, I take the next 300 characters and count how many occurrences of the keywords they contain.
- I assume that the 300-character portion with the most occurrences is the best one, so I cut it from the original text.
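
For reference, the steps above could be sketched roughly like this. It is a minimal illustration, not my exact code: the names `SnippetFinder`, `BestSnippet`, `CountOccurrences` and `windowLength` are made up for this example, and the case-insensitive, non-overlapping matching is an assumption on my part.

```csharp
using System;
using System.Collections.Generic;

public static class SnippetFinder
{
    // Counts non-overlapping, case-insensitive occurrences of keyword
    // inside text[start, start + length).
    private static int CountOccurrences(string text, int start, int length, string keyword)
    {
        if (string.IsNullOrEmpty(keyword)) return 0;

        int count = 0;
        int end = start + length;
        int pos = start;
        while (pos <= end - keyword.Length)
        {
            int hit = text.IndexOf(keyword, pos, end - pos, StringComparison.OrdinalIgnoreCase);
            if (hit < 0) break;
            count++;
            pos = hit + keyword.Length;
        }
        return count;
    }

    // Returns the window of up to windowLength characters, starting at a word
    // boundary, that contains the most keyword occurrences (first one wins on ties).
    public static string BestSnippet(string text, IReadOnlyList<string> keywords, int windowLength = 300)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;

        // Candidate starts: the beginning of the text and the character after each space.
        var starts = new List<int> { 0 };
        for (int i = 0; i < text.Length - 1; i++)
        {
            if (text[i] == ' ') starts.Add(i + 1);
        }

        int bestStart = 0;
        int bestScore = -1;
        foreach (int start in starts)
        {
            int length = Math.Min(windowLength, text.Length - start);
            int score = 0;
            foreach (string keyword in keywords)
            {
                score += CountOccurrences(text, start, length, keyword);
            }
            if (score > bestScore)
            {
                bestScore = score;
                bestStart = start;
            }
        }

        return text.Substring(bestStart, Math.Min(windowLength, text.Length - bestStart));
    }
}
```

It would be called as, for example, `SnippetFinder.BestSnippet(description, new[] { "search", "engine" })`, where `description` is a hypothetical variable holding the question text. Every candidate window is rescanned from scratch for every keyword, which is the part I suspect is slow.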
Is this a good approach? Is there a faster way? Is counting the number of occurrences the best way to find the most relevant portion?