Search:

Scripting+Language Web+Pages Applications

Results:

...scripting language originally...producing dynamic web pages. It has...graphical applications....purpose scripting language that is...d creating web pages as output...

Suppose I want one value that sets the number of characters to allow as padding on either side of each matched term, and another value that sets how many matches are shown in the result (i.e., I want to see only the first 5 matches, nothing more).

How exactly would you go about doing this?

This is pretty language-agnostic, but I will be implementing the solution in a PHP environment, so please restrict answers to options that do not require a specific language or framework.

Here's my thought process: create an array from the search words. Determine which search word occurs earliest in the article body. Copy that portion of the body into another variable, then remove that section from the article body. Repeat from the top until no search word is found. You might even add a counter to each word, skipping it when the counter reaches 3 or so.
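The loop described above could be sketched roughly as follows in Python (chosen only for illustration). The function name `ordered_matches`, the `per_term_limit` parameter, and the case-insensitive comparison are assumptions, not part of the question. Instead of cutting matched sections out of the body, this sketch moves a cursor forward, which has the same effect without copying the text.

```python
def ordered_matches(body, terms, per_term_limit=3):
    """Return (position, term) pairs in the order they occur in body."""
    # Assumption: matching is case-insensitive, and lowercasing does not
    # change the string length (true for ASCII text).
    lowered = body.lower()
    counts = {term: 0 for term in terms}
    found = []
    cursor = 0
    while True:
        # Find the nearest occurrence of any term that is still under its cap.
        best = None
        for term in terms:
            if not term or counts[term] >= per_term_limit:
                continue
            pos = lowered.find(term.lower(), cursor)
            if pos != -1 and (best is None or pos < best[0]):
                best = (pos, term)
        if best is None:
            break
        found.append(best)
        counts[best[1]] += 1
        # Continue searching after this match rather than removing it.
        cursor = best[0] + len(best[1])
    return found
```

Each returned position can then be used to slice the surrounding context out of the body.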

Important:

The solution must match all search terms regardless of the order in which they were entered. That is, matches should appear in the order they occur in the text: term 1 should be found after term 2 if it occurs after term 2, and likewise after term 3. Term 3 should be found before terms 1 and 2 if it happens to occur before them.

The solution should allow me to declare "Only allow up to three matches for each term, then terminate the summary."

Extra Credit:

Get the padding-variable to optionally pad words, rather than chars.
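A minimal sketch of how a single padding value could be read as either characters or words, written in Python for illustration. The helper name `pad_context` and the `by_words` flag are hypothetical. In word mode the sketch splits on whitespace, so runs of spaces and line breaks in the context are collapsed to single spaces.

```python
def pad_context(text, start, end, padding, by_words=False):
    """Return (before, after): the context on each side of text[start:end]."""
    if padding <= 0:
        return "", ""
    if not by_words:
        # Character mode: take up to `padding` characters on each side.
        return text[max(0, start - padding):start], text[end:end + padding]
    # Word mode: take up to `padding` whitespace-separated words on each side.
    before_words = text[:start].split()
    after_words = text[end:].split()
    return " ".join(before_words[-padding:]), " ".join(after_words[:padding])
```

For example, with the match "scripting" in "a general purpose scripting language that is widely used" and a padding of 2, word mode yields "general purpose" and "language that".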

A: 

Personally I would convert the search terms into Regular Expressions and then use a Regex Find-Replace to wrap the matches in strong tags for the formatting.

Most likely the RegEx route would be your best bet. So in your example, you would end up getting three separate RegEx values.

Since you want a non-language dependent solution I will not put the actual expressions here as the exact syntax varies by language.
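Purely as an illustration of the idea, here is how the find-and-replace step might look with Python's `re` module; other languages use different syntax. The function name `highlight` is hypothetical, and this sketch joins the terms into one alternation rather than running three separate expressions, so the text is scanned once. It does not escape HTML in the input.

```python
import re

def highlight(text, terms):
    """Wrap every case-insensitive occurrence of any term in <strong> tags."""
    terms = [t for t in terms if t]
    if not terms:
        return text
    # Try longer terms first so "web pages" wins over "web" at the same spot.
    ordered = sorted(terms, key=len, reverse=True)
    # re.escape makes each term match literally, even if it contains
    # characters that are special in a regular expression.
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: "<strong>" + m.group(0) + "</strong>", text)
```

Using a replacement function keeps the original capitalization of each match instead of substituting the search term as typed.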

Mitchel Sellers
A: 

My thought process:

  1. Create a results array that can hold non-unique name/value pairs (PHP array keys are unique, so in PHP store each pair as its own element, for example an array of [term, position] pairs)
  2. Loop through each search term and find the character starting position of each of its occurrences in the search text
  3. For every position found, add an item to the results array that stores the character position together with the search term it belongs to
  4. When you've found all the search terms, sort the array ascending by value (the character position of the search term)
  5. Now, the search results will be in the order in which they were found in the search text
  6. Loop through the results array and use the specified word padding to get words on each side of the search term, while also keeping track of the match count for each term in a separate name/value pair

Pseudocode, or my best attempt at it:

function string GetSearchExcerpt(searchText, searchTerms, wordPadding = 0, searchLimit = 3)
{
  results = new array() // holds (searchTerm, charIndex) pairs; the same term may appear more than once
  foreach (searchTerm in searchTerms) 
  {
    startIndex = 0 // every term is searched from the beginning of the text
    charIndex = searchText.FindByIndex(searchTerm, startIndex) // finds 1st position of searchTerm at or after startIndex, or -1 if there is none
    while (charIndex != -1)
    {
      results.Add(searchTerm, charIndex)
      startIndex = charIndex + 1
      charIndex = searchText.FindByIndex(searchTerm, startIndex)
    }
  }
  results = results.SortByValue() // order the matches by their position in the text
  searchTermCount = new array() // matches output so far per term; missing entries count as 0
  outputText = ""
  foreach (searchTerm => charIndex in results)
  {
    searchTermCount[searchTerm]++
    if (searchTermCount[searchTerm] <= searchLimit)
    {
      // WordPadding is a simple function that moves left or right a given number of words in searchText starting at a specified character index and returns those words
      outputText += "..." + WordPadding(-wordPadding, charIndex) + "<strong>" + searchTerm + "</strong>" + WordPadding(wordPadding, charIndex + searchTerm.Length)
    }
  }

  return outputText
}
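
As a runnable illustration of the same collect, sort, and pad idea, here is one possible Python translation. It is a sketch, not the answer's own code: the name `search_excerpt`, the default padding of 2 words, case-insensitive matching, and the placement of the "..." separators are all assumptions.

```python
def search_excerpt(text, terms, word_padding=2, per_term_limit=3):
    """Build an excerpt with each match in <strong> tags, in text order."""
    # Assumption: lowercasing does not change string length (true for ASCII).
    lowered = text.lower()
    hits = []
    for term in terms:
        if not term:
            continue
        start = 0
        # The first per_term_limit occurrences of a term are also its
        # earliest ones, so the cap can be applied while collecting.
        for _ in range(per_term_limit):
            pos = lowered.find(term.lower(), start)
            if pos == -1:
                break
            hits.append((pos, len(term)))
            start = pos + len(term)
    hits.sort()  # order by position in the text, not by term order
    pieces = []
    for pos, length in hits:
        before, after = [], []
        if word_padding > 0:
            before = text[:pos].split()[-word_padding:]
            after = text[pos + length:].split()[:word_padding]
        match = "<strong>" + text[pos:pos + length] + "</strong>"
        pieces.append(" ".join(before + [match] + after))
    if not pieces:
        return ""
    return "..." + "...".join(pieces) + "..."
```

Neighbouring matches can repeat each other's context words; merging overlapping windows is left out to keep the sketch short.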
John Rasch