tags:

views:

262

answers:

5
        query = Files
            .Where(file => file.Fileinfo.Name.ToUpper().Contains(textBox1.Text.ToUpper()))
            .Take(7).ToList();

I hate asking this question, but I am simply not having any progress! It should seem like a trivial task, but I am not having any luck.

The above query needs to do a search through a list of filenames. It will return the search results in a list with the top 7 most related results. The event happens at "KeyPress".

Although, it is extremely unprecise and there is also happening some quite odd results.

Forexample: one if the items in my list has the name: "ZeroWidthSplit" If my textbox contains "Z", it works. If it contains ZE it works. If it contains "ZER" it still shows. If I type ZERO it disappears from the search results!

So I guess my question is: How do you search through a list of files, and make it return the top 7 most related results.

Oh and if there is only 4 related results, that is fine too. The limit is just 7.

Another example:

F.x if I write "sum of" as search query. It returns:

  1. 77 - sum of primes five thousand different ways
  2. 52 sum of consecutive primes below 1 million

If I write "sum of p" it returns:

  1. 77 - sum of primes five thousand different ways

If I write "sum of c" it returns nothing...

I can give you a lot more weird examples.

+1  A: 

I don't know it this is the source of the problem, but I would start by refactoring the code so that textBox1.Text is read only once:

string theText = textBox1.Text.ToUpper();

query = Files
            .Where(file => file.Fileinfo.Name.ToUpper().Contains(theText))
            .Take(7).ToList();
Konamiman
+3  A: 

The error does sound strange, but as Konamiman pointed out, a little refactoring won't hurt, although I don't like to use ToUpper or ToLower. I will do it the following way:

string theText = textBox1.Text;

query = Files
      .Where(file => file.Fileinfo.Name.IndexOf(theText, 
       StringComparison.OrdinalIgnoreCase) != -1)
      .Take(7).ToList();
Yogesh
+1 for the StringComparison enum. I always forget that it exists.
Konamiman
That is clever. unfortunally, the search results are the same
CasperT
A: 

Maybe O in Zero is actualy a number 0 and not a letter O ?

MichaelT
Sorry, no :) It is just one example out of many. Let me edit my OP and show you another example
CasperT
It would be really a big D'OH! moment if this were the problem. :-)
Konamiman
Added another example. I have a ton
CasperT
+1  A: 

How's "most related" defined in your scenario? If you just use Contains, the outcome would miss a lot of "related" results that don't match in the form of exact substring.

What kind of input are you expecting? If the input is just a single word, then this algorithm might work for you:

Fitness should be calculated as the length of the longest common substring of the target string found in the input string, divided by the length of the target string. For example, HPPLE has a score of .8 compared to APPLE, since the longest substring of APPLE found in HPPLE is 4 letters long, which is .8 of APPLE's length. Additionally, penalize the score by 0.1 for each extraneous letter the input string has over the length of the target string. For example. HAPPLE has a score of .9 compared to APPLE, because it is 1 letter longer, and otherwise has the complete substring APPLE. Note that this makes it possible for a candidate word to have a negative score.

Of course there are a lot of other, better algorithms to calculate distance. If input may contain multiple words, then you're probably better off with some other algorithm to calculate the "distance" between strings.

And umm...your original query didn't sort the result, so you're not taking the top 7 most related ones, you're just taking the first 7 results in a sequence.

To wrap things up, this might work for you:

using System;
using System.Collections.Generic;
using System.Linq;

public static class StringDistanceUtil {

    /// <summary>
    ///   Returns the longest common substring of the given two arguments.
    /// </summary>
    /// <param name="first">
    ///   the first string
    /// </param>
    /// <param name="second">
    ///   the second string
    /// </param>
    /// <returns>
    ///   the longest common substring of the given two arguments
    /// </returns>
    public static string LongestCommonSubstringWith(this string first, string second) {
        // could have used dynamic programming, or generalized suffix tree
        // to solve the LCS problem, but here we'll just stick to simplicity

        var start = 0; // The start in a of the longest found so far
        var len = 0;   // The length of the longest found so far
        for (var i = 0; i < first.Length - len; ++i) {
            for (var j = first.Length - i; j > len; --j) {
                if (second.Contains(first.Substring(i, j))) {
                    start = i;
                    len = j;
                    break; // Exit the inner loop
                }
            }
        }

        return first.Substring(start, len);
    }

    /// <summary>
    ///   Returns the distance of two strings.
    /// </summary>
    /// <param name="str">
    ///   a string
    /// </param>
    /// <param name="target">
    ///   the target string
    /// </param>
    /// <returns>
    ///   the distance from a string to the target string
    /// </returns>
    public static double DistanceFrom(this string str, string target) {
        var strLen = str.Length;
        var targetLen = target.Length;
        var ratio = str.LongestCommonSubstringWith(target).Length
                / (double) targetLen;
        var penalty =
            (strLen > targetLen) ?
                (0.1 * (strLen - targetLen))
                : 0;
        return ratio - penalty;
    }

    static void Main(string[] args) {
        var list = new List<string> {
            "zero",
            "range",
            "shot",
            "shoot",
            "hop",
            "rage",
            "fang",
            "age"
        };
        var target = "zero_range_shot";
        var top5mostRelated = list
            .OrderByDescending(str => str.ToUpper().DistanceFrom(target.ToUpper()))
            .Take(5).ToList();
        foreach (var str in top5mostRelated) Console.WriteLine(str);
    }
}

and the output is: range zero shot shoot fang

RednaxelaFX
A: 

Are you confirming the textbox1.Text property in the debugger?

Maybe TextChanged would be a better event for this than KeyPress?

David B