I have a basic data table. It can be completely generic for this example, except that it contains a Username column.

I'd like to have a simple textbox and a button that perform a similarity search on the Username field. I know I can use the .Contains() method and it will be translated to LIKE in SQL, but is this the correct way to do it?
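As a rough illustration of the kind of SQL a `.Contains()` call becomes (a `LIKE '%term%'` substring match), here is a minimal sketch using Python's built-in `sqlite3` module rather than the original .NET stack. The table name `Users`, the helpers `escape_like` and `search_usernames`, and the sample rows are hypothetical. The sketch escapes `%` and `_` so that wildcard characters typed into the textbox are matched literally.

```python
import sqlite3


def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards (and the escape char itself) so input is literal."""
    return (
        term.replace(escape, escape * 2)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def search_usernames(conn: sqlite3.Connection, term: str) -> list:
    """Return usernames containing `term` as a substring ('%term%')."""
    pattern = "%" + escape_like(term) + "%"
    rows = conn.execute(
        "SELECT Username FROM Users WHERE Username LIKE ? ESCAPE '\\' "
        "ORDER BY Username",
        (pattern,),
    )
    return [row[0] for row in rows]


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (ID INTEGER PRIMARY KEY, Username TEXT)")
conn.executemany(
    "INSERT INTO Users (Username) VALUES (?)",
    [("jsmith",), ("smithers",), ("j_doe",), ("jdoe",)],
)
print(search_usernames(conn, "smith"))  # ['jsmith', 'smithers']
print(search_usernames(conn, "j_"))     # ['j_doe']
```

Without the escaping, the search for `j_` would also match `jsmith` and `jdoe`, because `_` is LIKE's single-character wildcard.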

Second, suppose I have another entity in a many-to-many relationship that I also want to search, in this case Label.

Data
{
   ID,
   Name,
   ...
}

Many
{
   DataID,
   OtherID
}

Other
{
   ID,
   Label
}

Ultimately, I'd like to find all of the Data items that have a Label similar to some search term. Do I again just use .Contains?
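Finding Data items through the many-to-many table amounts to a join followed by the same kind of substring filter, this time on Label. A minimal sketch in SQL, run through Python's `sqlite3` with made-up sample rows; the table and column names follow the schema above, while `data_with_label_like` is a hypothetical helper. `DISTINCT` keeps a Data item from appearing once per matching label.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Data  (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Other (ID INTEGER PRIMARY KEY, Label TEXT);
CREATE TABLE Many  (DataID  INTEGER REFERENCES Data(ID),
                    OtherID INTEGER REFERENCES Other(ID));
"""


def data_with_label_like(conn, term):
    """Return names of Data rows linked to at least one Label containing term."""
    # For brevity this sketch does not escape % and _ inside `term`.
    sql = """
        SELECT DISTINCT d.Name
        FROM Data AS d
        JOIN Many AS m ON m.DataID = d.ID
        JOIN Other AS o ON o.ID = m.OtherID
        WHERE o.Label LIKE ?
        ORDER BY d.Name
    """
    return [row[0] for row in conn.execute(sql, ("%" + term + "%",))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Data VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])
conn.executemany("INSERT INTO Other VALUES (?, ?)",
                 [(10, "database"), (11, "databinding"), (12, "network")])
conn.executemany("INSERT INTO Many VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 12), (3, 11)])

print(data_with_label_like(conn, "data"))  # ['alpha', 'gamma']
```

`alpha` is linked to two matching labels but is returned only once.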

I'd then like to sort the results so that the best matches on both Username and Label come first, all in the same query. How can I sort by the combined similarity of Username and Label?
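LIKE can only filter, so one way to rank by the combined likeness of Username and Label is to score the candidate rows in application code. A minimal sketch, assuming Python's `difflib.SequenceMatcher.ratio()` (a 0-to-1 similarity) as the measure; `similarity`, `combined_score`, `rank`, and the equal weighting of the two fields are illustrative assumptions, not something the question prescribes. Each item is scored on its username plus its best-matching label.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(username, labels, term, label_weight=0.5):
    """Weighted mix of username similarity and the best label similarity."""
    best_label = max((similarity(label, term) for label in labels), default=0.0)
    return (1 - label_weight) * similarity(username, term) + label_weight * best_label


def rank(items, term):
    """Sort (username, labels) pairs from best to worst combined match."""
    return sorted(items,
                  key=lambda item: combined_score(item[0], item[1], term),
                  reverse=True)


# Hypothetical candidates, e.g. rows already narrowed down by a LIKE query.
items = [
    ("smith", ["smithing", "metal"]),
    ("jones", ["smith"]),
    ("brown", ["paint"]),
]
for username, labels in rank(items, "smith"):
    print(username, labels)
```

Raising `label_weight` makes label matches count more than username matches; an item with no labels simply gets 0 for the label part.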

Edit: How are a LIKE query's results sorted? Is it simply based on the index, with a binary result of matches vs. does not match? I'm not that concerned with a similarity score per se; I was mostly wondering about the mechanism. It seems pretty easy to run LIKE queries, but I've always thought LIKE was a poor choice because it doesn't use indexes in the database. Is this true, and if so, does it matter?
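On the mechanism: a LIKE predicate evaluates to true or false for each row and produces no score, so the matching rows come back in whatever order the database chooses unless you add an ORDER BY. A minimal sketch with Python's `sqlite3` (sample data made up) that selects the predicate itself as a column to make this visible; SQLite reports 1 for a match and 0 for a non-match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Username TEXT)")
conn.executemany("INSERT INTO Users VALUES (?)",
                 [("smith",), ("smithers",), ("jones",)])

# Select the LIKE predicate as a column: it is 1 (match) or 0 (no match),
# never a degree of similarity. ORDER BY is explicit; LIKE imposes no order.
rows = conn.execute(
    "SELECT Username, Username LIKE ? AS matches FROM Users ORDER BY Username",
    ("%smith%",),
).fetchall()
for username, matches in rows:
    print(username, matches)
```

`smith` and `smithers` both get 1, so an exact match is not ranked above a partial one. On the index question, in general a pattern with a leading wildcard such as `'%smith%'` cannot use an ordinary B-tree index to seek, while a prefix pattern such as `'smi%'` often can, depending on the database and collation.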

A:

String similarity isn't something SQL does well. Your best bet may be to find all rows whose value starts with the same first two (or, if necessary, three) characters as the search term and then, assuming that is a manageable number, compute a similarity score client-side using the Levenshtein distance or a similar measure (see http://en.wikipedia.org/wiki/Levenshtein_distance).
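The two-step approach above (narrow the candidates by a shared prefix, then score them client-side) might look like this minimal Python sketch. It uses the classic dynamic-programming Levenshtein distance; the helper names `levenshtein` and `best_matches`, the two-character prefix default, and the sample names are assumptions for illustration. In a real application the prefix filter would run in the database as a `LIKE 'ab%'` query, with only the scoring done client-side.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def best_matches(names, term, prefix_len=2):
    """Keep names sharing term's first `prefix_len` chars, closest first."""
    prefix = term[:prefix_len].lower()
    candidates = [n for n in names if n.lower().startswith(prefix)]
    return sorted(candidates, key=lambda n: levenshtein(n.lower(), term.lower()))


names = ["Smith", "Smyth", "Smithers", "Jones", "Schmidt"]
print(best_matches(names, "smith"))  # ['Smith', 'Smyth', 'Smithers']
```

Note that the prefix filter trades recall for speed: `Schmidt` is never scored because it starts with `sc`, even though it is fairly close to `smith`.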

Or, if you're feeling brave, you could try something like this, which implements string-similarity algorithms directly in SQL: http://anastasiosyal.com/archive/2009/01/11/18.aspx

Hightechrider
That second link is intense. It's super cool to know that things like Smith-Waterman can be done in SQL, though. I've more or less realized that the best I'm going to get is a binary LIKE / not-LIKE result, and for the most part that's fine. I was mostly asking how difficult a similarity search would be, and in my situation it doesn't appear to be worth the trouble.
Shawn
Client-side, you can use the C# implementation from this project: http://code.google.com/p/google-diff-match-patch/
Hightechrider
+1 v. useful links. Thanks.
Noel Abrahams