If I have a text field that contains, say, a title, and I have a list of keywords, how can I search the title to check how many (n) of the keywords it contains?

So if my title is "Baking a chicken, bacon and leek pie" and the user searches for "chicken bacon turnip", I'd like to return the above recipe.

Essentially, I'd like to say that if the title contains 2 or more of the search terms then it's deemed valid and should be returned, but if it contains only 1 then it should be disregarded.

Ideally I'd like the results weighted so that the more terms are present, the higher the title appears in the list, but that may be version 2. :)

edit

I should mention at this point that I'd like this to be native .NET and C#.

+4  A: 

Okay, I know you said 'do it in Linq'. Assuming you're talking about taking a native .NET string and doing it using LINQ to Objects, I guess the most obvious solution is to break up the text with a regex working on word boundaries, and then to iterate through the results, matching each against the input phrases.
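
A minimal sketch of that LINQ to Objects idea, assuming the titles are already in memory. The class and method names (TitleSearch, CountMatches, Search) are made up for illustration, and "word" here simply means a run of \w characters:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TitleSearch
{
    // Breaks text into a case-insensitive set of distinct words using the regex \w+.
    private static HashSet<string> Words(string text)
    {
        return new HashSet<string>(
            Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value),
            StringComparer.OrdinalIgnoreCase);
    }

    // Number of distinct search words that appear as whole words in the title.
    public static int CountMatches(string title, string search)
    {
        var titleWords = Words(title);
        return Words(search).Count(w => titleWords.Contains(w));
    }

    // Titles with at least minMatches hits, best matches first.
    public static IEnumerable<string> Search(
        IEnumerable<string> titles, string search, int minMatches)
    {
        return titles
            .Select(t => new { Title = t, Score = CountMatches(t, search) })
            .Where(x => x.Score >= minMatches)
            .OrderByDescending(x => x.Score)
            .Select(x => x.Title);
    }
}
```

Calling TitleSearch.Search(titles, "chicken bacon turnip", 2) would keep "Baking a chicken, bacon and leek pie" because two of the three terms match. This only runs against in-memory collections; it will not translate to SQL.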

However...

Judging by your idea for the 'v2' I think you should probably be looking at something more powerful and geared around text searching - so how about using a Lucene.Net text index?

It offers very powerful and very fast full-text search, and it can process boolean rules, aliases, stemming, all that kind of stuff.

It really does rock.

UPDATE - Since you mentioned Linq to Sql in your comments

You can also use SQL Full-Text indexes on your table; however, there is one catch: there is no native LINQ to SQL translation to CONTAINSTABLE and the related clauses.

So instead you can generate the query dynamically as a string, and then feed that into the DataContext.ExecuteQuery<TResult> method. If the select returns the columns required to construct the entity type you want, it'll work like a charm.

Or, of course, you can just wrap a stored procedure that does it instead ;)
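
As a rough sketch of the dynamic query approach: the helper below builds a full-text search condition from the user's input and pairs it with a CONTAINSTABLE query. The table and column names (Recipes, ID, Title) and the FullTextQuery class are assumptions for illustration, and a full-text index on Title is assumed to exist. Note that this returns any row matching at least one term, ordered by the full-text rank; it does not by itself enforce the "2 or more terms" rule.

```csharp
using System;
using System.Linq;

public static class FullTextQuery
{
    // Turns: chicken bacon turnip
    // into:  "chicken" OR "bacon" OR "turnip"
    // Double quotes inside a term are stripped so they cannot break the condition.
    public static string BuildSearchCondition(string search)
    {
        var terms = search
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => "\"" + t.Replace("\"", "") + "\"");
        return string.Join(" OR ", terms.ToArray());
    }

    // Recipes, ID and Title are hypothetical names.
    // {0} is the placeholder that ExecuteQuery turns into a SQL parameter.
    public const string Sql =
        "SELECT r.* FROM Recipes AS r " +
        "INNER JOIN CONTAINSTABLE(Recipes, Title, {0}) AS k ON r.ID = k.[KEY] " +
        "ORDER BY k.[RANK] DESC";
}

// Usage, assuming a LINQ to SQL DataContext named db and a mapped Recipe entity:
// var results = db.ExecuteQuery<Recipe>(
//     FullTextQuery.Sql,
//     FullTextQuery.BuildSearchCondition("chicken bacon turnip"));
```

Passing the condition as a parameter rather than concatenating it into the SQL keeps the user's input out of the query text.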

Andras Zoltan
I was kinda hoping to have a native .NET solution and avoid Java. I know I didn't stipulate that in the question. Apologies for that.
griegs
@griegs: Lucene.Net is a complete .Net native port of Lucene. No java required. Have also updated my answer in light of your mentioning Linq to Sql
Andras Zoltan
Ah yes, I see that now. I think I saw the feather and the word Java and my brain ignored the rest of the text.
griegs
@griegs - yeah I get that mental block too ;)
Andras Zoltan
+1 Wow, this actually looks cool. I could potentially store, say, the top 1000 hit records in the engine for faster searching of them as well.
griegs
@griegs - for sure. It's incredible how many records you can actually fit in the index (for not very much size) and also how easy it is to keep the index up to date via incremental indexing. I've used it for some stuff searching tens of thousands of objects with a lot of textual content. Result speed is sub-second if multithreaded correctly.
Andras Zoltan
I don't suppose you could post a snippet of code, could you? I'm running into trouble with the command Query query = QueryParser.parse("text", "fieldname", analyzer);. It's referenced here: file://Incubating-Apache-Lucene.Net-2.0-004-11Mar07.bin/src/Lucene.Net/Overview.html and I can't find parse in the object.
griegs
Wait, got it thanks. Going to hammer it now and see what it can do.
griegs
Awesome, thanks @Andras Zoltan
griegs
@Griegs - apologies for deserting you - it was midnight UK time here and a school night; had to sleep!
Andras Zoltan
Fully understand. Thanks for your help. I'm really on board with Lucene and am implementing it now.
griegs
A: 

If it were me, I would simply do something like this...

Create a helper class that does two things: it splits the title and returns a score based on the keyword matches.

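using System;
using System.Linq;

public static class Helper
{
  // Stand-in for your own routine that calculates a score from the matches
  // against the title. This simple version counts the keywords that occur
  // anywhere in the title, ignoring case.
  public static int GetScore(string title, params string[] keywords)
  {
    return keywords.Count(k => title.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0);
  }
}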

Then you can use a LINQ statement like...

var matches = from t in GetYourTitles()   // any in-memory sequence of title strings
              let score = Helper.GetScore(t, keywordlist)
              where score >= 2
              orderby score descending     // most matches first
              select t;
Tim Jarvis
Since he's using LINQ to SQL, this wouldn't work.
StriplingWarrior
A: 

Doing a text index like Andras suggests is probably your best bet. But to answer the question: you can write a method to custom-build an expression tree to represent a selector that adds 1 to a property for each search keyword that matches. See below:

var entries = new[] { new Entry{ ID = 1,  Title = "Baking a chicken, bacon and leek pie"} }.AsQueryable();
var search = "chicken bacon turnip";
var q = entries.Select(GetSelector(search));
var matches = q.Where(e => e.MatchCount > 1);

public Expression<Func<Entry, EntryMatchCount>> GetSelector(string search)
{
    var searchWords = search.Split(new[] {' '});
    // Rather than creating the selector explicitly as below, you'll want to
    // write code to generate this expression tree.
    Expression<Func<Entry, EntryMatchCount>> selector = e =>
            new EntryMatchCount
            {
                ID = e.ID,
                MatchCount = (e.Title.Contains(searchWords[0]) ? 1 : 0) +
                            (e.Title.Contains(searchWords[1]) ? 1 : 0) +
                            (e.Title.Contains(searchWords[2]) ? 1 : 0)
            };
    return selector;
}
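
A possible way to generate that expression tree for any number of search words is sketched below. It assumes Entry and EntryMatchCount are plain classes with the properties used above; SelectorBuilder is a made-up name. It is written against LINQ to Objects, so how a particular LINQ provider translates the resulting tree to SQL is not verified here.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public class Entry
{
    public int ID { get; set; }
    public string Title { get; set; }
}

public class EntryMatchCount
{
    public int ID { get; set; }
    public int MatchCount { get; set; }
}

public static class SelectorBuilder
{
    public static Expression<Func<Entry, EntryMatchCount>> GetSelector(string search)
    {
        var words = search.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        var e = Expression.Parameter(typeof(Entry), "e");
        var title = Expression.Property(e, "Title");
        var contains = typeof(string).GetMethod("Contains", new[] { typeof(string) });

        // Builds: 0 + (e.Title.Contains(w1) ? 1 : 0) + (e.Title.Contains(w2) ? 1 : 0) + ...
        Expression sum = Expression.Constant(0);
        foreach (var word in words)
        {
            var hit = Expression.Condition(
                Expression.Call(title, contains, Expression.Constant(word)),
                Expression.Constant(1),
                Expression.Constant(0));
            sum = Expression.Add(sum, hit);
        }

        // Builds: new EntryMatchCount { ID = e.ID, MatchCount = <sum> }
        var body = Expression.MemberInit(
            Expression.New(typeof(EntryMatchCount)),
            Expression.Bind(typeof(EntryMatchCount).GetProperty("ID"), Expression.Property(e, "ID")),
            Expression.Bind(typeof(EntryMatchCount).GetProperty("MatchCount"), sum));

        return Expression.Lambda<Func<Entry, EntryMatchCount>>(body, e);
    }
}
```

The result can be used the same way as the hand-written selector, for example entries.Select(SelectorBuilder.GetSelector(search)).Where(m => m.MatchCount > 1).OrderByDescending(m => m.MatchCount). In memory, string.Contains is case-sensitive; against a database the behaviour depends on the column's collation.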
StriplingWarrior
A: 

AODBDataContext db = new AODBDataContext();

// l is a SQL LIKE pattern; cats, minQL and maxQL are filter values defined elsewhere.
var fItems = from item in db.Items
             where SqlMethods.Like(item.Name, l)
             where cats.Contains(item.ItemType)
             where item.QL >= minQL
             where item.QL <= maxQL
             select item;