views:

229

answers:

3

I have a Filter Method in my User Class, that takes in a list of Users and a string of search terms. Currently, the FindAll predicate splits the terms on spaces then returns a match if any of the searchable properties contain any part of the terms.

public static List<User> FilterBySearchTerms( List<User> usersToFilter, string searchTerms, bool searchEmailText )
{
    return usersToFilter.FindAll( user =>
    {
     // Convert to lower case for better comparison, trim white space and then split on spaces to search for all terms
     string[] terms = searchTerms.ToLower().Trim().Split( ' ' );

     foreach ( string term in terms )
     {
      // TODO: Is this any quicker than two separate ifs?
      if ( 
        (searchEmailText && user.Email.ToLower().Contains( term )) 
        || (
         user.FirstName.ToLower().Contains( term ) || user.Surname.ToLower().Contains( term ) 
         || user.Position.ToLower().Contains( term ) || user.Company.ToLower().Contains( term ) 
         || user.Office.ToLower().Contains( term ) 
         || user.Title.ToLower().Contains( term )
        )
      )
       return true;
      // Search UserID by encoded UserInviteID
      else 
      {
       int encodedID;
       if ( int.TryParse( term, out encodedID ) )
       {
        User fromInvite = GetByEncodedUserInviteID( encodedID );
        if ( fromInvite != null && fromInvite.ID.HasValue && fromInvite.ID.Value == user.ID )
         return true;
       }
      }
     }

     return false;
    } );
}

I have received a new requirement so that the ordering is now important. For example, a search for 'Mr Smith' should have Mr Adam Smith ahead of Mrs Eve Smith, which might make my use of Contains inappropriate. However, the most important thing is the number of property/part of term matches.

I'm thinking I could have a couple of counters to keep track of complete term matches and partial matches, then order by those two. I'm also open to suggestions on how the Filter method can be improved - perhaps using something else entirely.

A: 

Is the distinction between full and partial matching what's relevant, or just standard lexicographical sorting? If you were to sort Mr. Adam Smith and Mrs. Eve Smith they would be placed in that order. That would just allow you to use a standard ordering lambda.

Adam Robinson
The lexicographical ordering is of secondary importance, after the number of matches.
Rich
+4  A: 

Here's a LINQ-based solution. It will be a bit more of a pain if you're not using .NET 3.5, I'm afraid. It separates out the details of the matching from the query itself, for clarity.

You'll need to create a LowerCaseUser method which returns a User object with all the properties lower-cased - it makes more sense to do that once than for every search term. If you can put that and UserMatches into the User class, so much the better. Anyway, here's the code.

public static List<User> FilterBySearchTerms
    (List<User> usersToFilter, 
     string searchTerms,
     bool searchEmailText)
{
    // Just split the search terms once, rather than for each user
    string[] terms = searchTerms.ToLower().Trim().Split(' ');

    return (from user in usersToFilter
            let lowerUser = LowerCaseUser(user)
            let matchCount = terms.Count(term => 
                                         UserMatches(lowerUser, term))
            where matchCount != 0
            orderby matchCount descending
            select user).ToList();
}

private static bool UserMatches(User user, string term,
                                bool searchEmailText)
{
    if ((searchEmailText && user.Email.Contains(term))
        || user.FirstName.Contains(term)
        || user.Surname.Contains(term)
        || user.Position.Contains(term)
        || user.Company.Contains(term)
        || user.Office.Contains(term)
        || user.Title.Contains(term))
    {
        return true;
    }
    int encodedID;
    if (int.TryParse(term, out encodedID))
    {
        User fromInvite = GetByEncodedUserInviteID(encodedID);
        // Let the compiler handle the null/non-null comparison
        if (fromInvite != null && fromInvite.ID == user.ID)
        {
            return true;
        }
    }
    return false;
}
Jon Skeet
Would you add a secondary orderby for the user's name after the orderby matchCount in order to preserve alphabetical precedence?
Erich Mirabal
Possibly, if that was wanted. All kinds of options there :)
Jon Skeet
Hi Jon, thanks for the reply. I got a couple of errors from your code. 1. The type arguments for method System.Linq.Enumerable.Count<TSource>(System.Collections.Generic.IEnumerable<TSource>, System.Func<TSource,bool>)' cannot be inferred from the usage. Try specifying the type arguments explicitly.2. The name 'term' does not exist in the current contextI'm afraid I've not used Linq before so don't know how it works exactly...
Rich
Changing the let matchcount line to this fixes it: let matchCount = terms.Count( term => UserMatches( lowerUser, term, searchEmailText ) )
Rich
Whoops, sorry - will edit :)
Jon Skeet
Nice. I like the LowerCaseUser(user) that make the ToLower only once for the same string. If you make the terms (all terms) an argument of UserMatches, you can stop searching when found the first. (you don't need to count them all). But it brings you another loop in the code.
Stefan Steinegger
@Stefan: We do need to count terms, because we're ordering by the number of terms matched :)
Jon Skeet
+1  A: 

The first thing, I'd say that you need to do, is to break the large lazily evaluated or conditions into separate conditions. Otherwise you're never going to resolve how many matches you're actually getting. After this you'll probably need a score for each user which reflects how well the search terms match it.

I'm also assuming you are able to use LINQ here as you are already using lambda expressions.

    class ScoredUser
    {
        public User User { get; set; }
        public int Score { get; set; }
    }

    public static List<User> FilterBySearchTerms(List<User> usersToFilter, string searchTerms, bool searchEmailText)
    {
        // Convert to lower case for better comparison, trim white space and then split on spaces to search for all terms
        string[] terms = searchTerms.ToLower().Trim().Split(' ');

        // Run a select statement to user list which converts them to
        // a scored object.
        return usersToFilter.Select(user =>
        {
            ScoredUser scoredUser = new ScoredUser()
            {
                User = user,
                Score = 0
            };

            foreach (string term in terms)
            {
                if (searchEmailText && user.Email.ToLower().Contains(term))
                    scoredUser.Score++;

                if (user.FirstName.ToLower().Contains(term))
                    scoredUser.Score++;

                if (user.Surname.ToLower().Contains(term))
                    scoredUser.Score++;

                // etc.
            }

            return scoredUser;

            // Select all scored users with score greater than 0, order by score and select the users.
        }).Where(su => su.Score > 0).OrderByDescending(su => su.Score).Select(su => su.User).ToList();
    }

Having the method return a scored customer also allows you easily tweak the scoring balances later on. Say you want a matching first name matter more than a matching company for example.

Mikko Rantanen
.. and of course I completely confused myself with different terms vs different fields. So breaking the lazy evaluation is not required, though it does allow different score amounts on different matches. Should probably be combined with "else if"s though in such a case.
Mikko Rantanen