views:

329

answers:

3

We'll soon be embarking on the development of a new mobile application. This particular app will be used for heavy searching of text based fields. Any suggestions from the group at large for what sort of database engine is best suited to allowing these types of searches on a mobile platform?

Specifics include Windows Mobile 6 and we'll be using the .Net CF. Also some of the text based fields will be anywhere between 35 and 500 characters. The device will operate in two different methods, batch and WiFi. Of course for WiFi we can just submit requests to a full blown DB engine and just fetch results back. This question centres around the "batch" version which will house a database loaded with information on the devices flash/removable storage card.

At any rate, I know SQLCE has some basic indexing but you don't get into the real fancy "full text" style indexes until you've got the full blown version which of course isn't available on a mobile platform.

An example of what the data would look like:

"apron carpenter adjustable leather container pocket waist hardware belt" etc. etc.

I haven't gotten into the evaluation of any other specific options yet as I figure I'd leverage the experience of this group in order to first point me down some specific avenues.

Any suggestions/tips?

+2  A: 

You could try Lucene.Net. I'm not sure how well it's suited to mobile devices, but it is billed as a "high-performance, full-featured text search engine library".

http://incubator.apache.org/lucene.net/ http://lucene.apache.org/java/docs/

wizlb
Thanks for the suggestion. I'll have a look at it and see what the port to C# entails. My only concern is it looks inactive now as a the project had it's last major release in April 2007.
Mat Nadrofsky
This site (stackoverflow.com) actually uses Lucene.Net. If you look around (maybe in their podcasts), you might be able to find their notes on how they used it.
wizlb
@wizlb - I believe lucene.net is planned, but not yet implemented http://blog.stackoverflow.com/2008/11/sql-2008-full-text-search-problems/
converter42
A: 

Anyone else have any ideas on this one?

Mat Nadrofsky
May as well just say, "bump" ; )
Gordon Wilson
+3  A: 

Just recently I had the same issue. Here is what I did:

I created a class to hold just an id and the text for each object (in my case I called it a sku (item number) and a description). This creates a smaller object that uses less memory since it is only used for searching. I'll still grab the full-blown objects from the database after I find matches.

public class SmallItem
{
    private int _sku;
    public int Sku
    {
        get { return _sku; }
        set { _sku = value; }
    }

    // Size of max description size + 1 for null terminator.
    private char[] _description = new char[36];
    public char[] Description
    {
        get { return _description; }
        set { _description = value; }
    }

    public SmallItem()
    {
    }
}

After this class is created, you can then create an array (I actually used a List in my case) of these objects and use it for searching throughout your application. The initialization of this list takes a bit of time, but you only need to worry about this at start up. Basically just run a query on your database and grab the data you need to create this list.

Once you have a list, you can quickly go through it searching for any words you want. Since it's a contains, it must also find words within words (e.g. drill would return drill, drillbit, drills etc.). To do this, we wrote a home-grown, unmanaged c# contains function. It takes in a string array of words (so you can search for more than one word... we use it for "AND" searches... the description must contain all words passed in... "OR" is not currently supported in this example). As it searches through the list of words it builds a list of IDs, which are then passed back to the calling function. Once you have a list of IDs, you can easily run a fast query in your database to return the full-blown objects based on a fast indexed ID number. I should mention that we also limit the maximum number of results returned. This could be taken out. It's just handy if someone types in something like "e" as their search term. That's going to return a lot of results.

Here's the example of custom Contains function:

public static int[] Contains(string[] descriptionTerms, int maxResults, List<SmallItem> itemList)
{
    // Don't allow more than the maximum allowable results constant.            
    int[] matchingSkus = new int[maxResults];

    // Indexes and counters.
    int matchNumber = 0;
    int currentWord = 0;
    int totalWords = descriptionTerms.Count() - 1;  // - 1 because it will be used with 0 based array indexes

    bool matchedWord;

    try
    {   
        /* Character array of character arrays. Each array is a word we want to match.
         * We need the + 1 because totalWords had - 1 (We are setting a size/length here,
         * so it is not 0 based... we used - 1 on totalWords because it is used for 0
         * based index referencing.)
         * */
        char[][] allWordsToMatch = new char[totalWords + 1][];

        // Character array to hold the current word to match. 
        char[] wordToMatch = new char[36]; // Max allowable word size + null terminator... I just picked 36 to be consistent with max description size.

        // Loop through the original string array or words to match and create the character arrays. 
        for (currentWord = 0; currentWord <= totalWords; currentWord++)
        {
            char[] desc = new char[descriptionTerms[currentWord].Length + 1];
            Array.Copy(descriptionTerms[currentWord].ToUpper().ToCharArray(), desc, descriptionTerms[currentWord].Length);
            allWordsToMatch[currentWord] = desc;
        }

        // Offsets for description and filter(word to match) pointers.
        int descriptionOffset = 0, filterOffset = 0;

        // Loop through the list of items trying to find matching words.
        foreach (SmallItem i in itemList)
        {
            // If we have reached our maximum allowable matches, we should stop searching and just return the results.
            if (matchNumber == maxResults)
                break;

            // Loop through the "words to match" filter list.
            for (currentWord = 0; currentWord <= totalWords; currentWord++)
            {
                // Reset our match flag and current word to match.
                matchedWord = false;
                wordToMatch = allWordsToMatch[currentWord];

                // Delving into unmanaged code for SCREAMING performance ;)
                unsafe
                {
                    // Pointer to the description of the current item on the list (starting at first char).
                    fixed (char* pdesc = &i.Description[0])
                    {
                        // Pointer to the current word we are trying to match (starting at first char).
                        fixed (char* pfilter = &wordToMatch[0])
                        {
                            // Reset the description offset.
                            descriptionOffset = 0;

                            // Continue our search on the current word until we hit a null terminator for the char array.
                            while (*(pdesc + descriptionOffset) != '\0')
                            {
                                // We've matched the first character of the word we're trying to match.
                                if (*(pdesc + descriptionOffset) == *pfilter)
                                {
                                    // Reset the filter offset.
                                            filterOffset = 0;

                                    /* Keep moving the offsets together while we have consecutive character matches. Once we hit a non-match
                                     * or a null terminator, we need to jump out of this loop.
                                     * */
                                    while (*(pfilter + filterOffset) != '\0' && *(pfilter + filterOffset) == *(pdesc + descriptionOffset))
                                    {
                                        // Increase the offsets together to the next character.
                                        ++filterOffset;
                                        ++descriptionOffset;
                                    }

                                    // We hit matches all the way to the null terminator. The entire word was a match.
                                    if (*(pfilter + filterOffset) == '\0')
                                    {
                                        // If our current word matched is the last word on the match list, we have matched all words.
                                        if (currentWord == totalWords)
                                        {
                                            // Add the sku as a match.
                                            matchingSkus[matchNumber] = i.Sku.ToString();
                                            matchNumber++;

                                            /* Break out of this item description. We have matched all needed words and can move to
                                             * the next item.
                                             * */
                                            break;
                                        }

                                        /* We've matched a word, but still have more words left in our list of words to match.
                                         * Set our match flag to true, which will mean we continue continue to search for the
                                         * next word on the list.
                                         * */
                                         matchedWord = true;
                                    }
                                }

                                // No match on the current character. Move to next one.
                                descriptionOffset++;
                            }

                            /* The current word had no match, so no sense in looking for the rest of the words. Break to the
                             * next item description.
                             * */
                             if (!matchedWord)
                                break;
                        }
                    }
                }
            }
        };

        // We have our list of matching skus. We'll resize the array and pass it back.
        Array.Resize(ref matchingSkus, matchNumber);
        return matchingSkus;
    }
    catch (Exception ex)
    {
        // Handle the exception
    }
}

Once you have the list of matching skus, you can iterate through the array and build a query command that only returns the matching skus.

For an idea of performance, here's what we have found (doing the following steps):

  • Search ~171,000 items
  • Create list of all matching items
  • Query the database, returning only the matching items
  • Build full-blown items (similar to SmallItem class, but a lot more fields)
  • Populate a datagrid with the full-blow item objects.

On our mobile units, the entire process takes 2-4 seconds (takes 2 if we hit our match limit before we have searched all items... takes 4 seconds if we have to scan every item).

I've also tried doing this without unmanaged code and using String.IndexOf (and tried String.Contains... had same performance as IndexOf as it should). That way was much slower... about 25 seconds.

I've also tried using a StreamReader and a file containing lines of [Sku Number]|[Description]. The code was similar to the unmanaged code example. This way took about 15 seconds for an entire scan. Not too bad for speed, but not great. The file and StreamReader method has one advantage over the way I showed you though. The file can be created ahead of time. The way I showed you requires the memory and the initial time to load the List when the application starts up. For our 171,000 items, this takes about 2 minutes. If you can afford to wait for that initial load each time the app starts up (which can be done on a separate thread of course), then searching this way is the fastest way (that I've found at least).

Hope that helps.

PS - Thanks to Dolch for helping with some of the unmanaged code.

Jason Down