I have an sqlite database in my iPhone app that I access via the Core Data framework. I'm using NSPredicates to query the database.

I am building a search function that needs to search six different varchar fields that contain text. At the moment, it's very slow and I need to improve performance, probably in the sqlite database. Would it be best to create an index on all those columns? Or would it be better to build a custom index table that expands those six columns into multiple rows, each containing a word and the ID it matches? Any other suggestions?
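For reference, the custom index table idea might look roughly like the sketch below. It is written in Python against plain SQLite rather than Core Data, it covers only two of the six columns, and the table and function names (`item`, `item_word`, `add_item`, `search`) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, color TEXT);
-- One row per (word, item) pair: the hypothetical "custom index table".
CREATE TABLE item_word (word TEXT NOT NULL, item_id INTEGER NOT NULL);
CREATE INDEX item_word_word ON item_word (word);
""")

def add_item(conn, item_id, name, color):
    """Store the item and expand its text columns into one row per word."""
    conn.execute(
        "INSERT INTO item (id, name, color) VALUES (?, ?, ?)",
        (item_id, name, color),
    )
    words = set((name + " " + color).lower().split())
    conn.executemany(
        "INSERT INTO item_word (word, item_id) VALUES (?, ?)",
        [(word, item_id) for word in words],
    )

def search(conn, query):
    """Return ids of items where every typed term is a prefix of some word."""
    ids = None
    for term in query.lower().split():
        # Upper bound for the prefix range: bump the last character by one.
        upper = term[:-1] + chr(ord(term[-1]) + 1)
        rows = conn.execute(
            "SELECT DISTINCT item_id FROM item_word WHERE word >= ? AND word < ?",
            (term, upper),
        )
        matched = {row[0] for row in rows}
        ids = matched if ids is None else ids & matched
    return sorted(ids or [])

add_item(conn, 1, "Mountain Bike", "red")
add_item(conn, 2, "Road Bike", "blue")
add_item(conn, 3, "Red Wagon", "red")
```

Here `search(conn, "bik re")` intersects the items matching `bik` with those matching `re`. Note that this matches word prefixes only, so it is not a drop-in replacement for `contains[cd]`.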

Thanks in advance.

+2  A: 

There are things you can do to improve the performance of text searches in SQLite databases. Although Core Data abstracts you away from the underlying store, it helps to understand what is going on underneath when your store is backed by SQLite.

If we assume you're doing a substring search of these fields, the main improvement is the one Apple recommends: use derived properties. This amounts to maintaining a normalised version of each property in your model that is used only for searching. The derived property should be stored in a form that can be indexed. You then express your search in terms of this derived property using binary comparison operators (>, <=, etc.) instead of text-matching operators.

I found doing this reduced our search from around 1 second to under 100ms.

To make things clear I would suggest looking at the ADC example http://developer.apple.com/mac/library/samplecode/DerivedProperty/
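As a rough illustration of the idea (not the code from Apple's sample), here is a minimal Python sketch against plain SQLite. It assumes a hypothetical `item` table with an indexed `name_norm` column holding the normalised text, and it covers prefix matches only, since that is what a range comparison on an index can answer. The helper names are invented for the example.

```python
import sqlite3
import unicodedata

def normalize(text):
    """Strip diacritics and case so that plain byte comparison is enough."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def prefix_upper_bound(prefix):
    """Smallest string that sorts after every string starting with prefix."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, name_norm TEXT)"
)
# The derived column is the one that gets indexed.
conn.execute("CREATE INDEX item_name_norm ON item (name_norm)")

names = ["Émile Zola", "emily dickinson", "Emma Goldman", "Ernest Hemingway"]
conn.executemany(
    "INSERT INTO item (name, name_norm) VALUES (?, ?)",
    [(name, normalize(name)) for name in names],
)

def search_prefix(conn, term):
    """Find names whose normalised form starts with the normalised term."""
    low = normalize(term)
    if not low:
        return []
    high = prefix_upper_bound(low)
    rows = conn.execute(
        "SELECT name FROM item WHERE name_norm >= ? AND name_norm < ? "
        "ORDER BY name_norm",
        (low, high),
    )
    return [row[0] for row in rows]
```

Calling `search_prefix(conn, "EMI")` turns the search into the range `name_norm >= 'emi' AND name_norm < 'emj'`, which the index can serve without any per-row Unicode comparison.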

lyonanderson
I'll check out that example tonight. I'll need to read up on derived properties, too. Thanks for the tip.
mswebersd
A: 

I assume these columns store text. The question is how much text and how often this model is accessed. If it is a large amount of text, I would create additional properties that hold the same text with common words and Unicode characters stripped out. The only downside is that you end up with extra properties to maintain. You can then index those properties to improve performance.
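A minimal Python sketch of producing such a stripped-down value might look like this. The stop-word list and the function name `derive_search_text` are assumptions made for the example; a real app would pick its own list and run this whenever the source property changes.

```python
import unicodedata

# Hypothetical stop-word list; adjust it to the data being searched.
STOP_WORDS = {"a", "an", "and", "of", "the"}

def derive_search_text(text):
    """Return lowercased text with diacritics and common words removed."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    words = [w for w in without_marks.lower().split() if w not in STOP_WORDS]
    return " ".join(words)
```

For example, `derive_search_text("The Café and the Bar")` yields `"cafe bar"`, which is the value that would be stored in the extra property and searched.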

Scott Densmore
Yes, they do store text... I've updated my question to reflect that. I have about 500 rows, and each of the six columns has an average of five words in them. So in total, I'm searching roughly 3000 words. I'm just not clear on how indexing works in sqlite. I'll need to read up on that.
mswebersd
@mswebersd: If your db has only 500 rows, I'm curious what could be happening that makes you call it very slow. Regardless of whether it's indexed or not, queries on a table of that size should be near instantaneous. Exactly what SQL are you executing?
Herbert Sitz
((name contains[cd] %@) OR (color contains[cd] %@) OR (...)) AND ((name contains[cd] %@) OR (color contains[cd] %@) OR (...)) AND (...) Tons of ORs and contains[cd].
mswebersd
It uses a UISearchBar on a table, and the search runs every time the user types a letter, as long as the search string is not empty.
mswebersd
Oh one more thing... I forgot to mention that each OR phrase is for a single word (I split on the space character). So the ANDs join multiple words.
mswebersd
@mswebersd: It looks to me like what you want is full text indexing with sqlite's FTS3 module. It would simplify the SQL and be indexed and fast to boot. Have you looked at that yet? Here's the main sqlite FTS3 page: http://www.sqlite.org/fts3.html
Herbert Sitz
A: 

If what you want is essentially full text indexing of your sqlite db, then you may want to use sqlite's FTS3 module, since that's exactly what it provides: http://www.sqlite.org/cvstrac/wiki?p=FtsUsage http://dotnetperls.com/sqlite-fts3
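To give a feel for it, here is a small Python sketch that talks to SQLite directly (not through Core Data). It assumes the SQLite library in use was compiled with FTS3 support, and the table name `item_fts`, the two columns, and the helper `fts_search` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Requires an SQLite build with FTS3 enabled.
conn.execute("CREATE VIRTUAL TABLE item_fts USING fts3(name, color)")
conn.executemany(
    "INSERT INTO item_fts (rowid, name, color) VALUES (?, ?, ?)",
    [
        (1, "Apple", "Red"),
        (2, "Apricot", "Orange"),
        (3, "Red Pepper", "Green"),
    ],
)

def fts_search(conn, user_text):
    """Match rows where every typed word is a prefix of some indexed token."""
    # Appending * makes each term a prefix query. Input containing FTS
    # syntax (quotes, OR, -) is not escaped in this sketch.
    terms = [term + "*" for term in user_text.split()]
    if not terms:
        return []
    rows = conn.execute(
        "SELECT rowid FROM item_fts WHERE item_fts MATCH ? ORDER BY rowid",
        (" ".join(terms),),
    )
    return [row[0] for row in rows]
```

A single `MATCH` against the table name searches all indexed columns and requires every term to be present, which replaces the chain of ORs and ANDs. It matches whole tokens or token prefixes, not arbitrary substrings.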

Herbert Sitz
Does CoreData support FTS3? This is probably why I should have left CoreData in the title of this post. I'll look into it though.
mswebersd
+1  A: 

From the Core Data Programming Guide:

How you use predicates can significantly affect the performance of your application. If a fetch request requires a compound predicate, you can make the fetch more efficient by ensuring that the most restrictive predicate is the first, especially if the predicate involves text matching (contains, endsWith, like, and matches) since correct Unicode searching is slow. If the predicate combines textual and non-textual comparisons, then it is likely to be more efficient to specify the non-textual predicates first, for example (salary > 5000000) AND (lastName LIKE 'Quincey') is better than (lastName LIKE 'Quincey') AND (salary > 5000000).

If there is a way to reorder your query so that the simplest logic is on the left and the most complex on the right, that can help your search performance. As Lyon suggests, searching Unicode text is extremely expensive, so Apple recommends searching against derived values that strip Unicode characters and common words like "a", "and", and "the".
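The effect of ordering can be illustrated with a toy Python model of short-circuit evaluation. This is only an analogy under stated assumptions: it does not show how Core Data or SQLite actually evaluates a predicate, and the data and the function `run_query` are invented for the example.

```python
import unicodedata

employees = [
    {"lastName": "Quincey", "salary": 6000000},
    {"lastName": "Quincey", "salary": 40000},
    {"lastName": "Smith", "salary": 50000},
    {"lastName": "Jones", "salary": 7000000},
]

def run_query(cheap_first):
    """Return (number of hits, number of expensive text comparisons made)."""
    text_checks = 0

    def name_matches(name):
        # Stand-in for a slow, Unicode-aware text comparison.
        nonlocal text_checks
        text_checks += 1
        return unicodedata.normalize("NFKD", name).casefold() == "quincey"

    hits = 0
    for employee in employees:
        if cheap_first:
            ok = employee["salary"] > 5000000 and name_matches(employee["lastName"])
        else:
            ok = name_matches(employee["lastName"]) and employee["salary"] > 5000000
        if ok:
            hits += 1
    return hits, text_checks
```

Both orderings find the same single row, but with the numeric test first the text comparison runs only for the two rows that pass the salary check, instead of for all four.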

Brad Larson
I might be able to pull out common words, but probably not. I don't think users would ever enter numbers. My query string is extremely inefficient, using something like this: ((name contains[cd] %@) OR (color contains[cd] %@) OR (...)) AND ((name contains[cd] %@) OR (color contains[cd] %@) OR (...)) AND (...) Tons of ORs. If sqlite is anything like other databases I've used, that's a horribly slow way to search. Honestly, maybe the solution is simpler than all of this. Maybe I just need to only respond when the user clicks "Search" rather than every time they type a letter?
mswebersd
You still might be able to get away with searching against a derived property that has the unicode characters stripped out. That alone should give you a significant performance boost.
Brad Larson