views:

516

answers:

7

I'm performing a search of a large plist file which contains dictionaries, tens of thousands of them, each with 2 key/string pairs. My search algorithms goes through the dictionaries, and when it finds a text match in either of the strings in the dictionary, the contents of the dictionary are inserted. Here is how it works:

NSDictionary *eachEntry;
NSArray *rawGlossaryArray = [[NSArray alloc] initWithContentsOfFile:thePath]; // this contains the contents of the plist

for (eachEntry in rawGlossaryArray)
    {
     GlossaryEntry *anEntry = [[GlossaryEntry alloc] initWithDictionary:eachEntry];


     NSRange titleResultsRange = [anEntry.title rangeOfString:filterString options:NSCaseInsensitiveSearch];
     NSRange defResultsRange = [anEntry.definition rangeOfString:filterString options:NSCaseInsensitiveSearch];

     if (titleResultsRange.length > 0 || defResultsRange.length > 0) {
      // store that item in the glossary dictionary with the name as the key
      [glossaryDictionary setObject:anEntry forKey:anEntry.title];

     }
     [anEntry release];
    }

Each time a search is performed, there is a delay of around 3-4 seconds in my iPhone app (on the device at least; everything runs pretty quickly in the simulator). Can anyone advise on how I might optimize this search?

+1  A: 

A few suggestions:

  1. You're doing a lot of allocing and releasing in that loop. Could you create a single GlossaryEntry before the loop, then just reload it's contents inside the loop? This would avoid a bunch of alloc/releases.

  2. Rather than loading the file each time, could you lazy load it once and keep it cached in memory (maybe in a singleton type object)? Generally this isn't a good idea on the iPhone, but you could have some code in your "didReceiveMemoryWarning" handler that would free the cache if it became an issue.

Eric Petroelje
Thanks for your reply Eric. I tried both suggestions but without any noticeable improvement on either count. Just to clarify that the plist file is loaded once (e.g. after each new search character is entered, not in each iteration of the loop). But making a singleton instance didn't help either way, it doesn't appear to be the bottleneck here.Thanks again.
moigno
Not sure from your comment if maybe you misunderstood what I meant about loading the file each time. I didn't mean loading it in the loop, I meant loading it when they do a search. Load the file once (say at app startup) then your search function would just use that pre-loaded data right from memory.
Eric Petroelje
You're quite right, I'm now loading the shared instance of the file in the initialisation of the search class... there seems to be a small performance increase but its still pretty slow unfortunately. Thanks anyway.
moigno
+1  A: 

You should run your application is Instruments, and see what the bottleneck really is. Performance optimizations in the blind are really difficult, and we have tools to make them clear, and the tools are good too!

There's also the possibility that this isn't optimizable. I'm not sure if it's actually hanging the UI in your app or just taking a long time. If it's blocking the UI you need to get out of the main thread to do this work. Same with any significant work to keep an app responsive.

Nick Veys
+2  A: 

Without looking at the data set I can't be sure, but if you profile it you are spending the vast percentage of your time in -rangeOfString:options:. If that is the case you will not be able to improve performance without fundamentally changing the data structure you are using to store your data.

You might want to construct some sort trie with strings and substrings pointing to the objects. It is much more complicated thing to setup, and insertions into it will be more expensive, but lookup would be very fast. Given that you are serializing out the structure anyway expensive inserts should not be much of an issue.

Louis Gerbarg
A: 

try the following, and see if you get any improvement:

1) use

- (NSRange)rangeOfString:(NSString *)aString options:(NSStringCompareOptions)mask

and as mask, pass the value NSLiteralSearch. This may speedup search considerably as described in the Apple documentation (String Programming Guide for Cocoa):

NSLiteralSearch Performs a byte-for-byte comparison. Differing literal sequences (such as composed character sequences) that would otherwise be considered equivalent are considered not to match. Using this option can speed some operations dramatically.

2) From the documentation (String Programming Guide for Cocoa):

If you simply want to determine whether a string contains a given pattern, you can use a predicate:

BOOL match = [myPredicate evaluateWithObject:myString];

For more about predicates, see Predicate Programming Guide.

unforgiven
+2  A: 

That just cries out for using a database, that you pre-populate and put into the application.

Kendall Helmstetter Gelner
+1 - can't believe I didn't think of this. Quite a "duh" moment :)
Eric Petroelje
A: 

You should profile it in instruments to find where the bottleneck actually is. If I had to guess, I would say the bottleneck would be [[NSArray alloc] initWithContentsOfFile:thePath].

Having said that, you'd probably get the best performance by storing the data in an sqlite database (which you would search with SQL) instead of using a plist.

Tom Dalling
+1  A: 

You're probably getting the best performance you're likely to get, given your current data structures. You need to change how you're accessing the data, in order to get better performance.

Suggestions, in no particular order:

  1. Don't create your GlossaryEntry objects in a loop while you're filtering them. Rather than storing the data in a Property List, just archive your array of GlossaryEntry objects. See the NSCoding documentation.

  2. Rather than searching through tens of thousands of strings at every keystroke, generate an index of common substrings (maybe 2 or 3 letters), and create an NSDictionary that maps from that common substring to the set of results to use as an index. You can create the index at build time, rather than at run-time. If you can slice up your data set into several smaller pieces, the linear search for matching strings will be considerably faster.

  3. Store your data in an SQLite database, and use SQL to query it - probably overkill for just this problem, but allows for more sophisticated searches in the future, if you'll need them.

  4. If creating a simple index doesn't work well enough, you'll need to create a search tree style data structure.

Mark Bessey