views: 567

answers: 3

Hi,

I've got a text file with about 200,000 lines. Each line represents an object with multiple properties. I only search on one of those properties (a unique ID). If the unique ID I'm looking for matches the current object's unique ID, I read the rest of that object's values.

Right now, each time I search for an object, I read the whole text file line by line, create an object for each line, and check whether it's the object I'm looking for - which is just about the most inefficient way to search. I would like to read all those objects into memory so that I can search through them more efficiently later.

The question is: what's the most efficient way to perform such a search? Is a 200,000-entry NSArray a good way to do this (I doubt it)? How about an NSSet? With an NSSet, is it possible to search on just one property of the objects?

Thanks for any help!

-- Ry

+3  A: 

Use an NSDictionary to map from IDs to objects, that is, use the ID as the key and the object as the value. NSDictionary is the only Foundation collection class that supports efficient key lookup (or key lookup at all).

Dictionaries are a different kind of collection from the other collection classes. A dictionary is an associative collection (it maps IDs to objects in your case), whereas the others are simply containers for multiple objects. NSSet holds unordered, unique objects; NSArray holds ordered objects and may contain duplicates.

UPDATE:

To avoid reallocations as you read the entries, create the dictionary with NSMutableDictionary's dictionaryWithCapacity: method. If you know the (approximate) number of entries before reading them, you can use it to preallocate a large enough dictionary.
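
As a rough sketch of that approach - assuming, for illustration, a comma-separated format with the unique ID as the first field (the file name and parsing are placeholders for your actual format):

    NSString *contents = [NSString stringWithContentsOfFile:@"objects.txt"
                                                   encoding:NSUTF8StringEncoding
                                                      error:NULL];
    NSArray *lines = [contents componentsSeparatedByString:@"\n"];

    // Preallocate so the dictionary doesn't have to grow while loading.
    NSMutableDictionary *objectsByID =
        [NSMutableDictionary dictionaryWithCapacity:[lines count]];

    for (NSString *line in lines) {
        if ([line length] == 0) continue;
        NSArray *fields = [line componentsSeparatedByString:@","];
        // Key on the unique ID; store whatever object you build from the line.
        [objectsByID setObject:fields forKey:[fields objectAtIndex:0]];
    }

    // Later: a single hash lookup instead of a 200,000-line scan.
    NSArray *match = [objectsByID objectForKey:@"some-id"];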

yngvedh
Thanks, but the more entries I add to the NSDictionary, the slower adding further entries becomes, and searching the entries gets much slower too. Adding an entry to my NSDictionary of 50,000 entries takes almost one second. That approach isn't suitable for building a 200,000-entry NSDictionary.
ryyst
@ryyst: It takes time to add 200,000 entries to an NSDictionary. If, for example, it is implemented as a hash table that relocates the table as it grows, you are paying for periodic rehashing on top of the insertions themselves. I also suspect that reading and parsing the entries from the file takes far longer than actually adding them to the NSDictionary. Have you timed the reading and insertion operations separately?
yngvedh
+2  A: 

@yngvedh is correct that an NSDictionary has O(1) lookup time (as is expected for a map structure). However, after doing some testing, I found that NSSet also has O(1) lookup time. Here's the basic test I used to come up with that: http://pastie.org/933070

Basically, I create 1,000,000 strings, then time how long it takes to retrieve 100,000 random ones from both the dictionary and the set. When I run this a few times, the set actually appears to be faster...

dict lookup: 0.174897
set lookup: 0.166058
---------------------
dict lookup: 0.171486
set lookup: 0.165325
---------------------
dict lookup: 0.170934
set lookup: 0.164638
---------------------
dict lookup: 0.172619
set lookup: 0.172966
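
In case the pastie link goes away: the test looked roughly like the sketch below (a reconstruction under the same setup, not necessarily the original code).

    #import <Foundation/Foundation.h>
    #include <stdlib.h>   // arc4random

    int main(int argc, const char *argv[]) {
        NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
        const NSUInteger total = 1000000;   // strings to store
        const NSUInteger lookups = 100000;  // random lookups to time

        // Build the dictionary and the set from the same 1,000,000 strings.
        NSMutableDictionary *dict = [NSMutableDictionary dictionaryWithCapacity:total];
        NSMutableSet *set = [NSMutableSet setWithCapacity:total];
        for (NSUInteger i = 0; i < total; i++) {
            NSString *s = [NSString stringWithFormat:@"%lu", (unsigned long)i];
            [dict setObject:s forKey:s];
            [set addObject:s];
        }

        // Time the random lookups against each collection.
        CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
        for (NSUInteger i = 0; i < lookups; i++) {
            NSString *key = [NSString stringWithFormat:@"%lu",
                             (unsigned long)(arc4random() % total)];
            [dict objectForKey:key];
        }
        NSLog(@"dict lookup: %f", CFAbsoluteTimeGetCurrent() - start);

        start = CFAbsoluteTimeGetCurrent();
        for (NSUInteger i = 0; i < lookups; i++) {
            NSString *key = [NSString stringWithFormat:@"%lu",
                             (unsigned long)(arc4random() % total)];
            [set member:key];
        }
        NSLog(@"set lookup: %f", CFAbsoluteTimeGetCurrent() - start);

        [pool drain];
        return 0;
    }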

In your particular case, I'm not sure either of these is what you want. You say you want all of these objects in memory, but do you really need all of them, or just a few at a time? If it's the latter, I would read through the file once and build a mapping from object ID to file offset (i.e., remember where each object ID lives in the file). Then you can look up the IDs you want, use the file offset to jump to the right spot in the file, parse that line, and move on. This is a job for NSFileHandle.
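
A rough sketch of that index-then-seek idea, again assuming a hypothetical comma-separated format with the ID as the first field (path stands in for your file's location):

    // One pass to build an ID -> byte-offset index.
    NSString *contents = [NSString stringWithContentsOfFile:path
                                                   encoding:NSUTF8StringEncoding
                                                      error:NULL];
    NSMutableDictionary *offsetsByID = [NSMutableDictionary dictionary];
    unsigned long long offset = 0;
    for (NSString *line in [contents componentsSeparatedByString:@"\n"]) {
        if ([line length] > 0) {
            NSString *objectID =
                [[line componentsSeparatedByString:@","] objectAtIndex:0];
            [offsetsByID setObject:[NSNumber numberWithUnsignedLongLong:offset]
                            forKey:objectID];
        }
        offset += [line lengthOfBytesUsingEncoding:NSUTF8StringEncoding] + 1; // +1 for '\n'
    }

    // Later: jump straight to one record instead of rescanning the file.
    NSNumber *where = [offsetsByID objectForKey:@"some-id"];
    if (where != nil) {
        NSFileHandle *fh = [NSFileHandle fileHandleForReadingAtPath:path];
        [fh seekToFileOffset:[where unsignedLongLongValue]];
        NSData *chunk = [fh readDataOfLength:4096]; // enough to cover one line
        // ...parse the line of interest out of `chunk`...
        [fh closeFile];
    }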

Dave DeLong
+2  A: 

200,000 objects sounds like enough that you might run into memory constraints, depending on the size of the objects and your target environment. One other thing you may want to consider is converting the data into an SQLite database and indexing the columns you want to look up on. That would be a good compromise between efficiency and resource consumption, since you would not have to load the full set into memory.
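
For reference, a minimal sketch of the raw-SQLite route (the table and column names here are made up; link against libsqlite3):

    #import <sqlite3.h>

    sqlite3 *db = NULL;
    sqlite3_open("objects.db", &db);

    // PRIMARY KEY gives you an index on the unique ID for free.
    sqlite3_exec(db,
                 "CREATE TABLE IF NOT EXISTS objects "
                 "(uid TEXT PRIMARY KEY, payload TEXT)",
                 NULL, NULL, NULL);

    // Lookups hit the index; the 200,000 rows stay on disk.
    sqlite3_stmt *stmt = NULL;
    sqlite3_prepare_v2(db, "SELECT payload FROM objects WHERE uid = ?",
                       -1, &stmt, NULL);
    sqlite3_bind_text(stmt, 1, "some-id", -1, SQLITE_TRANSIENT);
    if (sqlite3_step(stmt) == SQLITE_ROW) {
        const unsigned char *payload = sqlite3_column_text(stmt, 0);
        // ...build your object from `payload`...
    }
    sqlite3_finalize(stmt);
    sqlite3_close(db);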

Jaanus
Core Data makes it brutally easy to at least see if this solution is fast enough/cheap enough.
TofuBeer
Yeah, I was originally thinking of raw SQLite, but CD is even easier.
Jaanus
Thanks! I think I'm going to try Core Data.
ryyst