I'm using Core Data to store thousands of items. A pair of properties on each item determines uniqueness, so when a new item comes in, I compare it against the existing items before inserting it. Since the incoming data is an RSS feed, there are often many duplicates, and the uniquing step costs O(N^2), which has become significant.
Right now, I create a set of existing items before iterating over the list of (possible) new items. My theory is that on the first iteration, all the items will be faulted in, and assuming we aren't pressed for memory, most of those items will remain resident over the course of the iteration.
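For concreteness, the current approach boils down to something like the following. This is a simplified sketch rather than the real code: `FeedItem`, `title`, and `link` are placeholder names for the entity and its two uniqueness properties, and a plain struct stands in for the managed object.

```swift
// Placeholder stand-in for the managed object; the real entity and
// attribute names differ.
struct FeedItem {
    let title: String
    let link: String
}

/// Returns the incoming items that match no known item on both key properties.
/// Each incoming item is compared against every known item, so the cost grows
/// with (incoming count) x (known count).
func newItems(from incoming: [FeedItem], notIn existing: [FeedItem]) -> [FeedItem] {
    var known = existing            // items already stored, plus ones accepted so far
    var accepted: [FeedItem] = []
    for candidate in incoming {
        // String comparison on the pair of uniqueness properties.
        let isDuplicate = known.contains {
            $0.title == candidate.title && $0.link == candidate.link
        }
        if !isDuplicate {
            accepted.append(candidate)
            known.append(candidate) // once inserted, it counts as existing
        }
    }
    return accepted
}
```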
I see my options as follows:
1. Use string comparison for uniquing, iterating over all "new" items and comparing each to all existing items (my current approach).
2. Use a predicate to filter the set of existing items against the properties of the "new" items.
3. Use a predicate with Core Data to determine the uniqueness of each "new" item (without retrieving the set of existing items).
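To make option 3 concrete, here is a rough sketch of a per-item existence check. It assumes an entity named `Item` with string attributes `title` and `link`. Those names, the in-memory store, and the `itemExists` helper are all placeholders for illustration, not the real model.

```swift
import CoreData

// Build a throwaway model in code so the sketch is self-contained.
// "Item", "title", and "link" are assumed names.
func stringAttribute(_ name: String) -> NSAttributeDescription {
    let attribute = NSAttributeDescription()
    attribute.name = name
    attribute.attributeType = .stringAttributeType
    attribute.isOptional = false
    return attribute
}

let entity = NSEntityDescription()
entity.name = "Item"
entity.managedObjectClassName = NSStringFromClass(NSManagedObject.self)
entity.properties = [stringAttribute("title"), stringAttribute("link")]

let model = NSManagedObjectModel()
model.entities = [entity]

// An in-memory store keeps the example off disk.
let container = NSPersistentContainer(name: "Sketch", managedObjectModel: model)
let storeDescription = NSPersistentStoreDescription()
storeDescription.type = NSInMemoryStoreType
container.persistentStoreDescriptions = [storeDescription]
container.loadPersistentStores { _, error in
    if let error = error { fatalError("Store failed to load: \(error)") }
}
let context = container.viewContext

/// Asks the store whether any item already has this title/link pair.
/// Only a count is requested, so no existing objects are loaded.
func itemExists(title: String, link: String,
                in context: NSManagedObjectContext) throws -> Bool {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Item")
    request.predicate = NSPredicate(format: "title == %@ AND link == %@", title, link)
    request.fetchLimit = 1
    return try context.count(for: request) > 0
}

// Seed one stored item so there is something to check against.
let stored = NSManagedObject(entity: entity, insertInto: context)
stored.setValue("Hello", forKey: "title")
stored.setValue("http://example.com/1", forKey: "link")
try context.save()
```

With this shape, each "new" item costs one round trip to the store, so the total is one fetch per incoming item.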
Is option 3 likely to be faster than my current approach? Do you know of a better way?