I'm importing an RSS feed into Core Data. After the initial import, when I update the feed again, how do I most efficiently prevent duplicates? Right now I check every item against the data store during parsing, which is not very efficient.

I looked into the Top Songs sample from Apple. It uses a least-recently-used (LRU) cache for categories. But when every item is different, the cache doesn't help at all.

EDIT: To clarify, I can already identify each item uniquely in the feed with guid. The issue is the performance of comparing hundreds of items against the database every time, when most of them are duplicates.

A: 

Can you modify your Core Data model?

If you can, I would add a "Hash" property to each feed entry to uniquely identify it. Then you could efficiently detect whether a specific entry is already in your database or not.

Charter
Not sure what you mean. I still need to check the hash in the database. What's the difference between checking the hash and checking the item guid, which is what I'm doing right now? I can already identify each item uniquely with guid. The issue is the time it takes to compare the item in Core Data.
willi
I was not aware you already had a guid. If performance is the real problem then you should probably use SQLite directly instead. Could you explain why you chose Core Data instead?
Charter
Core Data is always a better answer than using raw sqlite. A quick search on SO will show many discussions on the subject. Raw sqlite is *not* more performant than Core Data.
Marcus S. Zarra
Then Brent Simmons from NewsGator Technologies is just wrong? [On switching away from Core Data](http://inessential.com/2010/02/26/on_switching_away_from_core_data)
Charter
Yes, he is wrong 99.999% of the time, as he even stated in his own blog. It is still on my `TODO` list to look into that other 0.001%, as I suspect there is a solution for it.
Marcus S. Zarra
Ok, then I'll definitely look more deeply into Core Data. Thx for the info.
Charter
+4  A: 

When you are importing a new row you can run a query against the existing rows to see if it is already in place. To do this you create an NSFetchRequest against your entity, set the predicate to look for the guid property, and set the fetch limit (`fetchLimit`) to 1 so that at most one row is returned.

I would recommend keeping this NSFetchRequest around during your import so that you can reuse it while going through the import. If the NSFetchRequest returns a row you can update that row. If it does not return a row then you can insert a new row.

When done correctly you will find the performance more than acceptable.
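
As a rough sketch of the find-or-create approach described above (not a definitive implementation): the entity name `FeedItem`, the `title` attribute, the `FeedImporter` class, and the in-memory store are all assumptions made so the example is self-contained. Only the `guid` attribute comes from the question. The fetch request and a predicate template are created once and reused for every item.

```swift
import CoreData

// Hypothetical model built in code: a "FeedItem" entity with "guid" and "title".
func makeModel() -> NSManagedObjectModel {
    let guid = NSAttributeDescription()
    guid.name = "guid"
    guid.attributeType = .stringAttributeType

    let title = NSAttributeDescription()
    title.name = "title"
    title.attributeType = .stringAttributeType

    let entity = NSEntityDescription()
    entity.name = "FeedItem"
    entity.properties = [guid, title]

    let model = NSManagedObjectModel()
    model.entities = [entity]
    return model
}

// In-memory store, used here only so the sketch runs without a file on disk.
func makeContext() throws -> NSManagedObjectContext {
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
    _ = try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                           configurationName: nil,
                                           at: nil,
                                           options: nil)
    let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
    context.persistentStoreCoordinator = coordinator
    return context
}

final class FeedImporter {
    private let context: NSManagedObjectContext
    // Built once and reused for every item in the import.
    private let request = NSFetchRequest<NSManagedObject>(entityName: "FeedItem")
    private let template = NSPredicate(format: "guid == $GUID")

    init(context: NSManagedObjectContext) {
        self.context = context
        request.fetchLimit = 1  // at most one row is needed
    }

    // Updates the row with this guid if one exists, otherwise inserts a new row.
    @discardableResult
    func upsert(guid: String, title: String) throws -> NSManagedObject {
        request.predicate = template.withSubstitutionVariables(["GUID": guid])
        let item = try context.fetch(request).first
            ?? NSEntityDescription.insertNewObject(forEntityName: "FeedItem", into: context)
        item.setValue(guid, forKey: "guid")
        item.setValue(title, forKey: "title")
        return item
    }
}
```

In use, the parser would call `upsert(guid:title:)` for each feed item and call `save()` on the context once at the end of the import rather than per item. On a SQLite store, indexing the `guid` attribute in the model is probably what keeps each lookup cheap.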

Marcus S. Zarra
Thanks. I'll try this out. Your books are great btw.
willi