Hello, I have a graph with one kind of object called Message. A message can have children, so the graph is a tree. Each object in the graph has an attribute whose value is a sort of UUID (globally unique), so the store cannot contain multiple messages with the same UUID. (This UUID is a string taken from the message data, so I can't replace it with the objectID of the NSManagedObject.) The problem arises when I need to validate the insertion of an object into the store. How can I check whether a message with the same UUID is already present in the store? I'm thinking about a fetch request in validateForInsert:, but that seems too slow and complex when there are lots of objects to insert (about 30k in my case). Does anyone have a better solution?
This is a known limitation with CoreData (I've filed a feature request on this myself). You should also go to http://bugreport.apple.com and let them know you want this feature. (The original bug id is rdar://3711805)
The way I've gotten around this in the past is to use a convenience method to access and create my NSManagedObjects. This convenience method then looks in a static NSMutableDictionary to see if another object with the same unique attribute has already been created (the unique attribute being the key, and the managedObjectID being the value). If it finds one, it just returns that object instead. If it doesn't, then it goes about creating one and caching that object's ID into the static dictionary for future use. When the app first launches, I have to pre-populate this dictionary with the attributes/identifiers of pre-existing objects.
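To make the shape of that convenience method concrete, here is a minimal sketch. It is written in Python rather than Objective-C so it can run on its own, and it is not Core Data code: the names MessageStore, prepopulate and message_for_uuid are hypothetical, a plain list stands in for the store, and list positions stand in for managed object IDs.

```python
class MessageStore:
    """Stand-in for the store; list positions play the role of object IDs."""

    def __init__(self):
        self.objects = []       # pretend persistent store
        self._id_by_uuid = {}   # unique attribute -> object ID cache

    def prepopulate(self):
        """Rebuild the cache from objects that already exist (once, at launch)."""
        self._id_by_uuid = {obj["uuid"]: i for i, obj in enumerate(self.objects)}

    def message_for_uuid(self, uuid):
        """Return the existing message with this uuid, or create and cache one."""
        object_id = self._id_by_uuid.get(uuid)
        if object_id is not None:
            return self.objects[object_id]
        message = {"uuid": uuid}
        self.objects.append(message)
        self._id_by_uuid[uuid] = len(self.objects) - 1
        return message


store = MessageStore()
first = store.message_for_uuid("abc")
second = store.message_for_uuid("abc")   # same object, nothing new is created
```

The important part is that every creation goes through the one method; anything that inserts objects behind its back will leave the cache out of date.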
It's a pain, I know. :( File an enhancement request.
Okay, with a second index the results are better, so I'll try to summarize the problem and the solution. If anyone has a better idea I'll be happy to talk about it :) I have about 30,000 messages taken from the network and I need to save them all into a Core Data store in the form of a tree. Each message contains a unique identification string, and no more than one message with a given id can be saved in the database. Core Data at this time does not support uniqueness of attributes, and I can't use the objectID property to ensure this kind of thing. A first solution is, in pseudo code:
- Execute a query to see if the uuid string is present in storage
- If it's not present, make a new NSManagedObject with that uuid and put it into storage; otherwise ignore it (it's already in the db)
- Execute another query to find the direct parent of this new message; if found, link both messages; if not, it's a root message
This solution has a big problem. With 30k messages I need 30k queries to check whether the new message exists in Core Data, another 30k to look for the parent (plus, I think, another 30k operations to insert the new objects). Over 60k queries take a lot of time (a minute or more here).
My second solution is to create an auxiliary NSMutableDictionary where I save the message uuid as the key and the URI representation of the NSManagedObjectID (the only form I can save to NSData) as the value of each dictionary entry. The result in pseudo code is:
- Use objectForKey:uuid on my auxiliary dictionary to see if the message exists in Core Data
- If yes, ignore it. If not, put it into the store
- Use objectForKey:parentuuid on my auxiliary dictionary to see if the parent of the message is present in Core Data. If yes, use NSPersistentStoreCoordinator's managedObjectIDForURIRepresentation: to get back to the NSManagedObject (the parent of the message) and link parent and child
With this solution the entire process takes about 5 seconds to finish (the resulting dictionary is around 2 MB).
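The dictionary-based import can be sketched as follows. This is only an illustration in Python, not the code from my project: import_messages is a hypothetical name, a list stands in for the Core Data store, and a list position stands in for the object ID URI kept in the auxiliary dictionary. It also assumes, as in the pseudo code, that a message whose parent has not been seen yet is treated as a root.

```python
def import_messages(incoming):
    """incoming: iterable of (uuid, parent_uuid or None) pairs."""
    store = []   # stand-in for the Core Data store
    index = {}   # auxiliary dictionary: uuid -> position in store
    for uuid, parent_uuid in incoming:
        if uuid in index:
            continue  # already stored, so ignore the duplicate
        # One dictionary lookup replaces the fetch for the parent.
        parent_id = index.get(parent_uuid)
        message = {"uuid": uuid, "parent": None, "children": []}
        store.append(message)
        index[uuid] = len(store) - 1
        if parent_id is not None:
            parent = store[parent_id]
            message["parent"] = parent
            parent["children"].append(message)
        # No parent found: the message stays a root.
    return store, index


store, index = import_messages(
    [("a", None), ("b", "a"), ("a", None), ("c", "x")]
)
```

Each message costs two dictionary lookups instead of two fetch requests, which is where the difference between minutes and seconds comes from.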
I've uploaded an example project with both techniques. Using Core Data with an indexed attribute takes about 4 minutes to save (what's wrong?!), while with an auxiliary index it takes about 3 seconds. Feel free to comment on it. It's very strange, especially after reading this: http://cocoawithlove.com/2008/03/testing-core-data-with-very-big.html This is my project code: http://dl.dropbox.com/u/103260/CoreDataTreeTest2.zip
A temporary solution (thanks to Roland from Cocoa-dev), if anyone wants to use Core Data, is to save the context every X insertions. In my case, calling [ctx save:] every 500/1000 insertions drops the time from minutes to seconds (another project that implements this solution is available here: http://dl.dropbox.com/u/103260/CoreDataTreeTest3.zip)
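The batching pattern itself is simple; here is a rough sketch of it in Python. CountingContext is a hypothetical stand-in for the managed object context that only counts how often save is called, so this shows the control flow and not real Core Data timings.

```python
class CountingContext:
    """Hypothetical stand-in for a managed object context; it only counts saves."""

    def __init__(self):
        self.pending = []
        self.saved = []
        self.save_calls = 0

    def insert(self, obj):
        self.pending.append(obj)

    def save(self):
        self.saved.extend(self.pending)
        self.pending.clear()
        self.save_calls += 1


def insert_in_batches(ctx, objects, batch_size=500):
    """Insert every object, saving after each full batch and once for any remainder."""
    for count, obj in enumerate(objects, start=1):
        ctx.insert(obj)
        if count % batch_size == 0:
            ctx.save()
    if ctx.pending:
        ctx.save()  # flush the last, partial batch


ctx = CountingContext()
insert_in_batches(ctx, range(30000), batch_size=500)
```

The final save after the loop matters: without it, anything left over from the last partial batch would never be written.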
These are my benchmarks with 30,000 objects:
- Core Data without saving every X insertions: about 5-6 minutes
- Core Data with saving every 500 insertions: about 30 seconds
- Core Data with an auxiliary index dictionary: about 2 seconds
However, that seems strange.
According to: http://cocoawithlove.com/2008/03/testing-core-data-with-very-big.html
it should be faster than that, and 30k objects are not many for Core Data. I'll try to file a bug at bugreporter and see what Apple's engineers say.