I have a Message entity that has a messageID property. I'd like to ensure that there's only ever one instance of a Message entity with a given messageID. In SQL, I'd just add a unique constraint to the messageID column, but I don't know how to do this with Core Data. I don't believe it can be done in the data model itself, so how do you go about it?

My initial thought is to use a validation method to do a fetch on the NSManagedObject's context for the ID, see if it finds anything but itself, and if so, fails the validation. I suspect this will work - but I'm worried about the performance of something like that. I went through a lot of effort to minimize the fetch requests needed for the entire import routine, and having it validate by performing a fetch for every single new message entity seems a bit excessive. I can get all pre-existing objects I need and identify all the new objects I need to insert into the store using just two fetch queries before I do the actual work of importing and connecting everything together. This would add a fetch to every single update or insert in addition to those two - which would seem to eliminate any performance advantage I had by pre-processing the import data in the first place!

The main reason this is an issue is that the importer can (potentially) run several batches concurrently on several threads and may include some overlapping/duplicate data that needs to ultimately result in just one object in the store and not duplicate entries. Is there a reasonable way to do this and does what I'm asking for make sense for Core Data?

+1  A: 

I don't think there is a way to easily guarantee an attribute is unique without doing a lot of work on your own. You can, of course, use CFUUIDCreate to create a globally unique UUID, which should stay unique even in a multithreaded environment. But...

The objectID (type NSManagedObjectID) of every managed object is guaranteed to be unique within the persistent store coordinator. Since you can add arbitrarily many persistent stores to the coordinator, this effectively means that objectIDs are globally unique. Why don't you use the objectID as your messageID? You can't, of course, change the objectID once it's assigned (and it won't get assigned until the context containing the inserted object is saved; until then it will be a temporary but still unique ID).

Barry Wark
The messageID is assigned by a server to the message. The input data includes relationships between messages and it's transmitted via lists of messageIDs. I use that info to lookup-or-create message objects as needed so I can connect them directly to each other. Sometimes I get the same message in multiple ways/contexts from the server, so I need to be sure I don't have duplicate instances of the same unique message or else I'll get "clouds" of inter-connected message objects. (Hope that makes sense...) I think I really need to use the server's messageID as the "one true ID" in this case. :/
Sean
+3  A: 

The only way to guarantee uniqueness is to do a fetch. Fortunately you can just do a -countForFetchRequest:error: and check to see if it is zero or not. That is the least expensive way to guarantee uniqueness at this time.

You can probably accomplish this in the validation or run it in the loop that is processing the data. Personally I would do it above the creation of the NSManagedObject so that you do not have the unnecessary allocs when a record already exists.
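A minimal Swift sketch of the check-before-create idea described above. The entity name "Message" and the string attribute "messageID" come from the question; the function name `insertMessageIfAbsent` is hypothetical. `count(for:)` is the Swift spelling of `-countForFetchRequest:error:`. The function is assumed to be called on the queue or thread that owns `context`, and it does not by itself protect against another context inserting the same ID between the count and the save.

```swift
import CoreData

/// Inserts a Message with the given messageID only if `context` cannot
/// already see one. Returns true when a new object was inserted.
/// Assumes an entity named "Message" with a string attribute "messageID".
func insertMessageIfAbsent(_ messageID: String,
                           in context: NSManagedObjectContext) throws -> Bool {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Message")
    request.predicate = NSPredicate(format: "messageID == %@", messageID)

    // Counting avoids materializing managed objects just to test existence.
    guard try context.count(for: request) == 0 else {
        return false // already present, so nothing is allocated
    }

    let message = NSEntityDescription.insertNewObject(forEntityName: "Message",
                                                      into: context)
    message.setValue(messageID, forKey: "messageID")
    return true
}
```

Doing the count before `insertNewObject` is what avoids the unnecessary allocation when the record already exists.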

Marcus S. Zarra
Is a fetch even safe for certain? How can I be sure that another thread's context isn't saving at the moment between the fetch and the creation of the new object?
Sean
A fetch is safe because each thread has its own context and they will lock the underlying NSPersistentStoreCoordinator so that there are no collisions.
Marcus S. Zarra
Isn't there a gap in time between the completion of the fetch and the creation of the new object and placing it into the context? What I mean is, couldn't another thread (with its own context) have been waiting to do a fetch while the first thread finishes and starts making a new object? The store's lock is released after the first thread's fetch is done, so the second thread starts fetching and also finds it needs to create the object. Eventually both contexts are uneventfully saved to the store. Now there are two objects in the store with the same ID, right?
Sean
I think what Marcus is saying here is that you can do a fetch in a thread-safe way during your validate routine when it's called during a save, because the context(s) lock the persistent store coordinator during saves. [Note: I'm not asserting this myself, I don't know for sure. But it seems likely from what he said, and what I know about Core Data.] Locking the persistent store coordinator is what allows atomic "transactions" in Core Data.
dodgio
See Apple's Core Data Programming Guide / Multi-Threading with Core Data / General Guidelines / #1: "If you want to aggregate a number of operations in one context together as if a virtual single transaction, you can lock the persistent store coordinator to prevent other managed object contexts using the persistent store coordinator over the scope of several operations."
dodgio
The creation of objects is at the context level so even if the PSC is locked, it won't block the creation of an object. Only fetches and saves would be blocked against each other since they both hit the PSC.
Marcus S. Zarra
+1  A: 

So you have an NSManagedObjectContext for each thread, backed by the same persistent store, is that correct? And before you save the NSManagedObjectContext, you'd like to make sure the messageID is unique, that is, that you are not updating an existing row, and that it is not in one of the other contexts, correct?

Given that model (correct me if I misunderstand), I think you'd be better served having one object that manages access to the persistent store. That way, all threads would update one context and you can do your validation in there, using Marcus's -countForFetchRequest:error: suggestion. Granted, that places a bottleneck on this operation.
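One way the "single gatekeeper" idea could look in Swift, as a rough sketch only. The class name `MessageStore` and method `findOrCreate(messageID:)` are hypothetical, and it assumes a queue-confined context (`performAndWait`), which is a newer API than this discussion. Callers on any thread go through the one private context, so the lookup, insert, and save for a given ID run as one serialized unit.

```swift
import CoreData

/// Hypothetical gatekeeper: every importer thread asks this object for
/// a Message, and all work happens on one private-queue context.
final class MessageStore {
    private let context: NSManagedObjectContext

    init(coordinator: NSPersistentStoreCoordinator) {
        context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
        context.persistentStoreCoordinator = coordinator
    }

    /// Returns the objectID of the Message with this messageID,
    /// creating and saving it first if none exists.
    func findOrCreate(messageID: String) throws -> NSManagedObjectID {
        var result: Result<NSManagedObjectID, Error>!
        // performAndWait serializes callers on the context's own queue.
        context.performAndWait {
            result = Result<NSManagedObjectID, Error>(catching: {
                let request = NSFetchRequest<NSManagedObject>(entityName: "Message")
                request.predicate = NSPredicate(format: "messageID == %@", messageID)
                request.fetchLimit = 1
                if let existing = try context.fetch(request).first {
                    return existing.objectID
                }
                let message = NSEntityDescription.insertNewObject(
                    forEntityName: "Message", into: context)
                message.setValue(messageID, forKey: "messageID")
                try context.save() // objectID is permanent after a save
                return message.objectID
            })
        }
        return try result.get()
    }
}
```

Returning an `NSManagedObjectID` rather than the object itself lets each importer thread look the message up in its own context. As noted above, this serializes every lookup, so it trades throughput for the uniqueness guarantee.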

Don
Yes, that is a correct description. I was afraid that might be the case and I'd end up having to restrict everything to a serial queue, basically. Boo. :(
Sean
Well, just because I'd do it that way doesn't mean someone more brilliant won't come along and provide a better option. I'm hoping they do.
Don
You should NEVER have more than one thread hitting the same context. That is a recipe for failure. Each thread needs to have its own context hitting the same NSPersistentStoreCoordinator and let Core Data do the locking. Trust me, they have it right and you can trust that there are going to be no collisions if you follow their rules on threading.
Marcus S. Zarra
I trust you that they have it right. I thought the "serial queue" model would be better given the validation he is doing, but I defer to your expertise.
Don