+3  A: 

This and other similar questions come up often when talking about transitioning from a traditional RDB to a BigTable-like datastore like App Engine's.

It's often useful to discuss why the datastore doesn't support unique constraints, since it informs the mindset you should be in when thinking about your data storage schemes. Unique constraints are not available because they greatly limit scalability. As you've said, enforcing the constraint means checking every other entity for that property value. Whether you do it manually in your code or the datastore does it automatically behind the scenes, the check still has to happen, and that means lower performance. Some optimizations can be made, but the work cannot be avoided entirely.

First, the answer to your question is: really think about why you need that unique constraint.

Secondly, remember that keys do exist in the datastore, and are a great way of enforcing a simple unique constraint.

my_user = MyUser(key_name=users.get_current_user().email())
my_user.put()

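Because the key name is part of the entity's identity, there can never be two MyUser entities with that email. Note that a later put() with the same key_name overwrites the existing entity rather than raising an error. You can also quickly retrieve the MyUser with that email: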

my_user = MyUser.get_by_key_name(users.get_current_user().email())

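In the Python runtime you can also do:

my_user = MyUser.get_or_insert(users.get_current_user().email())

This will insert or retrieve the user with that email. The lookup and the possible insert run in a transaction, so two concurrent requests cannot both create the entity.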
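
To make the difference between overwriting and get-or-insert concrete, here is a small in-memory sketch in plain Python. It is not the datastore API: `KeyedStore` and its dict are hypothetical stand-ins, and only the method names `put`, `get_by_key_name` and `get_or_insert` are borrowed from the App Engine `db` calls. It assumes that writing with an existing key name replaces the entity, while get-or-insert leaves an existing entity alone.

```python
class KeyedStore(object):
    """Hypothetical in-memory stand-in for one datastore kind, keyed by key_name."""

    def __init__(self):
        self._entities = {}

    def put(self, key_name, **props):
        # Same key_name means same entity: an existing one is replaced.
        self._entities[key_name] = dict(props)
        return self._entities[key_name]

    def get_by_key_name(self, key_name):
        # Direct lookup by key; returns None when nothing is stored.
        return self._entities.get(key_name)

    def get_or_insert(self, key_name, **props):
        # Return the existing entity untouched, or create it from props.
        # (The real datastore call does this inside a transaction.)
        if key_name not in self._entities:
            self._entities[key_name] = dict(props)
        return self._entities[key_name]

    def count(self):
        return len(self._entities)


store = KeyedStore()
store.put("a@example.com", nickname="first")
store.put("a@example.com", nickname="second")  # overwrites, no duplicate
existing = store.get_or_insert("a@example.com", nickname="third")  # keeps "second"
created = store.get_or_insert("b@example.com", nickname="new")  # inserts
```

If silently overwriting would be a bug in your application (for example, a second sign-up with an email that is already taken), the get-or-insert form is probably the safer choice, since it never replaces an entity that already exists.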

Anything more complex than that will not be scalable, though. So really think about whether you need that property to be globally unique, or whether there are ways to remove the need for the unique constraint. Often you'll find that, with some small workarounds, you didn't need that property to be unique after all.

Jason Hall
Thanks for the detailed writeup. I partly understand why the GAE datastore does not have the features of a regular RDB. An app I am currently working on requires incremental serial numbers to be assigned to entries. Currently, I check the db for the last entry number (sNo), add 1 to it (sNo++) and then insert a new entry with the new sNo as its value, so that others working on the system do not get duplicate sNos to work with. A duplicate has not happened yet, but I fear that in times of heavy work (when 100-120 employees are adding entries) a duplicate sNo may arise. Updating question with an example.
abel
+1. Excellent answer!
Nick Johnson
@abel Ideally your system would not require monotonically increasing serial numbers, and you could assign a random key to each item. If this has to interact with another system, however, that may not be easily doable. You can try using transactions for atomicity and memcache for performance.
Jason Hall
@Jason Hall The unique serial numbers are also unique offline 8-digit keys which stay unique for the span of the company. However, I must add that I do not have any experience with 'transactions' and memcache. I am fairly new to Java too and am of a hackish PHP descent (everybody looks down on us!)
abel
+1  A: 

You can generate unique serial numbers for your products without needing to enforce unique IDs or querying the entire set of entities to find out what the largest serial number currently is. You can use transactions and a singleton entity to generate the 'next' serial number. Because the operation occurs inside a transaction, you can be sure that no two products will ever get the same serial number.

This approach will, however, be a potential performance chokepoint and limit your application's scalability. If new serial numbers are created rarely enough that the transactions do not contend with one another, it may work for you.

EDIT: To clarify, the singleton that holds the current (or next) serial number to be assigned is completely independent of any entities that actually have serial numbers assigned to them. They do not all need to be part of one entity group. You could have entities from multiple models using the same mechanism to get a new, unique serial number.

I don't remember Java well enough to provide sample code, and my Python example might be meaningless to you, but here's pseudo-code to illustrate the idea:

  1. Receive request to create a new inventory item.
  2. Enter transaction.
  3. Retrieve current value of the single entity of the SerialNumber model.
  4. Increment value and write it to the database.
  5. Return value as you exit transaction.

Now, the code that does all the work of actually creating the inventory item and storing it along with its new serial number DOES NOT need to run in a transaction.
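
The steps above can be sketched in plain Python as follows. This is an in-memory illustration under stated assumptions, not datastore code: `SerialNumberCounter`, `create_inventory_item` and `simulate` are hypothetical names, and a `threading.Lock` stands in for the transaction. The real datastore uses optimistic concurrency and retries a transaction that collides, rather than taking a lock, but the effect shown here is the same: increments happen one at a time. The 8-digit zero padding is taken from the comment thread and is only an assumption about the format.

```python
import threading


class SerialNumberCounter(object):
    """In-memory stand-in for the singleton SerialNumber entity."""

    def __init__(self, start=0):
        self._value = start
        # The lock plays the role of the transaction: one increment at a time.
        self._lock = threading.Lock()

    def next_serial(self):
        with self._lock:                 # 2. enter transaction
            current = self._value        # 3. read the current value
            self._value = current + 1    # 4. increment and write it back
            return self._value           # 5. return the value on exit


def create_inventory_item(counter, name):
    # Only fetching the serial number is serialized; building and storing
    # the item itself happens outside the "transaction".
    serial = counter.next_serial()
    return {"name": name, "serial": "%08d" % serial}


def simulate(workers=20, items_per_worker=50):
    """Create items from many threads and return every serial handed out."""
    counter = SerialNumberCounter()
    serials = []
    serials_lock = threading.Lock()

    def worker(worker_id):
        for i in range(items_per_worker):
            item = create_inventory_item(counter, "item-%d-%d" % (worker_id, i))
            with serials_lock:
                serials.append(item["serial"])

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return serials
```

Calling `simulate()` and comparing `len(serials)` with `len(set(serials))` is a quick way to convince yourself that no serial is handed out twice, even with many concurrent writers.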

Caveat: as I stated above, this could be a major performance bottleneck, as only one serial number can be created at any one time. However, it does give you certainty that the serial number you just generated is unique and not in use.

Adam Crossland
Wouldn't a transaction on a singleton entity require that singleton and all related entities to be in the same entity group?
Jason Hall
@Jason, I think that the singleton can be isolated in its own group. The entities that are receiving the serial numbers are unrelated to it. The only thing that needs to happen in the isolation of a transaction is the incrementing of the serial number. I'd gladly code up an example, but I program App Engine with Python. Haven't touched Java since 1999.
Adam Crossland
@Jason Hall @Adam Crossland I must tell you that much of what you said went 'woosh' over my head. I have no 'transactions' experience and am new to Java, but I do understand a bit of the pseudo-code, which is similar to what I currently do. It will work in a low-traffic scenario, but I want to be relatively sure in bad scenarios too. Thanks for the excellent writeup!
abel