Let's say we have:

class User(db.Model):
  nickname = db.StringProperty()

and we have 500k User entities, each with a unique nickname.

I now want to add one more entity, and it must also have a unique nickname. So I run this:

to_check = User.gql("WHERE nickname = :1", new_nickname).get()
if to_check is None:
  # proceed to create entity
  User(nickname=new_nickname).put()

Is this method going to work for over 500k users? Am I going to experience slow processing times?

What are the optimization methods for this?

PS: Is indexing the nickname property a good way to proceed?

I can only think of this at the moment:

class User(db.Model):
  nickname = db.StringProperty(indexed=True) # index this property

EDITED: By the way, I have two unique properties I want to maintain: userid and nickname. The userid will be automatically assigned as the keyname (I'm making a Facebook app which takes the user's Facebook ID and creates a user entity).

So to me, userid is more important, so I'll use it as the keyname.

The nickname will be manually entered by the Facebook user, so I need a mechanism to check whether it is unique or not.

So the problem now is: what do I do with the nickname? I can't have two keynames :(

+1  A: 

get_by_key_name will be your new best friend.

I frequently use a code pattern like the following:

user = User.get_by_key_name(user_key_name)
if not user:
  user = User(key_name = user_key_name)

This tends to be much faster than a GQL query.

If you are going to be writing more than one entity to the datastore at a time, you should also use the pattern of db.put(entities_list) where the list can contain up to 500 entities of any kind - they don't even have to be the same model kind.
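
If you have more than 500 entities to write, one way to stay under that limit is to slice the list before putting it. A minimal sketch in plain Python; `batches` and `pending` are made-up names for illustration, and the strings stand in for real model instances:

```python
def batches(entities, size=500):
    """Yield successive slices of at most `size` entities."""
    for start in range(0, len(entities), size):
        yield entities[start:start + size]


# Stand-in data; in a real app these would be model instances.
pending = ["entity-%d" % i for i in range(1200)]

# In App Engine code, each batch would be handed to db.put(batch).
batch_sizes = [len(batch) for batch in batches(pending)]
```

Each slice can then be passed to db.put in turn.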

jamtoday
But I sort of need the keyname for each entity to be the userid, so basically the userid AND the nickname must both be unique. The userid is the key name; the nickname is just something extra that is also unique.
fooyee
If this is indeed the case, then a GQL query makes sense. As long as this query is just used during registration, it shouldn't be too much of a problem. Although it should be noted that requiring two unique attributes goes "against the grain" of typical GAE model design. So, perhaps it's not so important that every user has a unique nickname if they do have a unique user ID? Ultimately, that's up to you.
jamtoday
+2  A: 

The nickname property will be in your index.yaml "naturally" as soon as you run such queries in your SDK, so don't worry about it too much. The indexed property defaults to True (it's normally only used to set it explicitly to False instead).

With the index, searching for a nickname that may occur 0 or 1 times is going to be quite fast anyway, no matter how many entries in the table -- say, order of magnitude, 50-100 milliseconds; putting a new entity, maybe twice as long. The whole thing should fit within 300 milliseconds or less.

One worry is a race condition -- what if two separate sessions are trying to register exactly the same nickname at exactly the same time? May be unlikely, but when it happens you have no defense as your code stands. Getting such a defense (by running in a transaction) implies a transaction lock and therefore may impact performance (if several such sessions are running at exactly the same time, they'll be serialized).
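
To make the race concrete, here is a rough sketch in plain Python, not App Engine code. `NicknameRegistry` and `claim` are hypothetical names, the dict stands in for the datastore, and the lock plays the role a transaction would play there: the check and the write happen as one step, so only one of the competing sessions can win.

```python
import threading


class NicknameRegistry(object):
    """In-memory stand-in for the datastore; not an App Engine API."""

    def __init__(self):
        self._taken = {}
        self._lock = threading.Lock()

    def claim(self, nickname, user_id):
        """Check and insert as one step; return True if the claim succeeded."""
        with self._lock:
            if nickname in self._taken:
                return False
            self._taken[nickname] = user_id
            return True


registry = NicknameRegistry()
results = []


def register(user_id):
    # Every session tries to register the same nickname.
    results.append(registry.claim("thomas", user_id))


threads = [threading.Thread(target=register, args=(uid,)) for uid in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = results.count(True)
```

Without the lock, two threads could both pass the `in` check before either one writes, which is the same gap that exists between the query and the put in the question's code.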

Alex Martelli
Indexing on a property that should be a unique key is a waste of resources in my opinion. BTW, Alex, I just ordered your book, Python in a Nutshell. Can't wait to digest it!
Kris Walker
I disagree with the last comment - unique or not - if you need to check whether a nickname has been used (and can't use key_names for whatever reason), then not having an index will make it slow/impossible to check. To the OP - I'd recommend a key_name on a child entity, similar to Brett's recommendation. I've posted another answer with more details.
Danny Tuppeny
+1 for pointing out the need for a transaction, though now I have nothing worth points to add to the conversation :)
Peter Recore
A: 

It looks like you are treating the nickname as a unique key for the User entity kind.

So I would do this instead (this has already been stated, I see):

class User(db.Model):
  # other properties go here, but not nickname
  pass

# put a new user
if User.get_by_key_name(user_nick) is None:
  User(key_name=user_nick).put()

The indexing strategy is a waste, even with "just" 500k.

There is also db.Model.get_or_insert()

http://code.google.com/appengine/docs/python/datastore/modelclass.html#Model_get_or_insert

Kris Walker
But I sort of need the keyname for each entity to be the userid, so basically the userid AND the nickname must both be unique. The userid is the key name; the nickname is just something extra that is also unique.
fooyee
I would pick one or the other to be the key_name. Or you could also create two kinds. The kind that has key_name=nickname could reference a "super" user kind that would have a uid as key_name.
Kris Walker
A: 

Hey, I just thought of another method to solve my dilemma!

Basically, when the user manually enters a nickname, I automatically append his/her userid to it to make it unique.

E.g.:

user_nickname is thomas. I append the userid to it, so it becomes thomas_8937459874 (unique!)

So I don't need to check whether the nickname already exists. That saves me a GQL query.

When the time comes to display the nickname, I'll just use string manipulation to retrieve only the name "thomas".
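
A small sketch of that string manipulation, assuming the userid itself never contains an underscore (Facebook IDs are numeric). The helper names `make_stored_nickname` and `display_nickname` are made up for illustration:

```python
def make_stored_nickname(nickname, user_id):
    """Append the userid so the stored value is unique per user."""
    return "%s_%s" % (nickname, user_id)


def display_nickname(stored):
    """Drop the userid suffix again for display."""
    # Split on the last underscore only, so nicknames that contain
    # underscores themselves come back intact.
    return stored.rsplit("_", 1)[0]


stored = make_stored_nickname("thomas", 8937459874)
shown = display_nickname(stored)
```

Splitting from the right matters here: a plain split("_") would cut a nickname like big_tom in the wrong place.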

What do you guys think?

fooyee
I was thinking of that last night, but then I thought that it would be difficult to only get the UID or only get the nickname. However, I can't think of a use case where that would be a real problem. I say go for it.
Kris Walker
But this makes your username no longer unique. Danny_12345 and Danny_123456789 will both appear as "Danny". Wasn't the point of unique nicknames so people can be told apart from each other? Check out Brett Slatkin's I/O video I posted in another answer :-)
Danny Tuppeny
I posted a solution using ReferenceProperty. Please have a look.
fooyee
+4  A: 

You should check out Brett Slatkin's Google I/O video:

http://code.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html

Specifically, the bit about Relation Index Entities. He deals with a problem very similar to yours.

You could create another entity that stores the user's nickname (and set it as the key_name). When you create it, set the parent to be the User entity:

UserNickname(
    parent=user,
    key_name=nickname,
    nickname=nickname
)

Now you can look up the nickname (get_by_key_name) very quickly, and if you want to exclude the current user (which you will if you let a user change their nickname), you can easily get the parent from a keys_only query or use the ancestor in the query directly.

Edit: Just noticed Kris Walker already suggested this in a comment. You could use a reference property or parent to link the two together, both should work well.
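
The bookkeeping for the two-kinds idea can be sketched in plain Python. This is not datastore code: the two dicts stand in for the User and UserNickname kinds, and `nickname_available` and `set_nickname` are hypothetical helpers. It follows the reference variant, where the nickname record is keyed by the nickname alone and points back at the user, so one lookup by key answers "is it taken, and by whom?".

```python
users = {}      # user_id -> {"nickname": ...}; stand-in for the User kind
nicknames = {}  # nickname -> user_id; stand-in for the UserNickname kind


def nickname_available(nickname, current_user_id=None):
    """True if nobody owns the nickname, or the current user already does."""
    owner = nicknames.get(nickname)
    return owner is None or owner == current_user_id


def set_nickname(user_id, nickname):
    """Claim a nickname for a user, releasing any previous one."""
    if not nickname_available(nickname, user_id):
        return False
    old = users.setdefault(user_id, {}).get("nickname")
    if old is not None and old != nickname:
        del nicknames[old]  # free the old nickname for others
    nicknames[nickname] = user_id
    users[user_id]["nickname"] = nickname
    return True


first = set_nickname("fb-1", "thomas")
clash = set_nickname("fb-2", "thomas")
same_user = set_nickname("fb-1", "thomas")
renamed = set_nickname("fb-1", "tom")
freed = nickname_available("thomas")
```

In the real datastore the check and the writes would still need to run in a transaction to be safe against concurrent registrations; the sketch only shows the lookups.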

Danny Tuppeny
A: 
fooyee