I'm working on an application that lets registered users create or upload content, and allows anonymous users to view that content and browse registered users' pages to find that content - this is very similar to how a site like Flickr, for example, allows people to browse its users' pages.

To do this, I need a way to identify the user in the anonymous HTTP GET request. A user should be able to type http://myapplication.com/browse/<userid>/<contentid> and get to the right page. The <userid> should be unique, but mustn't be something like the user's email address, for privacy reasons.

Through Google App Engine, I can get the email address associated with the user, but like I said, I don't want to use that. I can have users of my application pick a unique user name when they register, but I would like to make that optional if at all possible, so that the registration process is as short as possible.

Another option is to generate some random cookie (a GUID?) during the registration process and use that, but I don't see an obvious way of guaranteeing uniqueness of such a cookie without a trip to the database.

Is there a way, given an App Engine user object, of getting a unique identifier for that object that can be used in this way?

I'm looking for a Python solution - I forgot that GAE also supports Java now. Still, I expect the techniques to be similar, regardless of the language.

+1  A: 

Do you mean session cookies?

Try http://code.google.com/p/gaeutilities/


What DzinX said. The only way to create an opaque key that can be authenticated without a database roundtrip is using encryption or a cryptographic hash.

Give the user a random number and hash it or encrypt it with a private key. You still run the (tiny) risk of collisions, but you can avoid this by touching the database on key creation and changing the random number in case of a collision. Make sure the random number is cryptographically secure, and add a long server-side random number to prevent chosen-plaintext attacks.

You'll end up with a token like the Google Docs key, basically a signature proving the user is authenticated, which can be verified without touching the database.
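
As a rough illustration of the idea above, here is a minimal sketch that uses an HMAC (a keyed hash) rather than encryption. The names SERVER_SECRET, make_token and verify_token are hypothetical, and the code is written for Python 3 rather than the Python 2 runtime discussed in this thread. It only shows that a token can be verified without a database lookup; it does not by itself guarantee that two generated IDs never collide.

```python
import binascii
import hashlib
import hmac
import os

# Hypothetical server-side secret (an assumption for this sketch);
# in practice it would be long, random, and loaded from configuration.
SERVER_SECRET = b"replace-with-a-long-random-server-side-secret"


def _sign(random_id):
    """Return the hex HMAC-SHA256 signature of random_id under the server secret."""
    return hmac.new(SERVER_SECRET, random_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def make_token():
    """Return 'random_id.signature' built from 16 cryptographically random bytes."""
    random_id = binascii.hexlify(os.urandom(16)).decode("ascii")
    return random_id + "." + _sign(random_id)


def verify_token(token):
    """Check the signature using only the server secret, with no database trip."""
    random_id, sep, signature = token.partition(".")
    if not sep:
        return False
    # Compare as bytes in constant time to avoid leaking timing information.
    return hmac.compare_digest(signature.encode("utf-8"),
                               _sign(random_id).encode("utf-8"))
```

A token produced by make_token() passes verify_token(), while a token whose signature has been altered does not.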

However, given the pricing of GAE and the speed of Bigtable, you're probably better off using a session ID if you really can't use Google's own authentication.

Mark
No, I don't mean session cookies. GAE already provides that to keep track of the logged in user. My question deals specifically with anonymous users and their interaction with content that's associated with a registered user.
Ori Pessach
My suggestion is to use gaeutilities for a non-logged in user.
Mark
non logged in users interact with the application in a completely stateless way, so that's not really applicable. Thanks for the pointer, though - it looks like a handy library.
Ori Pessach
I meant non-google-logged-in, but after registration.
Mark
I like this suggestion a lot, actually. This is basically what I would have done if Google didn't just release the new SDK with unique, permanent user identifiers. The only problem with that is that it's actually very hard (maybe impossible) to avoid collisions - even if you check the datastore, there's no guarantee that a collision won't occur between your check and when you save the data, if two users register at the same time and they both end up with the same hash. This is very unlikely.
Ori Pessach
You can get around that with transactions (http://code.google.com/appengine/docs/python/datastore/transactions.html) or by giving the hash a uniqueness constraint (which it would have as primary key). You'd have the same issue if you were creating random keys and looking them up each time, as well. But yes, if using Google's authentication is a prerequisite, best to take full advantage of it!
Mark
The way Google implements transactions is the reason why I said it would be hard to do. :) I use transactions elsewhere in the application, and there are strict limitations on what you can do in them. Queries are not allowed, for example, which seems to limit their applicability here unless I'm missing something clever. I'm not sure about key uniqueness, either - I'm not that familiar with all of the datastore's quirks yet.
Ori Pessach
A: 

Can you use java.util.UUID?

Kevin
Not in Python, I can't.
Ori Pessach
In Python you can use uuid.uuid4(). But it is better to use the new user_id() in GAE SDK 1.2.1 for your feature, because the user_id() will be the same even if the user changes their email address.
dar
+2  A: 

I think you should distinguish between two types of users:

1) users that have logged in via Google Accounts or that have already registered on your site with a non-google e-mail address

2) users that opened your site for the first time and are not logged in in any way

For the second case, I can see no other way than to generate some random string (e.g. via uuid.uuid4() or from this user's session cookie key), as an anonymous user does not carry any unique information with himself.

For users that are logged in, however, you already have a unique identifier -- their e-mail address. I agree with your privacy concerns -- you shouldn't use it as an identifier. Instead, how about generating a string that seems random, but is in fact generated from the e-mail address? Hashing functions are perfect for this purpose. Example:

>>> import hashlib

>>> email = '[email protected]'
>>> salt = 'SomeLongStringThatWillBeAppendedToEachEmail'

>>> key = hashlib.sha1('%s$%s' % (email, salt)).hexdigest()
>>> print key
f6cd3459f9a39c97635c652884b3e328f05be0f7

hashlib.sha1 is not a random function: for given data it always returns the same result. It is also considered practically irreversible, so you can safely present the hashed key on the website without compromising the user's e-mail address. Also, you can safely assume that no two hashes of distinct e-mails will be the same (they can be, but the probability of it happening is very, very small). For more information on hashing functions, consult the Wikipedia entry.
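
The interactive session above relies on Python 2, where hashlib accepts a plain string. On Python 3 the input has to be encoded to bytes first. The following is a minimal sketch of the same salted-hash idea under that assumption; the function name public_id_for and the address user@example.com are hypothetical placeholders, not part of the original example.

```python
import hashlib

# Same role as the salt in the session above; the value is only a placeholder.
SALT = "SomeLongStringThatWillBeAppendedToEachEmail"


def public_id_for(email):
    """Return a 40-character hex ID derived deterministically from email and SALT."""
    data = ("%s$%s" % (email, SALT)).encode("utf-8")  # Python 3 hashlib needs bytes
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    # Hypothetical address, used only to show that the result is repeatable.
    first = public_id_for("user@example.com")
    second = public_id_for("user@example.com")
    print(first == second)
```

Because the function is deterministic, the ID can be recomputed from the e-mail address at any time instead of being stored separately.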

DzinX
I considered hashing, but it won't buy me much because of the possibility of collisions (very unlikely, but a robust program should check for them). I would still need a roundtrip to the database, at which point I might as well just generate a random ID and check that, which is exactly what I was trying to avoid. As for unauthenticated users, they can't generate content, so it's a non-issue.
Ori Pessach
+4  A: 

Your timing is impeccable: Just yesterday, a new release of the SDK came out, with support for unique, permanent user IDs. They meet all the criteria you specified.
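
A minimal sketch of how such an ID might be used to build the URL from the question follows. The helper browse_url and the FakeUser class are hypothetical: FakeUser only stands in for an App Engine User object so the snippet can run outside App Engine. On App Engine itself, the user would come from users.get_current_user() in google.appengine.api.

```python
def browse_url(user, content_id):
    """Build the public browse path from a user object's permanent ID."""
    return "/browse/%s/%s" % (user.user_id(), content_id)


class FakeUser(object):
    """Hypothetical stand-in for an App Engine User, for local experimentation only."""

    def __init__(self, uid):
        self._uid = uid

    def user_id(self):
        # Mirrors the user_id() method of the real User object.
        return self._uid


if __name__ == "__main__":
    # On App Engine this would instead be:
    #   from google.appengine.api import users
    #   user = users.get_current_user()
    print(browse_url(FakeUser("12345"), "42"))
```

Since the ID stays the same when the user changes their e-mail address, URLs built this way should remain stable.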

Nick Johnson
"If the current user is not signed in, the Users constructor raises a UserNotFoundError." - i.e. it requires a Google sign in. However, I'd say that using the Google sign-in mechanism is better than rolling your own, especially for user expectations.
Mark
However, it occurs to me that user_id might be world unique, which would not be good.
Mark
This sounds like it's exactly what I'm looking for, actually. I do use Google sign-in, and a world unique user_id is actually a requirement. Perfect.
Ori Pessach
So you *don't* want to track non-Google users? By world unique, I mean it's the same on other websites. If user_id is indeed world unique (I haven't tested), you should consider that people may be able to match your users to their emails.
Mark
Oh, I see what you're saying about tracking users through their user_id. I'll have to think about the implications of that one. And no - I don't need to track non-Google users. People who post content must be logged in, and therefore must be Google users, but to visit the site and read content, all I need is a unique way of identifying the content in the URL, which the user_id() is fine for.
Ori Pessach
Hi, Nick, Ori, and others. Have you had any luck generating the user_id() from a User object after you create it? (That is, not from the users.get_current_user() call?) user_id() is returning None for me when I do it this way. If you have any tips, I would appreciate some feedback over in my question: http://stackoverflow.com/questions/816372/how-can-i-determine-a-userid-based-on-an-email-address-in-app-engine Thanks.
jhs