I need a simple tagging system in GAE-J.

As I see it, the entity that is being tagged should have a collection of keys referring to the tags with which it's associated.

A tag entity should simply contain the tag string itself, and a collection of keys pointing to the entities associated with the tag.

When an entity's list of tags is altered, the system will create a new tag if the tag is unknown, and then append the entity's key to that tag's key collection. If the tag already exists, then the entity's key is simply appended to the tag's key collection.
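The update rule described above can be sketched with plain Java collections standing in for datastore entities. This is only an illustration under stated assumptions: the class name `TwoWayTagIndex` is hypothetical, plain strings play the role of datastore keys, and removing an entity's key from tags it no longer carries is an addition the description above does not spell out.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for the two-way design; strings play the role of datastore keys. */
final class TwoWayTagIndex {
    // tag string -> keys of the entities carrying it (the "tag entity" side)
    private final Map<String, Set<String>> entitiesByTag = new HashMap<>();
    // entity key -> tags on that entity (the tagged entity's own collection)
    private final Map<String, Set<String>> tagsByEntity = new HashMap<>();

    /** Replaces an entity's tags, creating unknown tags and updating both directions. */
    void setTags(String entityKey, Collection<String> newTags) {
        Set<String> oldTags = tagsByEntity.getOrDefault(entityKey, Collections.emptySet());
        // Assumption: tags dropped from the entity also lose the back-reference.
        for (String tag : oldTags) {
            if (!newTags.contains(tag)) {
                Set<String> keys = entitiesByTag.get(tag);
                if (keys != null) {
                    keys.remove(entityKey);
                }
            }
        }
        // Create the tag if it is unknown, then add the entity's key to it.
        for (String tag : newTags) {
            entitiesByTag.computeIfAbsent(tag, t -> new HashSet<>()).add(entityKey);
        }
        tagsByEntity.put(entityKey, new HashSet<>(newTags));
    }

    Set<String> entitiesFor(String tag) {
        return Collections.unmodifiableSet(
                entitiesByTag.getOrDefault(tag, Collections.emptySet()));
    }

    Set<String> tagsFor(String entityKey) {
        return Collections.unmodifiableSet(
                tagsByEntity.getOrDefault(entityKey, Collections.emptySet()));
    }
}
```

In a real datastore the two maps would be separate entities written in separate operations, which is where the consistency concerns come from.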

This seems relatively straightforward and uncontroversial to me, but I would like some feedback on this design, just to be sure.

A: 

I am in the process of wrapping my brain around the datastore (I have been using JPA rather than the low-level interface). My immediate reaction is that the entity cannot own the tag and the tag cannot own the entity. This is a classic relational many-to-many. In my experience, owned relationships should be used whenever possible. In this case you cannot use them, so you have to be very careful when managing the many-to-many.

In particular, there is no way to use transactions to guarantee consistency. You should try to prepare your code for an inconsistent datastore. In other words, the keys in both entities might not refer to each other, and your code should not crumble/crash/explode if that is the case.
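As a rough illustration of what "prepared for inconsistency" could look like, the sketch below resolves a tag to entity keys but keeps only keys whose entity still exists and still lists the tag. The class `TolerantTagLookup`, the method `resolve`, and the two maps are hypothetical stand-ins for whatever the datastore reads would return; this is not a datastore API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TolerantTagLookup {
    /**
     * Returns the entity keys listed under a tag, keeping only those whose
     * entity exists and lists the tag in return. Dangling or one-sided
     * references are skipped instead of causing a failure.
     */
    static List<String> resolve(String tag,
                                Map<String, Set<String>> entityKeysByTag,
                                Map<String, Set<String>> tagsByEntityKey) {
        List<String> result = new ArrayList<>();
        Set<String> keys = entityKeysByTag.get(tag);
        if (keys == null) {
            return result; // unknown tag: nothing to resolve
        }
        for (String key : keys) {
            Set<String> tags = tagsByEntityKey.get(key);
            // Skip keys whose entity is missing or no longer carries the tag.
            if (tags != null && tags.contains(tag)) {
                result.add(key);
            }
        }
        return result;
    }
}
```

Skipped references could also be logged or queued for repair, depending on how much drift the application can tolerate.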

Mark M
+3  A: 

Why store a Tags table at all? This seems very relational-database-minded, and won't be scalable or particularly useful on top of the datastore.

Instead, just store a list of Strings for each taggable entity.

@Persistent
private List<String> tags;

Getting an entity's tags is then a simple field lookup on the entity you have already loaded (instead of an extra call to the datastore), and finding other items with that tag is a single datastore query:

Query query = pm.newQuery("select from Entities " +
                          "where tagNameParam in tags " +
                          "parameters String tagNameParam");

It will also make writes faster, since you don't have to check if a tag already exists, potentially create a new row in Tags, etc.

What won't be as simple, however, is finding all the unique tags for all entities.
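One possible way around that is to keep a separate tally of tags that is adjusted whenever an entity's tag list changes. The sketch below uses an in-memory map purely for illustration; the class `TagCounts` and its methods are hypothetical, and a real version would persist the counts in the datastore rather than in a `TreeMap`.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class TagCounts {
    // tag -> number of entities currently carrying it
    private final Map<String, Integer> counts = new TreeMap<>();

    /** Adjusts the tally when one entity's tags change from oldTags to newTags. */
    void onTagsChanged(Collection<String> oldTags, Collection<String> newTags) {
        Set<String> removed = new HashSet<>(oldTags);
        removed.removeAll(newTags);
        Set<String> added = new HashSet<>(newTags);
        added.removeAll(oldTags);
        for (String tag : added) {
            counts.merge(tag, 1, Integer::sum);
        }
        for (String tag : removed) {
            // Returning null drops the tag once no entity uses it.
            counts.computeIfPresent(tag, (k, v) -> v > 1 ? v - 1 : null);
        }
    }

    Set<String> uniqueTags() {
        return Collections.unmodifiableSet(counts.keySet());
    }

    int count(String tag) {
        return counts.getOrDefault(tag, 0);
    }
}
```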

Jason Hall
This is generally the best approach. To keep track of the set of unique tags (and frequencies, etc), you can additionally have a Tags model that you update synchronously, but separately.
Nick Johnson
I guess the primary reason that I thought of this approach is that I could have a large number of entities, 6 or 7 digits, and they won't be in the same entity group. Thus, I thought that the above query could be extremely expensive. With tag entities, no matter how many search results I ultimately have, I would only retrieve a couple of tag entities. Then I could retrieve the actual tagged entities by their keys, instead of having to query for them. I assumed that retrieving by key would be much faster, but I must admit my understanding of the datastore leaves something to be desired...
tempy