I'm building an application on Google App Engine (Java) where users can make posts, and I'm thinking of adding tags to these posts, so I will have something like this:

in entity Post:

public List<Key> tags;

in entity Tag:

public List<Key> posts;

It would be easy to query, for example, all posts with a certain tag, but how could I get all the posts that have every tag in a given list? I could run a query for each tag and then intersect the results, but maybe there is a better way, because that would be slow with a lot of posts.

Another thing that may be more difficult: given a post, get the posts that share tags with it, ordered by the number of tags in common, so I could show posts "similar" to this one.

Well, with joins this would be a lot easier, but I'm just starting with App Engine and can't really think of a good way to replace joins.

Thanks!

+1  A: 

With this design, I'm afraid your Tag entity could be a bottleneck, especially if you expect some tags to be very common. Three specific issues I can think of are the efficiency of your gets and puts, write contention, and exploding indexes. Let's look at Stack Overflow for an example: there are 14,000 posts tagged "java" right now.

  1. That means every time you need to fetch your java tag entity, you are pulling back 14,000 keys' worth of data from the datastore. Then you are sending it all back when you do a put. That could add up to a lot of bytes.
  2. In addition to the bytes going back and forth, each put will require indexes to be updated. Each entry in the ListProperty maps to a separate index entry, so now you're doing lots of index updates. Which leads us to number 3...
  3. Exploding indexes. Each entity has a limit on how many index entries it can have. I think the limit is 5000 per entity, so that is actually a hard limit on how many posts could ever have the same tag.

The good news is, some of your requirements would be easily handled by just the Post entity. For example, you could easily find all the posts that have all of a list of tags with a query filter like this:

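Query q = pm.newQuery(Post.class);
// Two equality filters on the multi-valued "tags" property: a post matches only if it has both tags.
q.setFilter("tags == 'Java' && tags == 'appengine'");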

For all posts with either java or appengine tags, you would need to do one query for each tag, then combine the results yourself. The datastore doesn't handle OR/IN type operations right now.
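
As a rough sketch of the "combine the results yourself" step: assuming each per-tag query has already been run and reduced to a list of post IDs, the lists can be merged and de-duplicated in memory. The class name `TagUnion` and the use of `Long` IDs are illustrative assumptions, not part of the App Engine API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TagUnion {
    /**
     * Merges the results of several per-tag queries into one list,
     * dropping duplicates while keeping first-seen order.
     * Post IDs are modeled as Long purely for illustration.
     */
    public static List<Long> union(List<List<Long>> perTagResults) {
        Set<Long> seen = new LinkedHashSet<>();
        for (List<Long> result : perTagResults) {
            seen.addAll(result); // a post carrying several of the tags is kept once
        }
        return new ArrayList<>(seen);
    }
}
```

A post that carries both tags shows up in both result lists, which is why the merge goes through a set rather than simple concatenation.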

Finding related posts sounds tricky. I'll think about that after some coffee.
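
One possible starting point, offered only as a sketch: fetch the candidate posts with one query per tag of the target post, then count the shared tags in memory and sort by that count. The class `SimilarPosts`, the `rank` method, and the map of post ID to tag set are all hypothetical names for illustration; this does nothing in the datastore itself and would get expensive for very common tags.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SimilarPosts {
    /**
     * Returns the IDs of candidate posts that share at least one tag with
     * the target post, ordered by number of shared tags (most first).
     * Ties are broken by ascending post ID so the order is deterministic.
     */
    public static List<Long> rank(Set<String> targetTags, Map<Long, Set<String>> candidates) {
        final Map<Long, Integer> shared = new HashMap<>();
        for (Map.Entry<Long, Set<String>> e : candidates.entrySet()) {
            Set<String> common = new HashSet<>(e.getValue());
            common.retainAll(targetTags); // keep only the tags both posts have
            if (!common.isEmpty()) {
                shared.put(e.getKey(), common.size());
            }
        }
        List<Long> ids = new ArrayList<>(shared.keySet());
        ids.sort((a, b) -> {
            int byCount = shared.get(b).compareTo(shared.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ids;
    }
}
```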

Peter Recore
I didn't know that when I retrieve an entity with a list property, all the entities in that list are also retrieved... Is that how it works? If so, I'll remove the posts list. I also didn't know that I could query over a list property that way: q.setFilter("tags == 'Java'"). That is really good news :) Thanks Peter.
Damian
The full entities in the lists might or might not get fetched, depending on exactly how you implement your entities and whether you're using JDO or JPA (read about fetch groups in JDO, for example). But even if you were just loading the keys, a few thousand keys will start adding up if you are constantly moving them back and forth.
Peter Recore
A: 

You might want to check out this video from Google I/O. Relation Index entities are what you need; they allow you to remove List<Key> posts from the Tag entity, as well as List<Key> tags from the Post entity.