I have a list of tags defined in a StringListProperty().

The DB contains around 1 million entries and each entry has around 20 different values in the list.

e.g.

a = [ 'ab', 'bc', 'ca', 'x', ....]

b = ['x', 'm', 'a', .... ]

I am using Google App Engine, so I am constrained in how I can run batch jobs (each request is limited to 30 seconds).

Here is my question:

Given a list a, I want to find all lists that have the most elements in common with a, sorted in descending order of the number of common elements.

How can I do this with App Engine?
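One common approach to "rank by number of shared elements" is an inverted index: for each tag in the query list, look up every entity whose list contains that tag, then count how many times each entity turns up. In the App Engine Python `db` API, an equality filter on a list property matches an entity if any element equals the value, so each per-tag lookup could in principle be a single query. The sketch below assumes that and uses an in-memory dict as a stand-in for the datastore. `build_tag_index` and `rank_by_common_tags` are hypothetical helper names, not part of any App Engine API.

```python
from collections import Counter, defaultdict


def build_tag_index(entities):
    """Map each tag to the set of entity keys whose tag list contains it.

    Stand-in (assumption) for per-tag datastore queries on a StringListProperty.
    """
    index = defaultdict(set)
    for key, tags in entities.items():
        for tag in set(tags):  # ignore duplicate tags within one list
            index[tag].add(key)
    return index


def rank_by_common_tags(query_tags, index, limit=None):
    """Return (key, common_count) pairs, most shared tags first.

    Ties are broken by key so the ordering is deterministic.
    """
    counts = Counter()
    for tag in set(query_tags):
        for key in index.get(tag, ()):
            counts[key] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if limit is None else ranked[:limit]
```

For example, with `entities = {"url1": ["ab", "bc", "x"], "url2": ["x", "m", "a"], "url3": ["ab", "bc", "ca", "q"], "url4": ["z"]}`, ranking the query `["ab", "bc", "ca", "x"]` returns `[("url1", 3), ("url3", 3), ("url2", 1)]`. The catch is that a popular tag can match a large fraction of 1 million entities, so fetching every candidate may not fit within the request time limit. That is why the lookups would likely need to be capped, precomputed offline, or restricted to rarer tags.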

Update:

I am storing tags for URLs, e.g. [shopping, shop, social-shopping, ....]

Basically, I want to find URLs with similar content by

(1) matching their tags, and (2) looking at how frequently each tag occurs per URL, to decide which URLs have "more" closely related content.
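To take tag frequency into account, one option (a swapped-in technique, not something the question specifies) is to treat each URL's tags as a frequency vector and compare vectors with cosine similarity. The sketch below assumes each URL has a hypothetical `Counter` of tag to count, e.g. how many times that tag was applied to the URL. `cosine_similarity` and `rank_related` are illustrative names, and in practice the candidate set would likely be narrowed first, for instance to URLs sharing at least one tag with the query.

```python
import math
from collections import Counter


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity of two tag -> count mappings; 0.0 if either is empty."""
    dot = sum(count * freq_b[tag] for tag, count in freq_a.items() if tag in freq_b)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_related(query_freq, candidates):
    """Return (url, score) pairs with score > 0, most similar first."""
    scored = [(url, cosine_similarity(query_freq, freq)) for url, freq in candidates.items()]
    related = [(url, score) for url, score in scored if score > 0]
    return sorted(related, key=lambda kv: (-kv[1], kv[0]))
```

Unlike a raw count of shared tags, this weights heavily used tags more and normalizes for URLs that simply carry many tags. A further refinement, also an assumption here, would be to down-weight tags that appear on a very large share of URLs, as in TF-IDF.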

A: 

I don't think there's any neat way to do this in App Engine, or for that matter in any DBMS that only has standard one-dimensional indexes available.

Perhaps if you expand on what you're trying to achieve, someone can suggest an alternative?

Nick Johnson
Updated the above question.