views:

21

answers:

1

Here is the scenario. The user (web environment) can import a document in a foreign language. When displaying the document the application highlights the words the user does not know yet. The user can then mark some of those words as known, adding them to his dictionary.

So basically we have a List of Strings representing the words in the document, and a Set of Strings representing the user known words (his dictionary). Now, for each String in a List (document) we need to find out if the word exists in the Set. Sounds simple, but how should this be modeled in the database?

The Set of known words needs to be persistent allowing the user to add words with each session. The set of known words can potentially grow to thousand of Strings. The imported document can have hundreds of words.

I see two solutions, both of them don't scale very well. I created a Dictionary class with a Set of known words for each user.

@PersistenceCapable(identityType=IdentityType.APPLICATION)
public class UserDictionary {
    @PrimaryKey
    @Persistent(valueStrategy=IdGeneratorStrategy.IDENTITY)
    Long id;
    @Persistent String userId;      

    @Persistent Set<String> knownWords;
}
  1. I can query the database for each word in the imported document to check if it exists in the knownWords Set. Since a document may have hundreds of words this solution would force me to query the database hundreds of times.

  2. Retrieve the whole UserDictionary class with potentially thousand of words and run a quick check for each word in the document if it exists in the knownWords Set. The check would be quick, but I would have to keep the whole Dictionary in the user Session eating server memory, and the cost of retrieval would be huge as the app engine needs to deserialize the whole Set.

I don't like any of the above...any other ideas?

+1  A: 

Why would you model this in the database? Instead, load the entire document and the entire dictionary into memory, and do it there.

Nick Johnson