views:

44

answers:

1

I'm writing software to provide feedback to people across many categories. For example, I might have 30 employees and 40 standards of evaluation (e.g. "arrives on time," "is polite," "appears to brush his teeth," etc). At arbitrary times, the supervisor can submit a piece of feedback like "employee 3 gets a 5/5 for standard 8 (he smells great)" or "employee 10 gets a 1/5 for standard 12 (he just called a customer an idiot)."

My idea is to store these small pieces of feedback individually, linked to the employee and standard by keeping userId and standardId fields.

The problem comes when I want to look at the feedback for all 30 employees and 40 standards. My current approach requires 1200 queries to retrieve all of that data. I'm looking for a better way. I'm using the google appengine datastore, which is a non-relational db.

Things I've been thinking about, and on which I welcome feedback:

  1. I could store the feedback in a grid, with a row per user and column per standard. Then, a single query gets all of the data (better than 1200), but entering new data becomes more difficult (fetch the grid, update the correct bit, store the grid) and changes in the user set or standard set become much more complex (if I add a standard in the middle, this grid needs to be updated). Also, some queries become much harder - I can no longer easily search for the assessments entered on a certain date or by a certain supervisor.

  2. I could store all of the feedback for a certain (set of users x set of standards) in an unorganized list, fetch it with a single query, and then sort it out in my own code. This requires me to loop through 1200 entries, but that would be faster than 1200 queries over all of the data in the whole system (there may be many, many irrelevant data for other sets of users and unrelated standards).

So, the short version of my question is: how should I store this data for the best balance of quick retrieval of a large subset and quick insertion of individual pieces of feedback?

+1  A: 

You might be able to do this using a RelationIndex. Depending on how exactly you will want to allow user to view and query the data, it should work.

The idea is pretty straight forward, basically you will store a list of "standards" for each employee. And possibly a list of employee's for each standard. Then you'll be able to ask questions such as all employee's who 'smell good'.

Because you have scores for each standard, you might want to do something like store the "score" and "standard number" as a pair in the list ("3:12") so that you can find everyone who has a score of 3 on standard 12.

edit: Updated based on comment.

It sounds like you need to deal with a few different issues. First, you need to deal with editing and maintaining the data. Second, you need to deal with querying the data. Third, you are going to need to handle displaying the data.

For querying the data efficiently you will probably need some approach similar to what I initially suggested. What is more common, editing or viewing the data? That will impact how you setup your models.

If you are only dealing with 30 or 40 employees and 30 or 40 standards, maybe you could use something like the following:

class Evaluations(db.Model):
    period = db.StringProperty()
    standards = db.TextProperty()
    scores = db.TextProperty()

class EvaluationsIndex(db.Model):
    index = db.StringListProperty()

Use the standards property on Evaluations to store a list of standards evaluated. Then store your employee-standard-score grid in the scores property. Obviously you'll need to serialize both the standards list and the evaluation grid, perhaps using something like JSON. Use the EvaluationsIndex model as I mentioned above.

With this (or something really similar) you will have pretty easy edits, very easy display, and support for queries.

You could add an additional model to track which supervisor entered the evaluation and her notes.

Robert Kluin
Thanks for the link to that talk; it was interesting. I don't think a relation index would help here, though, because each score only has a single "receiver" (in users x standards). Since there are just two ids there, the deserialization cost would be negligible. The real problem I'm having is that I need to find the scores connected to any of 1200 receivers - it's like 1200 people logging on to his microblogging service at once... _per user!_
Riley
Perhaps I miss-understood your issue. I thought you were only looking for a method to query and identify employees with a particular value for some standard. You are going to need a combination of approaches to make this work on App Engine.
Robert Kluin
Thanks for your further thought. This scheme of storing all of the scores in a big text column is what I was trying to get at in option #1 of my original post, where I used "grid" instead of "TextProperty." For the serialization, I was thinking of just using a List<Score> kind of mechanism, and have appengine do the serialization for me. I assume that deserializing 1200 objects will be much faster than performing 1200 queries, and will certainly _cost_ less. I'll just go ahead and try this method and time it. Thanks again!
Riley
Just do some testing first, you may encounter an issue storing 1,200 items in one db.ListProperty. For one of my apps, after testing, I went with a design more similar to what I detailed above. I *never* deserialize the large RelationIndex list, I use key names and rewrite the entire thing when it changes. And, I render the grid client-side -- so storing my data as a JSON string save a bunch of work server-side.
Robert Kluin
Great point about sending to the client. I'll try JSON first.
Riley