
Hi,

I've been getting more into App Engine, and am a little concerned about the space used per object in the datastore (I'm using Java). It looks like the names of the object's fields are stored as part of every record. Therefore, if I have lots of tiny records, the extra space spent on field names could become a significant portion of my datastore usage. Is this true?
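A quick back-of-the-envelope check of the name overhead, as a rough sketch: it assumes each stored property carries its full name as UTF-8 bytes, and it ignores the tags, lengths and key data the real encoding adds, so it understates the true per-record cost. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class FieldNameOverhead {
    // Sum of the UTF-8 lengths of the given property names. Rough assumption:
    // each record stores every name once; real encodings add framing bytes too.
    static int nameBytes(String... names) {
        int total = 0;
        for (String name : names) {
            total += name.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Field names of the Move class in the question.
        int perRecord = nameBytes("username", "moveaction", "timestamp");
        long records = 1_000_000L;
        System.out.println(perRecord + " bytes of names per record");
        System.out.println(perRecord * records + " bytes of names across " + records + " records");
    }
}
```

Under the same assumption, shortening the names to `u`, `a` and `t` drops the estimate from 27 bytes to 3 bytes per record.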

If so, I can't think of a good way around it. For example, in a game application I want to store a record each time a user makes a move, which means storing lots of little objects. If I instead used a single object that stored all of a user's moves, serialization/deserialization time would grow as the list of moves grows:

So:

// Lots of these?
class Move { 
    String username;
    String moveaction;
    long timestamp;
}

// Or:
class Moves {
    String username;
    String[] moveactions;
    long[] timestamps;
}

Am I interpreting the tradeoffs correctly?

Thank you

A:

Your assessment is entirely correct. You can reduce overhead somewhat by choosing shorter field names; if your mapping framework supports it, you can then alias the shorter names so that you can use more user-friendly ones in your app.
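A minimal sketch of that aliasing idea in plain Java, not tied to any particular mapping framework: the app works with readable field names, and a hypothetical enum maps each one to a one-letter stored name. The `Map` stands in for whatever property bag your persistence layer writes; `MoveRecord`, `Field`, `toStoredProperties` and `fromStoredProperties` are illustrative names, not an existing API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical aliasing layer: readable names in code, short names in storage.
public class MoveRecord {
    enum Field {
        USERNAME("u"), MOVE_ACTION("a"), TIMESTAMP("t");

        final String stored; // the short name actually written to storage

        Field(String stored) {
            this.stored = stored;
        }
    }

    String username;
    String moveAction;
    long timestamp;

    // Write the fields under their short stored names.
    Map<String, Object> toStoredProperties() {
        Map<String, Object> props = new HashMap<>();
        props.put(Field.USERNAME.stored, username);
        props.put(Field.MOVE_ACTION.stored, moveAction);
        props.put(Field.TIMESTAMP.stored, timestamp); // autoboxed to Long
        return props;
    }

    // Read the short stored names back into readable fields.
    static MoveRecord fromStoredProperties(Map<String, Object> props) {
        MoveRecord m = new MoveRecord();
        m.username = (String) props.get(Field.USERNAME.stored);
        m.moveAction = (String) props.get(Field.MOVE_ACTION.stored);
        m.timestamp = (Long) props.get(Field.TIMESTAMP.stored);
        return m;
    }
}
```

Keeping the mapping in one enum means a stored name is defined in exactly one place, so readers and writers cannot drift apart.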

Your idea of aggregating moves into a single entity is probably a good one; it depends on your access pattern. If you regularly need to access information on only a single move, you're correct that the time spent will grow with the number of moves, but if you regularly access lists of sequential moves, this is a non-issue. One possible compromise is separating the moves into groups - one entity per hundred moves, for example.
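The one-entity-per-hundred-moves compromise reduces to simple arithmetic on a per-user move counter. This is a sketch assuming moves are numbered 0, 1, 2, ... per user; the key-name format `username:chunk` and the class name are hypothetical choices, not something the datastore requires.

```java
public class MoveChunks {
    // Group size from the answer's example; tune it to your access pattern.
    static final int MOVES_PER_CHUNK = 100;

    // Which chunk entity holds move number n (0-based).
    static long chunkIndex(long moveNumber) {
        return moveNumber / MOVES_PER_CHUNK;
    }

    // Position of the move inside that chunk's lists.
    static int offsetInChunk(long moveNumber) {
        return (int) (moveNumber % MOVES_PER_CHUNK);
    }

    // Hypothetical key name for the chunk entity, e.g. "alice:2".
    static String chunkKeyName(String username, long moveNumber) {
        return username + ":" + chunkIndex(moveNumber);
    }

    public static void main(String[] args) {
        long move = 250;
        // prints "alice:2 offset 50"
        System.out.println(chunkKeyName("alice", move) + " offset " + offsetInChunk(move));
    }
}
```

Reading a single move then means loading at most 100 moves regardless of how long the game runs, while sequential reads still touch only a few entities.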

Nick Johnson
One other thing worth mentioning is indexes. With one move per entity, there will be one index entry per move. With the aggregate entity, you can choose whether each move is in the index (by using Lists for your moves and timestamps) or only the aggregate is. For example, you could add first_move and last_move properties alongside the full list; that way you could still query for aggregates containing moves that fell within a time period, without needing each move in the index individually.
Peter Recore
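The first_move/last_move suggestion, sketched as a plain Java class with hypothetical names: the summary fields are updated on every append, so only they would need indexing, and a time-range lookup becomes an overlap test on the two summaries. One caveat: the datastore of that era allowed inequality filters on only one property per query, so in practice you might filter on `lastMove >= start` in the query and check `firstMove <= end` in memory. `mayContainMovesIn` shows the full overlap condition as an in-memory check on a loaded aggregate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical aggregate: firstMove/lastMove summarize the move list so that
// only these two properties would need to be indexed.
public class MovesAggregate {
    final String username;
    final List<String> moveActions = new ArrayList<>();
    final List<Long> timestamps = new ArrayList<>();
    long firstMove = Long.MAX_VALUE; // earliest timestamp seen so far
    long lastMove = Long.MIN_VALUE;  // latest timestamp seen so far

    MovesAggregate(String username) {
        this.username = username;
    }

    // Append a move and keep the summary fields up to date.
    void addMove(String action, long timestamp) {
        moveActions.add(action);
        timestamps.add(timestamp);
        firstMove = Math.min(firstMove, timestamp);
        lastMove = Math.max(lastMove, timestamp);
    }

    // True if this aggregate's time span overlaps [start, end]:
    // firstMove <= end AND lastMove >= start.
    boolean mayContainMovesIn(long start, long end) {
        return !timestamps.isEmpty() && firstMove <= end && lastMove >= start;
    }
}
```

An overlap only says the aggregate *may* contain a matching move; the individual timestamps still have to be checked after loading it.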
At the end of the day, if you have many different indexes on an entity, the space used by the indexes could start to compete with the space used by the entities themselves.
Peter Recore
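To put rough numbers on that, here is a sketch that counts single-property index rows under a simplifying assumption: one row per indexed property value (real built-in indexes may store more, for example separate ascending and descending entries). It compares one entity per move against 100-move aggregates whose only indexed properties are `username`, `firstMove` and `lastMove`; all names are hypothetical.

```java
public class IndexRowEstimate {
    // One entity per move: every indexed property of every move gets a row.
    static long perMoveRows(long moves, int indexedProps) {
        return moves * indexedProps;
    }

    // Aggregates of movesPerChunk moves where only summary properties are indexed.
    static long summaryOnlyRows(long moves, int movesPerChunk, int indexedSummaryProps) {
        long chunks = (moves + movesPerChunk - 1) / movesPerChunk; // round up
        return chunks * indexedSummaryProps;
    }

    public static void main(String[] args) {
        long moves = 1_000_000L;
        System.out.println("per-move entities: " + perMoveRows(moves, 3));
        System.out.println("100-move aggregates: " + summaryOnlyRows(moves, 100, 3));
    }
}
```

If the move lists themselves are indexed, each list element adds its own row again, which is how index storage can end up rivalling the entities in size.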
@Peter Good points. I didn't mention indexing since it wasn't explicitly asked about, but yours is an excellent summary.
Nick Johnson