I am trying to design a data model that can hold a very large amount of data. Does anyone with experience handling large volumes of data have feedback on it? For example:
// simplified example
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class TransactionAccount {
    private long balance;
    private List<Transaction> transactions = new ArrayList<Transaction>();
    // ...
    public long getBalance() { return balance; }
}

class Transaction {
    public Date date;
    public long amount;
}
Based on what I have read, the only way to get transactional integrity when inserting a Transaction and updating balance is to put them in one entity group.
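For reference, this is roughly the shape I'd expect that to take with the low-level datastore API; a minimal sketch, assuming the account entity is the entity-group parent (the kind and property names are just illustrative, and note that the datastore's own Transaction class clashes with my Transaction name above):

import java.util.Date;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Transaction;

public class AccountUpdater {
    // Inserts a Transaction child entity and updates the parent account's
    // balance atomically; both writes commit or roll back together because
    // they are in the same entity group.
    public void addTransaction(Key accountKey, long amount)
            throws EntityNotFoundException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Transaction txn = ds.beginTransaction();
        try {
            Entity account = ds.get(txn, accountKey);
            long balance = (Long) account.getProperty("balance");
            account.setProperty("balance", balance + amount);

            // Passing accountKey as the parent places the new entity in the
            // account's entity group, which is what makes this transaction legal.
            Entity tx = new Entity("Transaction", accountKey);
            tx.setProperty("date", new Date());
            tx.setProperty("amount", amount);

            ds.put(txn, account);
            ds.put(txn, tx);
            txn.commit();
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }
}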
However, over time there would be millions of Transaction entities for a particular TransactionAccount. The number of writes to this entity group would be low, but the reads would be much higher.
I know the balance could be sharded, but reading the balance is a very frequent operation, and sharding it would turn getBalance(), one of the most common operations, into the slowest one.
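To illustrate what I mean, here is a sketch of what a sharded read might look like; the shard count, "BalanceShard" kind, and key scheme are hypothetical, but the cost shows up either way: every getBalance() has to fetch and sum all shards instead of reading one entity.

import java.util.ArrayList;
import java.util.List;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;

public class ShardedBalance {
    // Hypothetical shard count: more shards means more write throughput,
    // but also more entities to read on every getBalance() call.
    private static final int NUM_SHARDS = 20;

    public long getBalance(String accountId) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        List<Key> shardKeys = new ArrayList<Key>();
        for (int i = 0; i < NUM_SHARDS; i++) {
            // One "BalanceShard" entity per shard, keyed by account id + shard index.
            shardKeys.add(KeyFactory.createKey("BalanceShard", accountId + "_" + i));
        }
        long total = 0;
        // A batch get is a single RPC, but it still touches N entities
        // where the unsharded design reads exactly one.
        for (Entity shard : ds.get(shardKeys).values()) {
            total += (Long) shard.getProperty("balance");
        }
        return total;
    }
}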