How do you deal with denormalization / secondary indexes in database sharding? | ansaurus


+1 Q:

How do you deal with denormalization / secondary indexes in database sharding?

Say I have a "message" table with 2 secondary indexes:

"recipient_id"
"sender_id"

I want to shard the "message" table by "recipient_id". That way to retrieve all messages sent to a certain recipient I only need to query one shard.

At the same time, I want to be able to run a query that asks for all messages sent by a certain sender, and I don't want to send that query to every single shard of the "message" table. One way to do this is to duplicate the data in a "message_by_sender" table sharded by "sender_id".
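To make the layout concrete, here is a minimal sketch of the two tables and their routing. The modulo routing function, the shard count, and the in-memory lists standing in for shards are all assumptions for illustration, not part of any particular database.

```python
NUM_SHARDS = 4  # hypothetical shard count

def shard_for(user_id: int) -> int:
    """Hypothetical routing function: pick a shard from a user id."""
    return user_id % NUM_SHARDS

# In-memory stand-ins for the shards of each table.
message = [[] for _ in range(NUM_SHARDS)]            # sharded by recipient_id
message_by_sender = [[] for _ in range(NUM_SHARDS)]  # sharded by sender_id

def send(msg_id, sender_id, recipient_id, body):
    row = (msg_id, sender_id, recipient_id, body)
    # Two independent writes, usually to two different shards.
    message[shard_for(recipient_id)].append(row)
    message_by_sender[shard_for(sender_id)].append(row)

def inbox(recipient_id):
    # Touches exactly one shard of "message".
    return [r for r in message[shard_for(recipient_id)] if r[2] == recipient_id]

def sent(sender_id):
    # Touches exactly one shard of "message_by_sender".
    return [r for r in message_by_sender[shard_for(sender_id)] if r[1] == sender_id]
```

Both lookups hit a single shard, at the cost of writing every message twice.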

The problem with that approach is that every time a message is sent, I need to insert it into both the "message" and "message_by_sender" tables.

But what if, after inserting into "message", the insertion into "message_by_sender" fails? In that case the message exists in "message" but not in "message_by_sender".

How do I make sure that if a message exists in "message" then it also exists in "message_by_sender", without resorting to two-phase commit?

This must be a very common issue for anyone who shards their databases. How do you deal with it?

+1 A:

There is no "silver bullet" to this problem. Some options:

Use a message queue to post the changes. Eventually the changes will make it to the different partitions.
Have a trigger on the message table partitions that creates an "index entry needed" row in a table on the same partition. A separate process then periodically scans that table and creates the missing index entries.
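The trigger option could be sketched roughly as follows. This is only an illustration: two in-memory SQLite databases stand in for one "message" shard and one "message_by_sender" shard, and the names index_entry_needed, message_needs_index, and sync_index are made up for the example.

```python
import sqlite3

# Assumption: each in-memory database plays the role of one shard.
recipient_shard = sqlite3.connect(":memory:")
sender_shard = sqlite3.connect(":memory:")

recipient_shard.executescript("""
    CREATE TABLE message (
        id INTEGER PRIMARY KEY, sender_id INTEGER,
        recipient_id INTEGER, body TEXT);
    CREATE TABLE index_entry_needed (message_id INTEGER PRIMARY KEY);
    -- The marker row is written in the same local transaction as the message.
    CREATE TRIGGER message_needs_index AFTER INSERT ON message
    BEGIN
        INSERT INTO index_entry_needed (message_id) VALUES (NEW.id);
    END;
""")
sender_shard.execute("""
    CREATE TABLE message_by_sender (
        id INTEGER PRIMARY KEY, sender_id INTEGER,
        recipient_id INTEGER, body TEXT)""")

def send(msg_id, sender_id, recipient_id, body):
    # Single-shard transaction: no distributed commit is involved.
    with recipient_shard:
        recipient_shard.execute(
            "INSERT INTO message VALUES (?, ?, ?, ?)",
            (msg_id, sender_id, recipient_id, body))

def sync_index():
    """Periodic job: copy pending messages to the sender shard, then clear markers."""
    pending = recipient_shard.execute(
        "SELECT m.id, m.sender_id, m.recipient_id, m.body "
        "FROM index_entry_needed n JOIN message m ON m.id = n.message_id"
    ).fetchall()
    for row in pending:
        with sender_shard:
            # INSERT OR REPLACE keeps the copy idempotent if this job is re-run.
            sender_shard.execute(
                "INSERT OR REPLACE INTO message_by_sender VALUES (?, ?, ?, ?)", row)
        with recipient_shard:
            # Remove the marker only after the copy has been committed.
            recipient_shard.execute(
                "DELETE FROM index_entry_needed WHERE message_id = ?", (row[0],))
    return len(pending)
```

If the job crashes between the copy and the marker delete, the next run simply copies the row again, which is why the write to "message_by_sender" is keyed by message id. A real deployment would route each row to the right sender shard rather than a single connection.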

You might want to read this blog entry about doing distributed transactions on Google App Engine: http://blog.notdot.net/2009/9/Distributed-Transactions-on-App-Engine. Basically, if you don't want two-phase commit, Paxos, or something similar, then you need to live with some sort of eventually consistent model.
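As a rough picture of what eventual consistency means for the message-queue option, the sketch below retries a failed write until it lands. The deque is only a stand-in for a durable queue with at-least-once delivery, and the simulated failure and the names apply_to_sender_shard and drain are hypothetical.

```python
from collections import deque

queue = deque()         # stand-in for a durable, at-least-once message queue
message_by_sender = {}  # message id -> row; keyed writes make redelivery harmless
fail_next = [True]      # simulate one transient failure of the sender shard

def apply_to_sender_shard(msg):
    if fail_next[0]:
        fail_next[0] = False
        raise ConnectionError("sender shard unavailable")
    message_by_sender[msg["id"]] = msg

def drain(max_attempts=3):
    """Process queued messages; a failed message stays at the head and is retried."""
    attempts = 0
    while queue and attempts < max_attempts:
        msg = queue[0]
        try:
            apply_to_sender_shard(msg)
            queue.popleft()  # acknowledge only after the write succeeded
        except ConnectionError:
            attempts += 1
    return len(queue)  # number of messages still waiting
```

Between the failed attempt and the successful retry, the message is visible in "message" but not yet in "message_by_sender"; readers have to tolerate that window.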

-Dave

Dave 2010-05-02 03:51:53
