This is a fairly complex problem I have been thinking about, so please suggest edits or comment on any parts that are unclear. I will update and iterate based on your comments.
I am thinking of developing a Rails gem that simplifies the use of sharded tables, even when most of your data is stored in a relational database. I believe this is similar to the approach used at Quora and FriendFeed when they hit a wall scaling with traditional MySQL, where most of the potential solutions either required a massive migration (NoSQL) or were just really painful (sticking with a purely relational setup).
- http://bret.appspot.com/entry/how-friendfeed-uses-mysql
- http://www.quora.com/When-Adam-DAngelo-says-partition-your-data-at-the-application-level-what-exactly-does-he-mean?q=application+layer+quora+adam+
Essentially, how can we keep using MySQL for the many things it is really good at, while still allowing parts of the system to scale? This would allow someone who got started with MySQL/ActiveRecord, but hit a roadblock scaling, to easily scale just the parts of the database where it makes sense.
For us, we are using Ruby on Rails on a sharded database and storing JSON blobs in it. Since we cannot do joins, we create tables for the relationships between entities.
For example, we have 10 different types of entities. Entities can be linked to each other through big (sharded) relationship tables.
The tables are extremely simple. The index is (Id1, Id2, ..., type), and the data is stored in the JSON blob.
- Id, type, {json data}
- Id1, Id2, type, {json data}
- Id1, Id2, Id3, type, {json data}
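To make the layout above concrete, here is a minimal in-memory sketch of how such a row could be keyed and routed to a shard. The class name `ShardedRelationStore`, the choice to shard on the first id only, and the use of CRC32 as the hash are all assumptions for illustration, not something we have settled on. Plain hashes stand in for the MySQL tables.

```ruby
require "json"
require "zlib"

# Hypothetical stand-in for a set of sharded relationship tables.
# Each shard is a Hash; a real version would talk to one MySQL database per shard.
class ShardedRelationStore
  def initialize(shard_count)
    @shards = Array.new(shard_count) { {} }
  end

  # Assumption: route on the first id only, so every row that starts
  # with the same id lands on the same shard.
  def shard_index(id1)
    Zlib.crc32(id1.to_s) % @shards.size
  end

  # ids is [Id1], [Id1, Id2] or [Id1, Id2, Id3]; the key mirrors the
  # (Id1, Id2, ..., type) index and the value is the JSON blob.
  def put(ids, type, data)
    @shards[shard_index(ids.first)][[*ids, type]] = JSON.generate(data)
  end

  # Returns the parsed blob, or nil when no such row exists.
  def get(ids, type)
    blob = @shards[shard_index(ids.first)][[*ids, type]]
    blob && JSON.parse(blob)
  end
end

store = ShardedRelationStore.new(4)
store.put([1, 2], "follows", { "since" => "2011-01-01" })
store.get([1, 2], "follows") # reads back the hash stored above
```

Sharding on the first id keeps "everything linked from X" on one shard, at the cost of needing a second row (with the ids swapped) if you also want to look up the relationship from the other side.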
We have put a lot of work into creating higher-level interfaces for storing a range of relational data sets.
For any given type, you can define a storage type: value, unweighted list, weighted list, or weighted list with GUIDs.
We have higher-level interfaces for each of them: querying, sorting, timestamp comparison, intersections, etc.
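As a rough illustration of what one of these storage types might look like from the caller's side, here is an in-memory sketch of a weighted list. The class `WeightedList` and its method names (`add`, `top`, `newer_than`, `intersect`) are hypothetical, and the weight is assumed to be a number such as a timestamp. This is not our actual interface, just the shape of the idea.

```ruby
# Hypothetical weighted list: maps a member id to a numeric weight
# (for example a timestamp or a score).
class WeightedList
  def initialize
    @weights = {}
  end

  # Adds or updates a member; returns self so calls can be chained.
  def add(member, weight)
    @weights[member] = weight
    self
  end

  def members
    @weights.keys
  end

  # Sorting: members ordered by descending weight, optionally truncated.
  def top(limit = nil)
    sorted = @weights.sort_by { |_member, weight| -weight }.map(&:first)
    limit ? sorted.first(limit) : sorted
  end

  # Timestamp comparison: members whose weight is strictly greater than threshold.
  def newer_than(threshold)
    @weights.select { |_member, weight| weight > threshold }.keys
  end

  # Intersection: members present in both lists.
  def intersect(other)
    members & other.members
  end
end

a = WeightedList.new.add(:x, 10).add(:y, 30).add(:z, 20)
b = WeightedList.new.add(:y, 5).add(:z, 7)
a.top(2)         # => [:y, :z]
a.newer_than(15) # => [:y, :z]
a.intersect(b)   # => [:y, :z]
```

In the real thing each list would presumably be backed by rows in the sharded tables rather than a Hash, with the weight stored in an indexed column or in the JSON blob.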
That way, if someone realizes that they need to scale a specific part of the database, they can keep most of their infrastructure and move only the tables they need into this sharded database.
What are your thoughts? As mentioned above, I would love to know what you folks think.