I heard about the Cassandra database engine a few days ago and have been searching for good documentation on it. After studying Cassandra, I understand that it is more scalable than other data engines. I also read about Amazon SimpleDB, but since SimpleDB has a limit of 10GB per table and Google Datastore is slower than Amazon SimpleDB, I prefer not to use either of them. So, to make our site scale, especially for high write rates with massive data, I would like to use Cassandra as our data engine.

But before I start using Cassandra, I am confused about how to handle complex data with it. My MySQL database structure is given below; please read it and give me a good suggestion.

Users Table
hasColumn ID Primary
hasColumn email Unique
hasColumn FirstName
hasColumn LastName

Category Table
hasColumn ID Primary
hasColumn Parent
hasColumn Category

Posts Table
hasColumn ID Primary
hasColumn UID Index foreign key linked to users->ID
hasColumn CID Index foreign key linked to Category->ID
hasColumn Title
hasColumn Post Index
hasColumn PunDate

Comments Table
hasColumn ID Primary
hasColumn UID Index foreign key linked to users->ID
hasColumn PID Index foreign key linked to Posts->ID
hasColumn Comment

User Group Table
hasColumn ID Primary
hasColumn Name

UserToGroup Table (for many-to-many relation only)
hasColumn UID foreign key linked to Users->ID
hasColumn GID foreign key linked to Group->ID

Finally, for your information, I would like to use the SimpleCassie PHP class (http://code.google.com/p/simpletools-php/), so it would be very helpful if you could give an example using SimpleCassie.

+2  A: 

Denormalize. See twissandra.com and the documentation at http://github.com/ericflo/twissandra

More examples at http://wiki.apache.org/cassandra/ArticlesAndPresentations
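
To make "denormalize" concrete for the UserToGroup table in the question, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the two dictionaries are hypothetical in-memory stand-ins for column families, not a real Cassandra or SimpleCassie API, and the names `groups_by_user`, `users_by_group`, and `add_user_to_group` are made up for the example. Instead of a join table, the membership is written twice, once for each direction you want to read it in.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families.
# Shape: row key -> {column name -> value}. Plain dicts, not a client API.
groups_by_user = defaultdict(dict)  # row key: user ID, columns: group IDs
users_by_group = defaultdict(dict)  # row key: group ID, columns: user IDs


def add_user_to_group(uid, user_name, gid, group_name):
    """Store the membership twice, once per query we want to answer."""
    groups_by_user[uid][gid] = group_name   # answers "which groups is this user in?"
    users_by_group[gid][uid] = user_name    # answers "which users are in this group?"


add_user_to_group("u1", "Alice", "g1", "Admins")
add_user_to_group("u2", "Bob", "g1", "Admins")
add_user_to_group("u1", "Alice", "g2", "Editors")

# Each read touches a single row; no join is needed.
print(sorted(groups_by_user["u1"].values()))
print(sorted(users_by_group["g1"].values()))
```

The trade-off is that every write has to update both rows, and a renamed user or group must be rewritten wherever it was copied.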

jbellis
+1  A: 

Are you really competing with Google and Amazon in terms of traffic volumes? I'd recommend starting by looking at upgrading your current MySQL infrastructure - how many database servers do you currently run in your cluster(s)? Do you partition data?

C.

symcbean
I am not talking about traffic volume. I prefer Cassandra for its performance. See the architecture of Cassandra (http://wiki.apache.org/cassandra/ArchitectureOverview): with 50GB of data, MySQL requires 300ms to write where Cassandra requires only 0.12ms, and MySQL requires 350ms to read where Cassandra requires only 15ms. It is the fastest data engine. The most popular websites, including Facebook, Twitter, and Digg, are migrating to Cassandra for scaling and improving performance.
Sadiqur Rahman
These headline figures look impressive, but there are no details on how they configured the tests. Also, even using the latest fibre channel switched fabric (i.e. the fastest disk technology available) you'd be lucky to get a sustained 20Gb/s - and that assumes that the underlying disks can cope with this rate/volume of data - or 20,000 times slower than the figures cited for Cassandra on this page. Indeed, 20Gb/s is about the memory bandwidth of a mid/high-range non-NUMA system. The only way these figures could possibly make any sense is if you are looking at a very large database cluster.
symcbean
+1  A: 

From the Cassandra wiki's data model reference:

Unlike with relational systems, where you model entities and relationships and then just add indexes to support whatever queries become necessary, with Cassandra you need to think about what queries you want to support efficiently ahead of time, and model appropriately. Since there are no automatically-provided indexes, you will be much closer to one ColumnFamily per query than you would have been with tables:queries relationally. Don't be afraid to denormalize accordingly;
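
As a rough illustration of "one ColumnFamily per query" applied to the Posts and Comments tables from the question, here is a sketch in Python. Everything in it is an assumption made for the example: the dictionaries are in-memory stand-ins for column families rather than a real client API, the function names are hypothetical, and `pub_date` stands in for the question's PunDate column on the guess that it holds a publication date.

```python
from collections import defaultdict

# Hypothetical stand-ins for column families, one per query to support.
posts = {}                             # post ID -> all columns of the post
posts_by_user = defaultdict(dict)      # user ID -> {(pub_date, post ID): title}
posts_by_category = defaultdict(dict)  # category ID -> {(pub_date, post ID): title}
comments_by_post = defaultdict(dict)   # post ID -> {(timestamp, comment ID): comment}


def add_post(post_id, uid, cid, title, body, pub_date):
    """Write the post once in full, then copy what each listing query needs."""
    posts[post_id] = {"uid": uid, "cid": cid, "title": title,
                      "body": body, "pub_date": pub_date}
    posts_by_user[uid][(pub_date, post_id)] = title
    posts_by_category[cid][(pub_date, post_id)] = title


def add_comment(comment_id, post_id, uid, text, timestamp):
    """Comments are stored under the post they belong to."""
    comments_by_post[post_id][(timestamp, comment_id)] = {"uid": uid, "comment": text}


def latest_titles(row, limit=10):
    """Newest first. Cassandra keeps columns sorted within a row;
    sorting the keys here imitates that ordering."""
    return [row[key] for key in sorted(row, reverse=True)[:limit]]


add_post("p1", "u1", "c1", "First post", "...", "2010-05-01")
add_post("p2", "u1", "c2", "Second post", "...", "2010-05-02")
add_comment("k1", "p1", "u2", "Nice post", "2010-05-03T10:00")

print(latest_titles(posts_by_user["u1"]))
print(latest_titles(posts_by_category["c1"]))
print(len(comments_by_post["p1"]))
```

Each query ("posts by this user", "posts in this category", "comments on this post") reads one row from the structure built for it, which is the point of the quoted advice.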

A good article here.

I hope it helps you.

Aito
A: 

Here's a good article on Twissandra (a Twitter clone on Cassandra) that discusses schema design based on data access requirements. You might find it useful: http://www.rackspacecloud.com/blog/2010/05/12/cassandra-by-example/

Sagar V