Hi,

I am trying to design a high-scale key-value storage system. The HBase schema is outlined below:

{
  "userid1" : {
    "update" : { t3 : "some update1", t2 : "some update2", t1 : "some update3" },
    "sender" : { t3 : "sender3", t2 : "sender2", t1 : "sender1" }
  },
  "userid2" : {
    "update" : { t9 : "some update9", t6 : "some update534", t1 : "some update343" },
    "sender" : { t9 : "sender3", t6 : "sender2", t1 : "sender1" }
  }
}

The system is going to have around 15-20M users, with around 3-4M put (write) operations per day (which rules out MySQL automatically). Each user will have at most around 1000 entries in the "update" and "sender" columns (around one week's worth of updates).
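To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It only gives the average write rate and the upper bound on stored entries, assuming writes are spread evenly over the day (real traffic will have peaks) and every user reaches the 1000-entry cap. The function names are just mine for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_writes_per_second(writes_per_day):
    """Average put rate, assuming writes are spread evenly over the day."""
    return writes_per_day / SECONDS_PER_DAY


def max_entries_per_column(users, entries_per_user):
    """Upper bound on timestamped entries in one column across all users."""
    return users * entries_per_user


low = average_writes_per_second(3_000_000)
high = average_writes_per_second(4_000_000)
upper_bound = max_entries_per_column(20_000_000, 1000)

print(f"average write rate: {low:.1f} to {high:.1f} puts/s")
print(f"max entries per column: {upper_bound:,}")
```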

My queries would be of the form "for a given userid, return the 20 most recent updates and their senders, ordered by timestamp". Is there a way to build a secondary index on (userid, timestamp) that would speed up my get calls? Or are there changes I should make to the design to minimize the response time of get calls?
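To make the access pattern concrete, here is a minimal sketch in plain Python (no HBase client involved) of one alternative layout I am considering: a composite row key of userid plus a reversed timestamp, so that the newest entries for a user sort first and "top 20" becomes a short prefix scan. The `SortedTable` class, the `row_key` helper and the `:` separator are made up for illustration; the sorted list only imitates the fact that HBase keeps rows ordered by key, and timestamps are assumed to be non-negative integers.

```python
import bisect

MAX_TS = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def row_key(userid, ts):
    """Composite key: newer timestamps produce lexicographically smaller keys."""
    # Zero-pad to 19 digits so string order matches numeric order.
    return f"{userid}:{MAX_TS - ts:019d}"


class SortedTable:
    """In-memory stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> cell values

    def put(self, userid, ts, update, sender):
        key = row_key(userid, ts)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = {"ts": ts, "update": update, "sender": sender}

    def latest(self, userid, limit=20):
        """Prefix scan: walk forward from the first key of this user."""
        prefix = f"{userid}:"
        i = bisect.bisect_left(self._keys, prefix)
        result = []
        while (i < len(self._keys) and len(result) < limit
               and self._keys[i].startswith(prefix)):
            result.append(self._rows[self._keys[i]])
            i += 1
        return result


table = SortedTable()
table.put("userid1", 1, "some update3", "sender1")
table.put("userid1", 3, "some update1", "sender3")
table.put("userid1", 2, "some update2", "sender2")
table.put("userid10", 5, "other user's update", "sender9")

newest_two = table.latest("userid1", limit=2)
print([row["ts"] for row in newest_two])
```

With this layout a read touches only the first `limit` rows for the user instead of loading all ~1000 entries, but it also means one row per update rather than one row per user, so I am not sure it is the right trade-off.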

thanks, pigol