I've seen two contradictory pieces of advice on designing row keys in HBase (specifically, though I think it applies to Cassandra as well):
- Group keys that you'll often aggregate together, to take advantage of data locality. (White, Hadoop: The Definitive Guide; I recall seeing it on the HBase site too, but can't find it.)
- Spread keys around so that work can be distributed across multiple machines. (Twitter, Pig, and HBase at Twitter, slide 14)
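To make the contrast concrete, here is a minimal sketch of the two key layouts, assuming a hypothetical table of per-user events keyed by user ID and timestamp. The function names, the `|` separator (assumed never to appear in a user ID), and the bucket count are illustrative assumptions, not HBase APIs; the code only builds byte-string row keys and does not talk to HBase.

```python
import hashlib

# Hypothetical number of salt buckets; in practice this would be chosen
# to match how the table is pre-split into regions.
NUM_BUCKETS = 8


def grouped_key(user_id: str, timestamp: int) -> bytes:
    """Strategy 1 (locality): all rows for one user sort next to each other,
    so one prefix scan over the user ID reads a single contiguous range."""
    # Zero-pad the timestamp so byte-wise (lexicographic) order matches
    # numeric order for non-negative timestamps.
    return f"{user_id}|{timestamp:020d}".encode()


def salted_key(user_id: str, timestamp: int) -> bytes:
    """Strategy 2 (distribution): prepend a deterministic hash-derived bucket
    so consecutive rows, even for the same user, land in different parts of
    the key space and therefore, with a pre-split table, on different regions."""
    natural_key = f"{user_id}|{timestamp}".encode()
    bucket = int(hashlib.md5(natural_key).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket:02d}|{user_id}|{timestamp:020d}".encode()


def salted_scan_prefixes(user_id: str) -> list[bytes]:
    """Reading one user's rows back from salted keys requires one prefix
    scan per bucket, because the user's rows are scattered across buckets."""
    return [f"{b:02d}|{user_id}|".encode() for b in range(NUM_BUCKETS)]


if __name__ == "__main__":
    rows = [("alice", 3), ("bob", 1), ("alice", 1)]
    print(sorted(grouped_key(u, t) for u, t in rows))  # alice's rows adjacent
    print(sorted(salted_key(u, t) for u, t in rows))   # ordered by bucket first
```

The trade-off shows up on reads: with grouped keys, a user's events sit in one contiguous range that a single scan can retrieve, while with salted keys the same read fans out into `NUM_BUCKETS` scans whose results must be merged. In exchange, assuming the table is pre-split on bucket boundaries, even one heavy user's sequential writes are spread across regions instead of piling onto one.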
I'm guessing which one is optimal depends on the use case, but does anyone have experience with either strategy?