I've been using MySQL for an app for some time, and the more data I collect, the slower it gets, so I have been looking into NoSQL options. One of the things I have in MySQL is a view created from a bunch of joins. The app shows all the important info in a grid, and the user can select ranges, do searches, etc. on this data set. Standard query stuff.

Looking at Cassandra, everything is already sorted based on the parameters I provide in my storage-conf.xml. So I would have a certain string as my key in the SuperColumn, and keep a bunch of the data in Columns below that. But I can only sort by one Column, and I can't do any real searching within the columns without pulling all the SuperColumns and looping through the data, right?

I don't want to duplicate data across different ColumnFamilies, so I want to make sure Cassandra is appropriate for me. In Facebook, Digg, Twitter, they have plenty of searching functions, so maybe I am just not seeing the solution.

Is there a way with Cassandra for me to search for or filter specific data values in a SuperColumn, or its associated Column(s)? If not, is there another NoSQL option?

In the example below, it seems I can only query by key: phatduckk, friend1, John, etc. But what if I wanted to find anyone in the ColumnFamily who lives in city == "Beverley Hills"? Can it be done without returning all records? If so, could I do a search for city == "Beverley Hills" AND state == "CA"? It doesn't seem like I can do either, but I want to make sure and see what my options are.

AddressBook = { // this is a ColumnFamily of type Super
  phatduckk: {    // this is the key to this row inside the Super CF
    friend1: {street: "8th street", zip: "90210", city: "Beverley Hills", state: "CA"},
    John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"},
    Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"},
    Tod: {street: "Jerry street", zip: "54556", city: "Cartoon", state: "CO"},
    Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
  }, // end row
  ieure: {     
    joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
    William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"},
  },

}

+1  A: 

You cannot perform that kind of operation in Cassandra. There are certain kinds of selection predicates that can be set on column keys, but nothing on the values that they hold. Look at the API and check the get_slice/get_superslice and get_range query types. Again, all of this concerns the keys in the ColumnFamily or SuperColumnFamily, not the values.
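To illustrate the difference between key access and value access, here is a minimal Python sketch. It assumes the AddressBook super column family can be modeled as nested dicts; it is not a Cassandra client, and `get_by_key` and `scan_by_value` are hypothetical helper names, not part of any Cassandra API.

```python
# Plain-dict stand-in for part of the AddressBook super column family.
# Assumption: row key -> super column name -> {column name: value}.
address_book = {
    "phatduckk": {
        "friend1": {"street": "8th street", "zip": "90210",
                    "city": "Beverley Hills", "state": "CA"},
        "John": {"street": "Howard street", "zip": "94404",
                 "city": "FC", "state": "CA"},
    },
    "ieure": {
        "William": {"street": "Armpit Dr", "zip": "93301",
                    "city": "Bakersfield", "state": "CA"},
    },
}


def get_by_key(row_key, super_column):
    """Key-based lookup: the kind of access the key predicates cover."""
    return address_book[row_key][super_column]


def scan_by_value(**wanted):
    """Value-based filter: has to visit every super column in every row."""
    matches = []
    for row_key, super_columns in address_book.items():
        for name, columns in super_columns.items():
            if all(columns.get(col) == val for col, val in wanted.items()):
                matches.append((row_key, name))
    return matches
```

Calling `scan_by_value(city="Beverley Hills", state="CA")` finds the matching entry, but only by walking the whole data set, which is exactly the "pull everything and loop" approach the question wants to avoid.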

If you want the kind of functionality that you have described, then your best bet is a SQL database. Build proper indexes on your tables, especially on the columns that are queried most, and you will see a big difference in query performance. Hope this helps.

Sagar V
Can you do these kinds of operations with any other NoSQL setup? How do you think a site like Facebook does the different kinds of searching/queries they have on their site with Cassandra? There are multiple ways to search for data. Do you think it's duplicated in some places, with multiple ColumnFamilies allowing the data to be searched in different ways?
Hallik
@Hallik: It is possible that they are duplicating the data across various families, though of course I can't be certain. It is an option. I am using Cassandra in a project to track activities by users, activities by groups, etc., and I have created a bunch of SuperColumnFamilies for feeding and fetching related updates. All I need to do then is perform a look-up.
Sagar V
Maybe I just need to work harder at getting off the relational DB mindset. I am convinced there is an optimal way to move my data off MySQL and get the same functionality in my app in Cassandra. Thanks for your input.
Hallik
You're welcome :) One last small input: if your app is data- and process-intensive, stick with a SQL database.
Sagar V
Well, this app collects tons of data every month and gives users an easy way to sift through it, but it's getting slower and slower even though I have indexed all the appropriate columns, and so far no one has been able to see the problem with those queries. So I am looking at alternatives. What I may try is creating A LOT of ColumnFamilies. Like in the example above, I could have a City CF, a State CF, etc. Then, as a Column in that CF, I would store a JSON array of the UUIDs of the users associated with that city or state. I am determined there is a way to do this :)
Hallik
+3  A: 

You "don't want to duplicate data across different ColumnFamilies," but that is how you do this kind of query in Cassandra. See http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/
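As a rough sketch of what that duplication amounts to, the Python below builds one lookup table per queried field, each mapping a value back to the entries that hold it. It assumes the data is modeled as nested dicts rather than real ColumnFamilies; `build_index`, `city_index`, `state_index`, and `find` are hypothetical names for illustration only.

```python
from collections import defaultdict

# Plain-dict stand-in for part of the AddressBook super column family.
address_book = {
    "phatduckk": {
        "friend1": {"street": "8th street", "zip": "90210",
                    "city": "Beverley Hills", "state": "CA"},
        "Kim": {"street": "X street", "zip": "87876",
                "city": "Balls", "state": "VA"},
    },
    "ieure": {
        "joey": {"street": "A ave", "zip": "55485",
                 "city": "Hell", "state": "NV"},
        "William": {"street": "Armpit Dr", "zip": "93301",
                    "city": "Bakersfield", "state": "CA"},
    },
}


def build_index(column):
    """Duplicate one column into a value -> {(row key, super column)} table."""
    index = defaultdict(set)
    for row_key, super_columns in address_book.items():
        for name, columns in super_columns.items():
            index[columns[column]].add((row_key, name))
    return index


# One "index ColumnFamily" per field that needs to be searchable.
city_index = build_index("city")
state_index = build_index("state")


def find(city, state):
    """city == X AND state == Y becomes two key lookups plus an intersection."""
    return city_index.get(city, set()) & state_index.get(state, set())
```

With these tables, `find("Beverley Hills", "CA")` never touches the unrelated rows. The cost is on the write side: every insert or update has to keep each index table in step with the main data.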

jbellis
That is probably the essence of fetching data in Cassandra. If you want to query based on a field, better make that field a key! Thanks for providing this link. It helps me validate a lot of my design decisions, which I initially felt were questionable :)
Sagar V
Yeah, the more I thought about it, the more I realized that duplicating the data I want to query on is like creating my own index, which isn't necessarily a bad thing. I am playing with MongoDB right now too, but I may switch back. Just doing a lot of testing in the next couple of weeks. Thanks for that link though, it really did help!
Hallik