views:

278

answers:

4

I have a web application running over a MySQL database (in development). I'm considering to migrate my application to Google App Engine, and would like to better understand how my simple relational database model can be transformed to the non-relational approach.

I'm a long time relational database person, and I have no experience with column based DBs such as BigTable. Just in case Google also supports small deployments of relational databases, I would like to state that my question is general and not specific to Google - I would like to understand how simple relational models can be represented in non-relational DBs.

My database (simplified) is as follows:

Items Table
------------

ItemID  ItemName  ItemPriority
1       "Car"     7
2       "Table"   2
3       "Desk"    7

ItemProperties Table
---------------------

ItemID  Property        Importance 
1       "Blue"          1
1       "Four Wheels"   2
1       "Sedan"         0
2       "Rectangular"   1
2       "One Leg"       1

I have many items, each with a name and ID. Each item has multiple properties, each property has several parameters (I only stated the name and "importance" of each property, but there are more). I have tens of millions of items, each has hundreds of properties.

The usage scenario: I receive an ItemName as input, look up its ID in the items table, and fetch all the properties by that id. I then perform some analysis on the list of properties (in memory), and return a result.

90% of the work is lookup based on a parameter, which (if I understand correctly) is the pain-point of non-relational DBs.

What is the recommended approach?

A: 

GQL does not support joins. You can work around this in two ways:

  • Do the join yourself

Just fetch the Item, check its ItemID, and query for ItemProperties with that ItemID. Your tables would look exactly like you specified them. Sure, this is two queries, but the two queries are simple.

  • Use Expando Models

In an Expando model, you can create new fields at runtime. They will not be indexed, so if you want to search on them it may be slower, but simply fetching them is just fine. You can use complex types like ListProperty, too. With this sort of flexibility, you may be able to think of a way to put everything in the ItemProperties table into the Items table, and save yourself a query. Be creative.

Nick Retallack
Regarding your first option, won't the first and mainly the second queries be terribly slow?
Roee Adler
A: 

I have a very similar database structure (our "records" and "recordEntries" tables mirror your "items" and "itemProperties") and am considering a similar migration to a non-relational database. We'll probably go to CouchDB or memcachedb or something like that, rather than Google.

Like you I have no experience working with non-relational databases (nor do my developers). However, we have tossed a couple of ideas around. Our current thoughts are (using your schema):

  • First: Collapse each item plus its item properties into one object with fields (essentially an XML document) and stuff it into the database keyed by identifier. Every time you retrieve an item you get back all the itemProperties too.

Note the difference we have is that we index our content outside the database (with Solr), and therefore don't need to do lookups on the database itself using the "name" property, so YMMV.

  • Second: We're making a list off all the "relational" operations we're doing that can't be supported by the model above. This includes a couple of "grouping" operations where we query items based on a special field in the item table, and a query where we try to detect all the items that have been recently modified (previously accomplished by a query on a date column in the item table). We're inventing alternative implementations for each of these cases (there are only a few, luckily).

If this proves too hard, we'll try the same exercise with another model. Luckily, we have time to plan.

One key point for us is that we're doing all our indexing externally with Solr, so (for example) we don't need to do database lookups on values in the itemProperties values, or to do lookups by name on the item table.

Anyway, that's probably not much help, but I'll be keen to see what sorts of solutions more experienced people can come up with.

PS: I infer your properties table must have billions of rows. How many exactly, and what hardware are you running the MySQL server on? Are you having scalability problems yet with MySQL?

Anon Guy
Thanks for the detailed answer. I don't yet have billions of properties because the system is not live yet. MySQL scale is one of the things I'm trying to avoid in advance by moving to Google or alternatives.
Roee Adler
+1  A: 

From someone who has been working with Non-relational db's for a while your two tables should be really easy to translate to a non-relational db.

Take the two tables and turn them into a single object.

Item: - Id - Name - Properties - prop1 - prop2

Store the whole thing in your data-store columns(Big-Table),document(CouchDB),or whatever else it uses.

You can look up items by any of the ids, names, or properties. There are no joins which are one of the bigger pain points of non-relational dbs. Parameter lookups aren't really a pain point unless I'm not understanding what you mean by that. You may have to do multiple lookups but most times that is not an issue and it scales way better than an rdbms does.

In your example I actually consider the non-relational model to be simpler and easier to implement and understand.

Each non-relational data store has different conventions and constraints though so it's hard give guidance in the general sense. CouchDB can create an index on any part of the object with it's views for instance. With BigTable you may have to store multiple copies of the denormalized data to get fast indexed lookups. Others will have different things to consider when you decide how to store the data. There is quite a lot of differentiation out there once you leave the world of SQL.

Jeremy Wall
A: 

You need to flatten it all up, I think AppEngine allows structures like

ID=1, ItemName=Car, ItemPriority=7, Property=(Blue,1),Property=(Four Wheels,2),Property=(Sedan,0) ID=2, ItemName=Table, ItemPriority=2, Property=(Rectangular,1),Property=(One Leg,1) ID=3, ItemName=Desk, ItemPriority=7

Notice that the same "field" could have multiple values, and that you could use multiple items in it.

Your sample data would be 3 rows in one table.

Osama ALASSIRY