views:

92

answers:

2

I've built a product database that is divided in 3 parts. And each part has a "sub" part containing labels. But the more I work with it the more unstable it feels. And each addition I make it takes more and more code to get it to work.

A product is built of parts, and each part is of a type. Each product, part and type has a label. And there's a label for each language.

A product contains parts in 2 list. One list for default parts (one of each type) and one of optional parts.

Now I want to add currency in the mix and have come to the decision to re-model the entire way I handle this.

The result I want to get is a list of all product objects that contains the name, description, price, all parts and all types that match the parts. And for these the correct language labels.

Like so:

product
    - name
    - description (by language)
    - price (by currency)
    - parts
        - part (type name and part name by language)
        - partPrice (by currency)

The problem with my current setup that is a wild mix of db.ReferenceProperty and db.ListProperty(db.key)

And getting all data by is a bit of a hassle that require multiple for-loops, matching dict and datastore calls. Well it's bit of a mess.

The re-model(un-tested) look like this

class Products(db.model)
    name = db.StringProperty()
    imageUrl = db.StringProperty()
    optionalParts = db.ListProperty(db.Key)
    defaultParts = db.ListProperty(db.Key)
    active = db.BooleanProperty(default=True)

    @property
    def itemId(self):
        return self.key().id()

class ProductPartTypes(db.Model):
    name= db.StringProperty()

    @property
    def itemId(self):
        return self.key().id()

class ProductParts(db.Model):    
    name = db.StringProperty()
    type = db.ReferenceProperty(ProductPartTypes)
    imageUrl = db.StringProperty()
    parts = db.ListProperty(db.Key)

    @property
    def itemId(self):
        return self.key().id()


class Labels(db.Model)
    key = db.StringProperty() #want to store a key here
    language = db.StringProperty()
    label = db.StringProperty()

class Price(db.Model)
    key = db.StringProperty() #want to store a key here
    language = db.StringProperty()
    price = db.IntegerProperty()

The major thing here is that I've split the Labels and Price out. So these can contain labels and prices for any products, parts or types.

So what I am curious about, is this a solid solution from a architectural point of view? Will this hold even if there's thousands of entries in each model?

Also, any tips for retrieving data in a good manner are welcome. My current solution of get all data first and for-looping over them and stick them in dicts works but feels like it could fail any minute.

..fredrik

+1  A: 

IMO your design mostly makes sense. I did come up with almost same design after reading your problem statement. With a few differnces

  • I had prices with Product and ProductPart not as a separate table.
  • Other difference was part_types. If there are not many part_type you can simply have them as python list/tuple.

part_types = ('wheel', 'break', 'mirror')

It also depends on kind of queries you are anticipating. If there are many queries of nature price calculation (independent of rest of product and part info) then it might make sense to design it way you have done.

You have mentioned that you will get all the data first. Isn't querying possible? If you get the whole data in your app and then sort/filter in python then it would be slow. Which database are you considering? For me mongodb looks like a good option here.

Finally why are you suspicious about even 1000 records? You can run a few tests on your db beforehand.

Bests

Shekhar
Thanks. The reason for the separate price is that a product contains one price for each language(currency) and the same for types. Each type has a label based on current language. Is there a way to use tuple with some language support?
fredrik
I want to fetch all data once (at least one for each table) so I don't end upp with a new query for say fetch a label for a product when looping over them.
fredrik
IMO correct approach is to store price in decimal and then you should use appropriate framework utility to show in desired currency format.
Shekhar
About 1 query: Imagine huge amount of data transferred as your database grows. Query only that what is necessary. Reducing number of queries wont save you much (here).
Shekhar
+2  A: 

You need to keep in mind that App Engine's datastore requires you to rethink your usual way of designing databases. It goes against intuition at first but you must denormalize your data as much as possible if you want your application to be scalable. The datastore has been designed this way.

The approach I usually take is to consider first what kind of queries will need to be done in different use cases, eg. what data do I need to retrieve at the same time ? In what order ? What properties should be indexed ?

If I understand correctly, your main goal is to fetch a list of products with complete details. BTW, if you have other query scenarios - ie. filtering on price, type, etc - you should take them into account too.

In order to fetch all the data you need from only one query, I suggest you create one model which could look like this :

class ProductPart(db.Model):
    product_name = db.StringProperty()
    product_image_url = db.StringProperty()
    product_active = db.BooleanProperty(default=True)
    product_description = db.StringListProperty(indexed=False) # Contains product description in all languages
    part_name = db.StringProperty()
    part_image_url = db.StringProperty()
    part_type = db.StringListProperty(indexed=False) # Contains part type in all languages
    part_label = db.StringListProperty(indexed=False) # Contains part label in all languages
    part_price = db.ListProperty(float, indexed=False) # Contains part price in all currencies
    part_default = db.BooleanProperty()
    part_optional = db.BooleanProperty()

About this solution :

  • ListProperties are set to indexed=False in order to avoid exploding indexes if you don't need to filter on them.
  • In order to get the right description, label or type, you will have to set list values always in the same order. For example : part_label[0] is English, part_label[1] is Spanish, etc. Same idea for prices and currencies.
  • After fetching entities from this model you will have to do some in-memory manipulations in order to get the data nicely structured the way you want, maybe in a new dictionary.

Obviously, there will be a lot of redundancy in the datastore with such a design - but that's okay, since it allows you to query the datastore in a scalable fashion.

Besides, this is not meant as a replacement for the architecture that you had in mind, but rather an additional Model designed specifically for the user-facing kind of queries that you need to do, ie. retrieving lists of complete product/parts information.

These ProductPart entities could be populated by background tasks, replicating data located in your other normalized entities which would be the authoritative data source. Since you have plenty of data storage on App Engine, this should not be a problem.

Franck
I really like the background idea! I'm thinking of a combined solution. My solution that hold the logic and your generated by a background task apply it like you described. Then the result try to get the denormalized data if it exists or else make a query against my models. What's exploding indexes?
fredrik
ps. Thanks for the correction of my spelling and grammer :)
fredrik
Here is some info on exploding indexes : http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Big_Entities_and_Exploding_Indexes
Franck