views:

165

answers:

3

Hi everyone,

We are currently planning the database structure of a quite complex e-commerce web app that has flexibility as its main cornerstone.

Our app features a large amount of data (products) and we have run into a slight headache trying to keep performance high without compromizing normalization rules in the database, or leaving our highly beloved flexibility concept behind when integrating product options (also widely known as product attributes or parameters).

Based on various references and sources available, we have made up lists on pros and cons of all major and well known database patterns to solve this. After comparing these, we have come up with two final alternatives:

  • EAV (Entity-attribute-value model) :

    Pros: Database is used for all sorting.

    Cons: All related queries will include a number of joins between multiple tables in order to complete the collection of data.

  • SLOB (Serialized LOB, also known as Facade?) :

    Pros: Very flexible. Keeping the number of necessary joins low compared to a EAV design pattern. Easy to update/add/remove data from each product but hard to keep data integrity without additional tables.

    Cons: All sorting will be done by the application instead of the database. Will use lots of performance (memory?) when big datasets is processed by a large number of users.

Our main questions:

  • Which pattern/structure would you use, or maybe even a different solution?
  • Is there better databases besides mySQL available nowadays to accomplish what we want?

Thanks a lot!

Reference: http://stackoverflow.com/questions/695752/product-table-many-kinds-of-product-each-product-has-many-parameters

+1  A: 

MySQL performs very well even for very large datasets. I use it at a financial services SaaS company and it has always worked well. I have also use SQL Server and Oracle for very large applications and MySQL performs no better or worse on whole. My focus is more the business layer, though, and you may get more detailed opinions from people closer to the DB.

When selecting a pattern, keep in mind that it's much more straightforward to scale the application tier than the data tier (easy and cheap to add application servers). Performing many joins for common operations can cause a real performance bottleneck.

I would suggest you prototype both approaches so that you can both get more familiar with each of them, and benchmark their performance in your specific environment.

Additionally, you may want to look into alternatives to SQL that attempt to achieve a pattern similar to the ones you outline. A friend at a very large, well-known Internet company is starting to use Project Voldemort. He prefers it over similar efforts mostly due to the very active community.

Eric J.
+1  A: 

from your solution, it seems you don't want to use a relational model, so perhaps it's better not to use a relational database, take a look at these alternatives: http://nosql-database.org/ btw SQLServer has nice SLOB features in the form of xml fields (can be indexed an queried through XQuery)

Tim Mahy
Hi Tim! Will definitely check out noSQL and SQLServer and see what it's all about. Many thanks
Industrial
+1  A: 

Why limit yourself to one model? It's very possible that you'll be better off with two different models where each one meets a specific goal very well.

Assuming, as is often the case, that the two don't have to be absolutely and instantaneously in sync, you might easily end up with much better overall performance. What kind of hard requirements would you have on synchronization? Milliseconds up to a minute?

Udi Dahan has some good information on command query responsibility separation (CQRS) that's relevant. See also a couple of other articles. InfoQ also has very relevant video of Greg Young from QCon08.

EDIT: Here's another video (by Udi Dahan) that discusses, among other things, the benefits of multiple models.

Kaleb Pederson
Hi Kaleb. Thanks for your thoughts. How do you mean with "in sync"?Will check out the links and see what we can learn!
Industrial
i.e. What if you wrote to one model, and then had a background process that would periodically copy (or synchronize) that data to a reporting model? That process could check ever 250ms or so to see if any new records were available and then make them reportable.
Kaleb Pederson
I added another video link -- this one is probably more pertinent although both videos are quite instructive.
Kaleb Pederson
Hi Kaleb. Thanks a lot for your support here. However I have a hard time trying to take these approaches into a practical example and turning it into a structure model that is applicable on a database. Will try and check out more info regarding CQRS and see if it will do the trick for us. Let me know if you have anything else lying around regarding this subject - preferably on child level ;)
Industrial
I have yet to find a really good introduction or tutorial. Although I haven't yet read it, judging be the ToC, Eric Evans' /Domain Driven Design/ book should be really good and focuses quite a bit on models.
Kaleb Pederson
Thanks Kaleb. Let me know if you find anything. Will check that book out!
Industrial