views:

148

answers:

3

Hey Guys,

I have been tasked with designing an ecommerce solution. The aspect that is causing me the most problems is the database.

Currently the site consists of 10+ country based shops each with their own database (all residing on the same mysql instance).

For the new site I'd rather all these shop databases be merged into one database so that all tables (products, orders, customers etc.) have a shop_id field. From a programming perspective this seems to make the most sense as we won't have to manage data across multiple databases.

Currently the entire site generates about 120k orders a year, but is experiencing fairly heavy growth and we need to design a solution that will scale. In 5 years there may be more than a million orders per year and a database that contains 5 years order history (archiving maybe a solution here). The question is - do we use a single database, or do we keep the database-per-shop structure?

I am currently trying to find supporting evidence for either avenue. The company I am designing the solution for prefer the per-shop database structure because they believe it will allow the sites to scale. But my argument is that the shop's database probably won't get that busy over the next few years that they exceed the capacity of a mysql database and a "no expenses spared" hardware set-up.

I am wondering if anyone has any advice either way? Does anyone have experience with websites / ecommerce sites that have tables containing millions of records? I know there is probably not a clear answer here, but at what stage do we have too many records or too large table files to have a fast loading site?

Also, if anyone has any advice on sources of information - books, websites, etc. where I can do further research, it would be highly appreciated!

Cheers, imanc

+1  A: 

I've implemented a theater ticket sales solution which has tables with a couple of hundred thousand records and there are no performance issues to speak of (the hardware's nothing special). While it's hard for me to compare the loads, I would say it's unlikely that a 10x increase in data volume would noticeably impact performance. If it's the same application and the same schema, I'd very likely lean towards a single central database (probably with fail-over) because:

  • maintenance is much more efficient and less error-prone
  • you can do cross-shop reporting
  • you can always offload the historic data to historic tables/databases if performance really suffers

and probably a number of other reasons. The obvious advantage of having multiple instances is that you get poor-man's high availability: if one server is down, only one shop doesn't work and you get this behaviour out of the box.

Tomislav Nakic-Alfirevic
Yeh, all valid points thank you! The issue of high availability is really a non-issue, as all schemas sit in the same mysql instance on the same server. So if the server is going down, then so to is every shop.The only possible advantage is if there were some corruption on one of the shop tables. Obviously this would affect one shop, not every shop. But even if we kept shop dbs separate there would still be a central content db and if that went, so too would everything else. So I am not really coming up with any solid reasons for separate databases.
Well, a couple of additional things come to mind: the data model can be a bit simpler (as the fact that all information in a specific database belongs to a certain shop is implicit). Consequentially, business logic can also be a bit simpler (mostly with data retrieval). I would still go with a single database for all the shops.
Tomislav Nakic-Alfirevic
+1  A: 

I would argue its easier to keep separate databases. It just makes more sense to have logical separation of these entities which have no direct relation. It will also be FAR easier to scale up, so that each site can run on separate hardware if/when the time comes. Backup/restore and general maintenance procedures will also be far easier on separate instances, because it allows for customised procedures per shop. Any disaster scenarios also only affect one logical database, rather than potentially screwing up every single shop.

Your current proposal will mean that just about every table will need a 'shop id' column, which is also indexed to prevent collisions. Separating this data later when you need to scale up wont be too much of an issue, but reprogramming most likely will be very time consuming.

PaulG
True, we could scale easily by having a shop-db per physical server, if the site load became massive. Each shop exists essentially on the same domain, e.g. www.oursite.com/uk, www.oursite.com/es etc. and they contain the same stock essentially. Do you think there would become a point where we'd need to separate out the shops? Surely other techologies / approaches exist that could enable the site to scale to say 5 million orders per year across all shops?Also these shops are tied by their content database and codebase which would always be shared, I imagine.What do you think?
Oh I see. Sorry, it wasnt clear from your original question that there was such a tight logical relationship. In that case, I'd say a single database was the right plan.
PaulG