views:

616

answers:

5

Hi all,

I am working on a web-based organisation tool. I am not aiming at the same market as the wonderful Basecamp, but let's say the way users and data interact looks much the same.

I will have to deal with user customisation, file uploads and graphical tweaks. There is a forum for each account as well. And I'd like to provide a way to back up each account easily.

I have been thinking about how to create a reasonable architecture, and I have been trained to use beautifully normalised data in a single (yet distributed if needed) MySQL DB. Recently I have been wondering: would it be possible to use one SQLite DB to store the data for each account, and only use MySQL for general web site management?

The pro :

  • backing up is straightforward: set the version, zip, upload.
  • no worries if an account has a massive forum: the mess stays in that account's single file.
  • SQLite is lightning fast, with no expensive connection time...
  • the table schema is much simpler: no need to distinguish between accounts every time.
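For what it's worth, the "zip and upload" idea can be sketched in a few lines of Python. The file names and the `backups` directory here are hypothetical, and this naive copy is unsafe if a write is in progress:

```python
import os
import sqlite3
import zipfile

def backup_account(db_path: str, out_dir: str) -> str:
    """Naive per-account backup: zip the account's SQLite file.

    Unsafe if a write is in progress on db_path.
    """
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, os.path.basename(db_path) + ".zip")
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(db_path, arcname=os.path.basename(db_path))
    return out_path

# demo: create a tiny account database, then back it up
con = sqlite3.connect("account_42.db")
con.execute("CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, body TEXT)")
con.execute("INSERT INTO posts (body) VALUES ('hello')")
con.commit()
con.close()
print(backup_account("account_42.db", "backups"))
```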

The cons :

  • I don't know if it's scalable.
  • I don't know if the hard drive will keep up.
  • I don't know if there is a way to keep SQLite from loading each database into RAM, since that would quickly become a disaster.
  • lots of dirs and subdirs: will this be OK?
  • maintenance issue: upgrading the live site means upgrading all the DBs one by one.
  • dev issue: setting up dev / pre-prod / prod environments will be quite hard.
  • common data will still require MySQL, so we would end up with 2 DB connections for each page, argh.
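On the "lots of dirs and subdirs" worry, one common workaround is to shard the per-account files into a fixed set of subdirectories by a hash prefix, so no single directory holds thousands of entries. A sketch, with a hypothetical `data` storage root:

```python
import hashlib
import os

DATA_ROOT = "data"  # hypothetical storage root

def account_db_path(account_id: int) -> str:
    """Shard account databases into 256 subdirectories by hash prefix,
    so no single directory holds thousands of files."""
    digest = hashlib.sha1(str(account_id).encode()).hexdigest()
    return os.path.join(DATA_ROOT, digest[:2], f"account_{account_id}.db")

print(account_db_path(42))
```

The mapping is deterministic, so every request for an account resolves to the same file path.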

More cons than pros; still, it makes me wonder (Zeppelin style).

What do you say ?

+4  A: 

SQLite isn't exactly the most scalable solution. There's a reason why you were trained to use a beautiful, normalized database which can be distributed if need be. It's scalable, and it's easier to maintain.

I would stick with a single MySQL database, because you're already familiar with that architecture, and it's more than likely the best solution for the job.

Alex Fort
:) don't put beautiful and database in the same sentence
Ric Tokyo
Some of us think that relationships in a nicely normalized database ARE beautiful :P
Alex Fort
I agree, happiness is based on beautiful relationships, isn't it ? :-)
e-satis
+4  A: 

I gave it more thought, and I see more problems:

  • If I want a common area across all the forums, I am smoked.
  • If I want to search through all the web sites, it will be apocalyptic.
  • If I want stats, God save my soul.

This was a bad idea.

Anyway, writing it on SO forced me to think seriously about it. Better to have a clear mind before starting.

If one day somebody is as foolish as me, I hope he'll hit this page, so it can help him reconsider.

Love this site (even if it's ASP :-p): the quickest way to help yourself while still helping others.

e-satis
It's not asp, it's ASP.NET using MVC. And don't be a code-religionist. I'm tempted to downvote just for that. But I won't.
jcollum
Collecting stats indeed becomes difficult, because you have to make hundreds of connections instead of running one SQL statement. But I'm still curious about its performance.
tuinstoel
+5  A: 

Essentially your question is: which of the above is better for a multi-tenancy SaaS application?

A thousand little sqlite databases may appeal, but may not scale.

I'll address your points in turn:

  • backing up is straightforward: set the version, zip, upload.

What happens if an update occurs during your backup? How long does your backup take (with, say, a thousand accounts with a million posts each)? Does your backup lock the database for the duration, or does it back up a consistent view of the data, or neither? Does a backup taken during updates restore correctly in each case? Have you tested these things?

I think backing up a sqlite database isn't as easy as you seem to think, because of the concurrent access issue.
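To be fair, the concurrency issue can be mitigated: SQLite has an online backup API, exposed in Python 3.7+ as `Connection.backup`, which copies a consistent snapshot under SQLite's own locking even while other connections are active. A minimal sketch with throwaway file names:

```python
import sqlite3

def consistent_backup(src_path: str, dest_path: str) -> None:
    """Copy a consistent snapshot using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    src.backup(dest)  # pages are copied under SQLite's own locking
    src.close()
    dest.close()

# demo on a throwaway database
con = sqlite3.connect("forum.db")
con.execute("CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, body TEXT)")
con.execute("INSERT INTO posts (body) VALUES ('first post')")
con.commit()
con.close()

consistent_backup("forum.db", "forum_backup.db")
check = sqlite3.connect("forum_backup.db")
print(check.execute("SELECT COUNT(*) FROM posts").fetchone()[0])
check.close()
```

This still has to run once per account database, so the total backup window grows with the number of tenants.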

  • SQLite is lightning fast, with no expensive connection time...

Your implication that the MySQL connection time would be "expensive" may be false. Do you have hard data? Connecting to a server over a LAN does not take very long in practice.

  • the table schema is much simpler: no need to distinguish between accounts every time.

Have you thought about how you'll do a migration if you ever need to change the schema on these 1,000 small databases? What impact will it have on the service?
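Any such migration would have to loop over every file and track a per-database schema version. A hedged sketch using `PRAGMA user_version` (the migration list and the file layout are hypothetical):

```python
import glob
import sqlite3

# Hypothetical ordered migration list: (version, SQL) pairs.
MIGRATIONS = [
    (1, "CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, body TEXT)"),
    (2, "ALTER TABLE posts ADD COLUMN created_at TEXT"),
]

def migrate(db_path: str) -> int:
    """Bring one database up to the latest schema version."""
    con = sqlite3.connect(db_path)
    current = con.execute("PRAGMA user_version").fetchone()[0]
    for version, sql in MIGRATIONS:
        if version > current:
            con.execute(sql)
            con.execute(f"PRAGMA user_version = {version}")  # version is an int
            con.commit()
            current = version
    con.close()
    return current

# Every per-account file must be visited one by one.
for path in glob.glob("account_*.db"):
    print(path, "-> schema version", migrate(path))
```

A failure halfway through leaves the farm in a mixed-version state, which is exactly the rollback-plan problem.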


I'll now add some of my own:

  • Scalability: if you are relying on a local filesystem with locking semantics (as I believe SQLite does), you cannot simply add more web servers, as each would have its own filesystem.

Because sqlite is not a network-based system, you can't just add more web servers. You would need to either partition your users across several servers and ensure that they only ever hit their own "home" server (which is going to introduce some issues but may be viable), or figure out some way to share a sqlite database between servers, which is not going to be pretty, and may well erase any perceived performance advantages that it ever had over (e.g.) MySQL.
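A minimal sketch of that "home server" partitioning (the server names are hypothetical; note that adding a server to the list remaps most accounts, which consistent hashing would avoid):

```python
import hashlib

# Hypothetical pool of web servers, each holding its accounts' SQLite files locally.
SERVERS = ["web1.example.com", "web2.example.com", "web3.example.com"]

def home_server(account_id: int) -> str:
    """Deterministically pin each account to one server, so requests
    always land where that account's SQLite file lives."""
    digest = hashlib.md5(str(account_id).encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print(home_server(42))
```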

  • Maintainability: if your development team ever makes a schema change to the database (which is not just possible, but very likely), it will need to be applied to these 1,000 tiny databases. Successfully. With a rollback plan.

I think that a system with a thousand tiny sqlite databases won't scale. In particular, you will probably find that instead of 1,000 tiny ones, you end up with 995 tiny ones and 5 rather large ones.

Using a dedicated MySQL server will enable you to carry out central backups and migrations. It will enable you to use the resources on that box (i.e. RAM) to cache the most frequently used parts of the database, whichever account they happen to be in.

The RAM used to cache a large MySQL database (e.g. the InnoDB buffer pool) is reused between requests and shared across all of the data (tables, rows, columns) in it. A sqlite database reads the data from disc each time it's needed, except within a single session.
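That shared cache comes down to a single setting on the MySQL side; a hypothetical my.cnf fragment (the value is an example, not a recommendation):

```ini
[mysqld]
# Size the shared InnoDB buffer pool to hold the hot working set
# across every account (example value only).
innodb_buffer_pool_size = 2G
```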


My suggestions:

  • Consider the above points; ignore them if you like.
  • MEASURE the performance of your application under a high simulated load on production-grade hardware. Make sure you use production-grade hardware for your database server (e.g. a battery-backed RAID controller).
  • Compare it with a sqlite implementation, if you can.
MarkR
+1  A: 

You could look at CouchDB for massive concurrency.

nkassis
I'm not really looking for massive concurrency.
e-satis
+2  A: 

I came upon your post after the fact. I run a MediaWiki hosting farm where each wiki has its own sqlite DB. The main problem I see is that each database is loaded independently, which causes a memory hit. Apart from that, it works well for backup processing and lets each wiki have its own database. Since MediaWiki wasn't set up to host multiple wikis, in MySQL each wiki had to have its own set of tables with its own table prefix. This really didn't work out, and sqlite was a great alternative.

If I were building a system from the ground up, I would have used mysql. Since I am using a prebuilt system and each user has their own silo, sqlite is working just fine.

RobKohr