I have a web app that stores a large amount of text data. The DB is currently growing by 1GB a week. I expect this growth to be exponential as we get more customers: 1GB this week, 2GB next week, 4GB the following week, then 8GB, and so on.
Right now this data is stored in a single MS SQL 2008 database that is 10GB in size. Performance is great, with no issues so far.
But I am worried about what will happen in a few months as the DB keeps growing. I want to make sure that we can scale without performance being affected.
Also, we need to figure out a good backup strategy for the DB that is not too expensive.
I'm considering moving the storage over to Amazon's SimpleDB, or moving our web app over to Azure and using Azure Tables to store this data.
The pro with Azure is that backups would be taken care of automatically (both for Azure Tables and the Azure SQL DB). The cons are the cost and the fact that several parts of the app would need to be re-architected to run on Azure and use Azure Tables.
The pros with SimpleDB are that we are currently on EC2 and could stay there, and that less of the app would need to be rewritten to use SimpleDB instead of SQL Server. The con is that we would still need an effective backup strategy for the SQL Server database.
We could also just leave the app as it is right now, in an MS SQL 2008 database. I'm just not sure how large a DB SQL Server can handle (the largest case studies I've seen are 1TB or so), and again we would need an effective backup and recovery strategy for a DB that is pretty large. The benefit is that we can run relational queries on the data, so there is a slight advantage in keeping the data in SQL Server.
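To put rough numbers on the growth, here is a quick back-of-the-envelope sketch in Python. It assumes the literal pattern described above (10GB today, 1GB added in the first week, the weekly increment doubling after that) and treats 1TB as 1024GB. The function name and parameters are just made up for illustration, and real growth will almost certainly be less regular than this.

```python
def weeks_until(threshold_gb, start_gb=10, first_week_gb=1, factor=2, max_weeks=520):
    """Count the weeks until the DB reaches threshold_gb.

    Assumes the weekly increment starts at first_week_gb and is multiplied
    by `factor` every week. Returns (weeks, size_gb), or None if the
    threshold is not reached within max_weeks.
    """
    size = start_gb
    growth = first_week_gb
    for week in range(1, max_weeks + 1):
        size += growth          # data added during this week
        if size >= threshold_gb:
            return week, size
        growth *= factor        # next week's increment
    return None


if __name__ == "__main__":
    result = weeks_until(1024)  # 1TB expressed in GB
    print(result)
```

Under those assumptions the DB passes 1TB after about 10 weeks (roughly 1033GB), so if the doubling really holds, the 1TB range arrives sooner than "a few months" might suggest. Setting `factor=1` models steady growth of 1GB a week for comparison.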
I'm wondering what the best solution is, and how other companies scale DBs that are this big and grow at this rate. Which backup and recovery options work best?
Any advice or experience you can share with Azure Tables, SimpleDB, or large SQL Server DBs would be great as well!