I'm currently building a site that gives the user a lot of sorting options, and I want to build it so that it can scale without too much headache. Of course there are tradeoffs to both of the techniques below, but I'd like to hear your opinions.

1) Store a serialized JSON array in a single column. When an entry is added or removed, the JSON is decoded, the array is manipulated, then re-encoded and updated in the DB. The data would be sorted on the PHP side using array sort functions, or in some cases MySQL's "IN" would be used to select entries based on a list of ids.

The main issue with this approach is increased development time, and the risk of coding myself into a corner. If the JSON structure ever needed to change, or I wanted to add a new feature, it might be a complete pain. I also don't know how this will perform under load, always selecting out and updating a large JSON string for each user.
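To make option 1 concrete, here is a minimal sketch of the decode, manipulate, re-encode cycle. It is written in Python with an in-memory SQLite database standing in for PHP and MySQL, and the table name `users`, the column `entries_json`, and the entry fields are all made up for illustration.

```python
import json
import sqlite3

# In-memory SQLite stands in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, entries_json TEXT NOT NULL)"
)
conn.execute("INSERT INTO users (id, entries_json) VALUES (1, '[]')")


def load_entries(conn, user_id):
    """Fetch the user's JSON string and decode it into a list of dicts."""
    row = conn.execute(
        "SELECT entries_json FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0])


def add_entry(conn, user_id, entry):
    """Decode the whole array, append one entry, re-encode, write it back."""
    entries = load_entries(conn, user_id)
    entries.append(entry)
    conn.execute(
        "UPDATE users SET entries_json = ? WHERE id = ?",
        (json.dumps(entries), user_id),
    )
    conn.commit()


def sorted_entries(conn, user_id, key):
    """Sorting happens in application code, not in the database."""
    return sorted(load_entries(conn, user_id), key=lambda e: e[key])


add_entry(conn, 1, {"entry_id": 7, "title": "b", "rating": 3})
add_entry(conn, 1, {"entry_id": 4, "title": "a", "rating": 5})
by_rating = sorted_entries(conn, 1, "rating")
by_title = sorted_entries(conn, 1, "title")
```

Note that `add_entry` is a read-modify-write cycle: if two requests for the same user run it at the same time without locking, one of the updates can be lost. Every change also rewrites the entire string, however large it has grown.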

2) The classic RDBMS method: insert a row for each new entry and for its relation to the user, then select using JOIN. Indexes would be carefully set up, and EXPLAIN would be used to make sure each JOIN select is optimized.

There has been a lot of talk about moving away from RDBMS, but that talk usually comes from sites that are getting millions of users. The nice thing about the RDBMS approach is that development will be quick, and if new data needs to be added in the future, it's easy to alter a table.
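For comparison, option 2 might look roughly like the sketch below, again in Python with in-memory SQLite standing in for MySQL. The tables `users`, `entries`, and `user_entries`, their columns, and the `SORT_COLUMNS` whitelist are assumptions made for the example, not a real schema.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE entries (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    rating INTEGER NOT NULL
);
-- One row per user/entry relation.
CREATE TABLE user_entries (
    user_id INTEGER NOT NULL REFERENCES users(id),
    entry_id INTEGER NOT NULL REFERENCES entries(id),
    PRIMARY KEY (user_id, entry_id)
);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO entries VALUES (4, 'a', 5), (7, 'b', 3), (9, 'c', 4);
INSERT INTO user_entries VALUES (1, 7), (1, 4);
""")

# Whitelist of sort options, so user input never reaches the SQL string directly.
SORT_COLUMNS = {"title": "e.title", "rating": "e.rating"}


def entries_for_user(conn, user_id, sort_by):
    """Select a user's entries with a JOIN and let the database sort them."""
    order = SORT_COLUMNS[sort_by]  # raises KeyError for unknown sort options
    sql = f"""
        SELECT e.id, e.title, e.rating
        FROM user_entries AS ue
        JOIN entries AS e ON e.id = ue.entry_id
        WHERE ue.user_id = ?
        ORDER BY {order}
    """
    return conn.execute(sql, (user_id,)).fetchall()


rows_by_rating = entries_for_user(conn, 1, "rating")
rows_by_title = entries_for_user(conn, 1, "title")
```

Adding a new sort option here means adding a column and one whitelist entry, which is the "easy to alter a table" advantage in practice.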

Should I even be worrying about scale when first coding my application? Or should I focus on the product, release early, and tweak for scale as I go?

Thanks, and I look forward to your opinions!

+2  A: 

I don't think you should worry about that yet, since you don't know what scale you will have to optimize your application for anyway.

Solution 1) doesn't sound very nice. If you want to go with something like that, you should really use a non-relational database like CouchDB (I just found a nice tutorial for it today), since it stores JSON natively (and you can sort and select it with views defined in JavaScript). It is not only for sites that have millions of users (though it does scale very easily). You should just try it yourself, disregarding all the "buzz" and anti-"buzz" around it, and see whether it is useful for your application.

Or maybe you should just go with the RDBMS. They are still extremely fast (well, maybe you will get into trouble if, like Facebook, you have 50TB of inbox data to organize and search), and you will be surprised what a properly defined index can do for performance. There is also a lot of RDBMS knowledge and good tooling around, so it is quite easy to use.
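As a rough illustration of the index point, the sketch below compares the query plan for the same query before and after adding an index. It uses Python with SQLite's `EXPLAIN QUERY PLAN` as a stand-in for MySQL's `EXPLAIN`; the table, the `created_at` column, and the index name `idx_user_created` are invented for the example, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_entries ("
    "user_id INTEGER NOT NULL, "
    "entry_id INTEGER NOT NULL, "
    "created_at INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO user_entries VALUES (?, ?, ?)",
    [(u, e, u * 1000 + e) for u in range(100) for e in range(50)],
)

QUERY = "SELECT entry_id FROM user_entries WHERE user_id = ? ORDER BY created_at"


def query_plan(conn):
    """Return the planner's description of how it would run QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index yet: a full table scan
conn.execute(
    "CREATE INDEX idx_user_created ON user_entries (user_id, created_at)"
)
plan_after = query_plan(conn)  # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The composite index covers both the `WHERE` column and the `ORDER BY` column, so the database can find one user's rows directly and read them already sorted instead of scanning and sorting the whole table.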

In a well-designed application you should, imho, be able to switch the underlying database implementation easily anyway.
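One way to keep the storage swappable is to hide it behind a small interface. The sketch below is only an illustration in Python: `EntryStore`, `MemoryEntryStore`, `SqliteEntryStore`, and `demo` are hypothetical names, and a real application would need a richer interface.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List


class EntryStore(ABC):
    """Hypothetical storage interface the rest of the application codes against."""

    @abstractmethod
    def add(self, user_id: int, entry_id: int) -> None: ...

    @abstractmethod
    def list_ids(self, user_id: int) -> List[int]: ...


class MemoryEntryStore(EntryStore):
    """Keeps everything in a dict; handy for tests or prototypes."""

    def __init__(self) -> None:
        self._data = {}

    def add(self, user_id: int, entry_id: int) -> None:
        self._data.setdefault(user_id, set()).add(entry_id)

    def list_ids(self, user_id: int) -> List[int]:
        return sorted(self._data.get(user_id, set()))


class SqliteEntryStore(EntryStore):
    """Relational implementation; SQLite stands in for MySQL here."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS user_entries ("
            "user_id INTEGER NOT NULL, entry_id INTEGER NOT NULL, "
            "PRIMARY KEY (user_id, entry_id))"
        )

    def add(self, user_id: int, entry_id: int) -> None:
        # INSERT OR IGNORE makes adding the same relation twice a no-op.
        self._conn.execute(
            "INSERT OR IGNORE INTO user_entries VALUES (?, ?)",
            (user_id, entry_id),
        )
        self._conn.commit()

    def list_ids(self, user_id: int) -> List[int]:
        rows = self._conn.execute(
            "SELECT entry_id FROM user_entries WHERE user_id = ? ORDER BY entry_id",
            (user_id,),
        )
        return [row[0] for row in rows]


def demo(store: EntryStore) -> List[int]:
    """Application code that never needs to know which backend it is using."""
    store.add(1, 7)
    store.add(1, 4)
    store.add(1, 7)  # duplicate, ignored by both implementations
    return store.list_ids(1)


memory_result = demo(MemoryEntryStore())
sqlite_result = demo(SqliteEntryStore(sqlite3.connect(":memory:")))
```

A CouchDB-backed or JSON-column-backed class could implement the same two methods, so the choice between the two options in the question would not have to leak into the rest of the code.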

Daff