Hi

My team is developing a large music portal in PHP. It is hoped that the portal will have 1 million+ users within a year of its launch. The portal will allow users to create playlists, stream and download music.

So far, we have developed applications used by at most about 1,000 simultaneous users. This portal is of a different order of magnitude.

Are there any benchmarks for estimating the memory and bandwidth requirements of a project this large?

Also, can a content delivery network (CDN) handle all traffic-related problems, or does something specific, such as caching, need to be done as well?

Which database would be suitable? Will MySQL be able to handle such loads, or is something else required?

Thanks

Vinayak

+1  A: 

All applications respond differently to heavy load, so there isn't a straight answer. Music streaming will mainly run into bandwidth limits, though disk I/O may also come into it. Having plenty of memory on the content servers can offset the disk I/O, because frequently requested files can then be served from memory instead of being read from disk each time.
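To put rough numbers on the bandwidth side, here is a back-of-the-envelope sketch. The inputs are assumptions for illustration, not benchmarks: 128 kbps streams and 1% of 1 million users (10,000) streaming at the same time at peak.

```python
def streaming_bandwidth_mbps(concurrent_streams, bitrate_kbps):
    """Aggregate outbound bandwidth in megabits per second (decimal units)."""
    return concurrent_streams * bitrate_kbps / 1000


def monthly_transfer_tb(avg_concurrent_streams, bitrate_kbps, days=30):
    """Total data served over `days`, in terabytes (1 TB = 10**12 bytes)."""
    # kilobits/s -> bytes/s: multiply by 1000, divide by 8
    bytes_per_second = avg_concurrent_streams * bitrate_kbps * 1000 / 8
    return bytes_per_second * 86400 * days / 1e12


# Assumed figures: 10,000 simultaneous streams at 128 kbps.
print(streaming_bandwidth_mbps(10_000, 128))  # 1280.0 Mbit/s at peak
print(monthly_transfer_tb(10_000, 128))       # upper bound if peak lasted all month
```

Under those assumptions peak egress is about 1.28 Gbit/s, and if that level were sustained around the clock for 30 days it would come to roughly 415 TB. The real average will be lower than the peak, so treat the monthly figure as a ceiling.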

MySQL can handle huge amounts of load if scaled properly. Wikipedia use MySQL and handle a serious number of hits. Facebook are another big hitter using MySQL.

Edit: http://highscalability.com/ is a great resource to see what others have done.

Ryaner
+1  A: 

MySQL will scale to this level, but the game changes significantly going from 1,000 users to 1 million. Have you thought about using AWS so you can spin up additional hardware as needed? See a PHP walkthrough for AWS here to get you started if you decide the cloud is where you need to be, and I think it may well be.

Chris Ballance
+4  A: 

Until you actually have that number of users, you shouldn't worry too much about it. One of the first rules of programming is not to optimize until you actually have performance problems, and even then, not until you have information on where those problems are. Right now, you have neither.

With that said... yes, MySQL can be made to scale. Yes, you might need to do caching. Yes, a CDN might be helpful.

Start with a single server, and if you need to move beyond that, profile to find out where your bottlenecks are and go from there. Get someone who knows what they're doing to help you if you can.
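To make "profile first" concrete, one lightweight option is to wrap suspect sections in timers and see where time actually goes. This is a hypothetical helper sketched in Python for brevity; the same idea carries over to PHP, and a dedicated profiler will usually give more detail.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock time and call counts per labelled section.
timings = defaultdict(float)
calls = defaultdict(int)


@contextmanager
def timed(label):
    """Add the time spent inside the `with` block to `timings[label]`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start
        calls[label] += 1


def report():
    """Return (label, total_seconds) pairs, slowest first."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)


# Example: the labels are hypothetical section names.
with timed("load_playlist_query"):
    time.sleep(0.02)  # stand-in for a slow database call
with timed("render_template"):
    pass
print(report()[0][0])  # the section to look at first
```

Only once a section consistently tops this report is it worth optimizing, caching, or moving to separate hardware.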

Keith Palmer
+1  A: 

As far as this music portal is concerned, start off with a Virtual Private Server (VPS). That gives you control over the streaming and buffering methods and over the conversion of file formats, and full control to install open-source libraries to manage scalability and performance.

As you may be aware, YouTube uses MySQL to store and serve data to a very large number of users. For more on scalability, check out this link: High Scalability YouTube Architecture

As your audience grows, you can migrate to database clustering and content caching.

Webrsk
+1  A: 

Scalability happens. You will overcomplicate the design if you aim for 1M users from the start.

But it helps to keep scalability in mind. That said, here are some rules of thumb:

Keep pages stateless
Pages that rely on $_SESSION, login data or database content have to be parsed and executed on every request. You could instead generate static HTML pages whenever the underlying content changes (a content-change event).
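A minimal sketch of regenerating a static page on a content-change event, assuming a hypothetical `on_playlist_changed` hook that the application calls whenever a playlist is saved. It is written in Python for brevity; the page layout and file naming are illustrative only.

```python
import html
from pathlib import Path


def render_playlist(name, tracks):
    """Render a playlist as static HTML, escaping user-supplied text."""
    items = "\n".join(f"  <li>{html.escape(t)}</li>" for t in tracks)
    return f"<h1>{html.escape(name)}</h1>\n<ul>\n{items}\n</ul>\n"


def on_playlist_changed(output_dir, playlist_id, name, tracks):
    """Hypothetical change hook: rewrite the playlist's static page."""
    out = Path(output_dir) / f"playlist-{playlist_id}.html"
    tmp = out.parent / (out.name + ".tmp")
    tmp.write_text(render_playlist(name, tracks), encoding="utf-8")
    # Rename over the old file so readers never see a half-written page.
    tmp.replace(out)
    return out
```

The web server can then serve `playlist-<id>.html` directly, and the page is rebuilt only when the playlist actually changes rather than on every request.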

N Machines
For web servers, that means don't upload user content to just one machine; distribute it across a cluster. For databases, use one database server for writing data and N servers for reading data (which may be slightly outdated, because replicas lag behind the primary).
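One simple way to spread uploads over a cluster is to hash each file's key and use the result to pick a node, so every web server agrees on where a file lives without a central lookup. A minimal sketch, assuming hypothetical storage hostnames:

```python
import hashlib

# Hypothetical storage hostnames, for illustration only.
STORAGE_NODES = [
    "media1.example.internal",
    "media2.example.internal",
    "media3.example.internal",
]


def node_for(file_key, nodes):
    """Deterministically pick the storage node for an uploaded file."""
    digest = hashlib.sha1(file_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer and map it onto a node.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


print(node_for("track-42.mp3", STORAGE_NODES))
```

Because placement is a plain modulo, adding a node moves most existing files to different nodes; if the cluster will grow often, consistent hashing is the usual alternative.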

Caching
Storing data in a cache and fetching it back also takes time, so only cache operations that are slower than the cache lookup itself.
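A minimal cache-aside sketch with a time-to-live, intended to wrap only the slow operations (for example, computing a chart of most-played tracks). It is an in-process Python illustration; a real multi-server setup would use a shared cache such as memcached, and the 60-second TTL below is an arbitrary assumption.

```python
import time


class TTLCache:
    """Cache-aside helper: return a cached value until it expires, else recompute."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}    # key -> (value, expires_at)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]  # still fresh: skip the slow computation
        value = compute()
        self._store[key] = (value, now + self.ttl)
        return value


charts = TTLCache(ttl_seconds=60)
top = charts.get_or_compute("top-tracks", lambda: ["track-1", "track-2"])
```

Cheap operations such as a primary-key lookup are often better left uncached, which is exactly the trade-off described above.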

Bob Fanger
+1  A: 

Building scalable systems requires a degree of future prediction that is almost impossible to get right, and carefully building everything to scale forever usually results in over-architecture.

Instead, I'd say build with the next few stages of expansion in mind, and work from there. For example, if you're building a high-content site, bandwidth and storage are likely to be early chokepoints, so make sure all your content URLs are generated in one place, so that if you switch to a CDN you don't have to recode much of the site.
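A minimal sketch of generating every content URL through one helper, assuming a hypothetical `CONTENT_BASE_URL` setting. Moving to a CDN then means changing that one value rather than every template.

```python
from urllib.parse import quote

# Hypothetical setting: point this at the CDN hostname when you switch.
CONTENT_BASE_URL = "https://static.example.com"


def content_url(path, base_url=CONTENT_BASE_URL):
    """Build the public URL for a piece of content (track, cover art, ...)."""
    # quote() percent-encodes spaces and similar characters but keeps "/".
    return base_url.rstrip("/") + "/" + quote(path.lstrip("/"))


print(content_url("/tracks/My Song.mp3"))
```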

One thing I would recommend is to flag every database query from the start according to whether it only reads or also needs write access; this will make splitting off into a replicated database setup a lot easier down the road.
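A minimal sketch of what that flagging can look like: application code asks a hypothetical `QueryRouter` for a read or a write connection, and the router decides where to send it. With a single server both calls return the same connection; adding replicas later only changes how the router is constructed.

```python
import itertools


class QueryRouter:
    """Routes each query by its read/write flag (hypothetical helper).

    Writes always go to the single primary; reads are spread round-robin
    over the replicas, falling back to the primary if there are none.
    """

    def __init__(self, primary, replicas=()):
        self.primary = primary
        self._replicas = itertools.cycle(list(replicas) or [primary])

    def connection_for(self, needs_write):
        return self.primary if needs_write else next(self._replicas)

    def for_write(self):
        return self.connection_for(needs_write=True)

    def for_read(self):
        return self.connection_for(needs_write=False)


# Connections are plain strings here; in practice they would be DB handles.
router = QueryRouter("primary", ["replica1", "replica2"])
```

Since replicas lag slightly behind the primary, a read that must see the user's own just-written data (for instance, showing a playlist right after saving it) can simply be flagged as a write.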

Aquarion