My team and I are in the middle of developing an application which needs to be able to handle pretty heavy traffic. Not Facebook level, but in the future I would like to be able to scale to that without massive code rewrites.

My thought was to modularise everything out into separate services with their own interfaces. For example, messaging would have a messaging interface with methods such as send() and getMessages(), and the PHP web app would simply query this interface through SOAP or cURL or something like that. The messaging application could then be any kind of application (Java, Python or whatever suits that particular functionality) with its own separate database shard.

Is this a good approach?

+5  A: 

That sounds reasonable as a first step, just keep in mind the traffic between the PHP layer and the messaging layer will add a bit of latency. You might also consider:

  • Caching data on the PHP layer, using (for example) memcached. You might also consider using a web proxy cache such as Squid.

  • Scaling your web server to more than one machine by, for example, storing session data in the database. Once you can support having 2 web servers, adding a third (fourth, fifth, etc) should be simple. Keep in mind that you may eventually need to scale the messaging layer to multiple machines as well.

  • Using tools such as eAccelerator for PHP to cache compiled scripts; this should help increase performance on the web layer.

There are some great articles on High Scalability as well, that you might find helpful.

Finally, keep in mind it is easy to over-engineer a solution. Your best bet is to continuously measure load, performance, resource utilization, etc along the way - then use this data to make adjustments as necessary.

Justin Ethier
A: 

Cache, cache, and more cache. SQL query caching, opcode caching, avoid querying multiple times for the same thing. Then use a profiler as you run to keep track of where your slow points are.
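As a rough illustration of the "use a profiler" advice, here is a small sketch using Python's built-in `cProfile` and `pstats` modules; a PHP app would use a PHP profiler instead, but the workflow is the same. `slow_lookup` and `handle_request` are made-up functions standing in for real request handling.

```python
import cProfile
import io
import pstats


def slow_lookup(n):
    # Deliberately wasteful: builds a full list just to sum it.
    return sum(list(range(n)))


def handle_request():
    # Stand-in for a request that repeats the same expensive work.
    return [slow_lookup(20000) for _ in range(50)]


def profile(func):
    """Run func under the profiler and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(5)
    return out.getvalue()


report = profile(handle_request)
print(report)
```

The report lists call counts and cumulative time per function, which is usually enough to see which call is worth caching first.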

pocketfullofcheese
+4  A: 

Modularise

My thought was to modularise everything out into separate services with their own interfaces. For example, messaging would have a messaging interface with methods such as send() and getMessages(), and the PHP web app would simply query this interface through SOAP or cURL or something like that.

I like the idea of separating everything into service modules (a good coding principle). I don't like the part about SOAP :(. I think it is way too complex. I would go for something like JSON-RPC instead.
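To give a feel for how light JSON-RPC is compared with SOAP, here is a minimal sketch of a JSON-RPC 2.0 style exchange for the messaging example. It is in Python only so it runs standalone. The `send` and `getMessages` method names come from the question; the parameter names, the in-memory `MESSAGES` store, and the `call` helper are assumptions, and the "network hop" is just a function call where a real client would do an HTTP POST. Batch requests and positional params are not handled.

```python
import json

MESSAGES = {}  # in-memory stand-in for the messaging service's own datastore


def send(to, body):
    MESSAGES.setdefault(to, []).append(body)
    return True


def get_messages(user):
    return MESSAGES.get(user, [])


# Maps JSON-RPC method names to the service's functions.
METHODS = {"send": send, "getMessages": get_messages}


def error(code, message, request_id=None):
    return json.dumps({"jsonrpc": "2.0",
                       "error": {"code": code, "message": message},
                       "id": request_id})


def handle(raw_request):
    """Service side: decode one request, dispatch it, encode the response."""
    try:
        request = json.loads(raw_request)
    except json.JSONDecodeError:
        return error(-32700, "Parse error")
    if not isinstance(request, dict):
        return error(-32600, "Invalid Request")
    method = METHODS.get(request.get("method"))
    if method is None:
        return error(-32601, "Method not found", request.get("id"))
    result = method(**request.get("params", {}))
    return json.dumps({"jsonrpc": "2.0", "result": result,
                       "id": request.get("id")})


def call(method, params, request_id=1):
    """Client side: build a request and unwrap the response."""
    raw = json.dumps({"jsonrpc": "2.0", "method": method,
                      "params": params, "id": request_id})
    response = json.loads(handle(raw))  # would be an HTTP POST in practice
    if "error" in response:
        raise RuntimeError(response["error"]["message"])
    return response["result"]
```

A call then looks like `call("send", {"to": "alice", "body": "hi")` followed by `call("getMessages", {"user": "alice"})`, with nothing more than plain JSON on the wire.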

Some quick tips:

My team and I are in the middle of developing an application which needs to be able to handle pretty heavy traffic. Not Facebook level, but in the future I would like to be able to scale to that without massive code rewrites.

  • Like the others also hinted, I would advise you to look at the High Scalability blog.
  • First focus on the front end using YSlow / Google Page Speed. These optimizations are easy to implement and can give you a significant boost. A quote from the YSlow webpage:

80% of the end-user response time is spent on the front-end. Most of this time is tied up in downloading all the components in the page: images, stylesheets, scripts, Flash, etc. Reducing the number of components in turn reduces the number of HTTP requests required to render the page. This is the key to faster pages.

  • I would also advise you to have a look at HipHop for PHP, which converts your PHP code to C++ code and was a huge boost for Facebook. A quote from the article:

With HipHop we've reduced the CPU usage on our Web servers on average by about fifty percent, depending on the page. Less CPU means fewer servers, which means less overhead

  • I guess another big and easy improvement, if not already set up, is to use APC (an opcode cache) to cache your compiled code. This will give you a huge boost (not necessary for the parts converted with HipHop).
  • If you want your websites to scale you have to go by the mantra:

    RAM is the new Disk

    Cache, cache, cache! For example with APC, memcached or Redis.

  • First profile your PHP code, then optimize the low-hanging fruit. I found this audio file from Rasmus Lerdorf really useful. When reading the blog post you will find a lot of good tips to improve performance.
  • I would also consider moving away from the relational database in favor of, for example, Cassandra. This is a move I have seen a lot of big players make recently (for example Twitter, Digg, Facebook, Reddit). You will have to adopt a completely different mindset, but my bet is that it will be worth the effort.
  • Queue everything and delight everyone, with for example beanstalkd, Gearman or Google App Engine's Task Queue.
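The "queue everything" tip boils down to this: the request handler only records that work needs doing, and a separate worker does it later. Here is a minimal sketch using Python's standard `queue` and `threading` modules as a stand-in for beanstalkd or Gearman; `handle_signup` and the "welcome email" job are invented for illustration.

```python
import queue
import threading

jobs = queue.Queue()  # stand-in for a real job queue server
sent = []             # records what the worker has processed


def worker():
    """Background worker: pull jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        # The slow work (e.g. sending mail) happens here, off the request path.
        sent.append(f"email to {job['to']}")
        jobs.task_done()


def handle_signup(email):
    """Request handler: enqueue the slow part and return immediately."""
    jobs.put({"to": email})
    return "signup accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

handle_signup("a@example.com")
handle_signup("b@example.com")

jobs.join()     # wait until both queued jobs are processed
jobs.put(None)  # tell the worker to stop
thread.join()
```

With a real queue server the worker would live in its own process or machine, which is what lets you add workers without touching the web tier.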
Alfred
A: 

Basing the high-level design around a set of modules is a good way to manage complexity and structure development (even more so than at the micro level), however

the PHP web app would simply query this interface through soap or curl

This introduces a lot of latency into the application. I'd suggest defining APIs but for any synchronously handled request, run as much of the code within a single thread as possible.

Sure, if you have to deal with multiple development languages, using an interface running over HTTP is a very pragmatic solution. But if you're developing the front end in PHP, then by programming to an abstract PHP API (which may call SOAP, CORBA, or other stuff), you still have the option of reimplementing the backend in a different way later.
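One way to picture "programming to an abstract API" is shown in this sketch (Python here purely so it is self-contained; in the PHP app it would be a PHP interface). `MessagingApi`, `InProcessMessaging`, `RemoteMessaging` and the `transport` callable are all hypothetical names, not anything from a real library.

```python
from abc import ABC, abstractmethod


class MessagingApi(ABC):
    """The only thing front-end code is allowed to depend on."""

    @abstractmethod
    def send(self, to, body):
        ...

    @abstractmethod
    def get_messages(self, user):
        ...


class InProcessMessaging(MessagingApi):
    """Runs in the same process as the caller: no network latency."""

    def __init__(self):
        self._inbox = {}

    def send(self, to, body):
        self._inbox.setdefault(to, []).append(body)
        return True

    def get_messages(self, user):
        return list(self._inbox.get(user, []))


class RemoteMessaging(MessagingApi):
    """Forwards calls over some transport (SOAP, HTTP, ...), chosen later."""

    def __init__(self, transport):
        # transport is an assumed callable: (method_name, params) -> result
        self._transport = transport

    def send(self, to, body):
        return self._transport("send", {"to": to, "body": body})

    def get_messages(self, user):
        return self._transport("getMessages", {"user": user})


def render_inbox(api, user):
    """Front-end code: works with either implementation unchanged."""
    return [f"{user}: {message}" for message in api.get_messages(user)]
```

Starting with the in-process version keeps synchronous requests in one thread, and swapping in the remote version later only changes which object gets constructed.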

I'm not sure what you mean by messaging. If you're talking about asynchronous request processing, then you need to think about how to implement a subscriber in PHP. This is a complete can of worms: I've not seen a good message handling system written in PHP, but I've not seen a good scalable solution written in Java either, and that includes the products pimped by some major players in high-end systems. Maybe one day I'll write one ;) In the meantime, you really want to keep your complex (and potentially less reliable) business logic running in a separate thread from any sort of subscriber daemon. An obvious way to do that is to expose the target as a web page and have the subscriber run as a daemon which simply picks up messages and calls web-based APIs.
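A rough sketch of that split follows: the subscriber stays dumb and only forwards messages, so a failure in the business logic cannot take the daemon down. It is Python for the sake of a runnable example. `call_api` stands in for an HTTP call to the web-based API, and the retry limit and "dead letter" list are my own assumptions, not part of the advice above. A real daemon would block waiting for messages; this one returns when the queue is empty so it can be run as a script.

```python
import queue


def run_subscriber(inbox, call_api, max_attempts=3):
    """Forward each queued message to call_api, retrying failures.

    inbox holds (message, attempts_so_far) tuples. Messages that still fail
    after max_attempts are returned instead of being retried forever.
    """
    dead_letters = []
    while True:
        try:
            message, attempts = inbox.get_nowait()
        except queue.Empty:
            return dead_letters
        try:
            # All business logic lives behind this call, outside the daemon.
            call_api(message)
        except Exception:
            if attempts + 1 < max_attempts:
                inbox.put((message, attempts + 1))  # try again later
            else:
                dead_letters.append(message)


# Example run with a fake API that always rejects one message.
handled = []


def fake_api(message):
    if message == "bad":
        raise RuntimeError("business logic failed")
    handled.append(message)


inbox = queue.Queue()
for item in ["first", "bad", "second"]:
    inbox.put((item, 0))

failed = run_subscriber(inbox, fake_api)
```

The point of the design is that the loop itself has almost nothing that can go wrong, while everything fragile sits behind the API call.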

You really don't want to base a synchronous system around messaging if you're at all concerned about performance / reliability / scalability.

HTH

C.

symcbean