Hi, is there any difference between a CMS and a high-traffic website (such as a news portal) in application logic, database design, and optimization (PHP and MySQL)? I searched Stack Overflow for PHP site scalability, and memcached comes up in the majority of answers. Are there techniques for MySQL optimization as well? (I'm looking for a book on this topic. I searched Amazon, but I don't know which is the best choice.) Thanks in advance.

+1  A: 

Sure, there are all sorts of things you can do to optimize your PHP/MySQL web applications for high traffic websites. However, most of them depend on your specific situation, which you haven't given in your question.

Your database should be well structured regardless of whether you have a high-traffic site or not. If you use an off-the-shelf CMS, this is typically fine. Aside from good application architecture, there is no one-size-fits-all solution.

Brad
+3  A: 

This isn't easy to answer. There are different approaches and a variety of opinions, but I'll try to cover some common scenarios. First, some basics.

Most web applications can be separated into an application tier and a database tier. Database usage can in turn be separated into transactional (OLTP) and analytical (OLAP).

In the best case you can just start a number of application servers and distribute traffic among them. They all connect to the same database server and can work independently. You can accomplish this by simply adding multiple IP addresses to your domain name in DNS, or by using load-balancing techniques to forward clients to different servers. It gets more difficult if the servers have to share other data, such as sessions.
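Distributing traffic among several application servers can be sketched as a plain round-robin rotation. This is only an illustration in Python (the same logic carries over to PHP or to a real load balancer); the server addresses and the helper name `make_round_robin` are made up for the example.

```python
from itertools import cycle

# Hypothetical application server addresses (not from the original post).
APP_SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def make_round_robin(servers):
    """Return a function that hands out the servers in strict rotation."""
    rotation = cycle(servers)

    def next_server():
        return next(rotation)

    return next_server


next_server = make_round_robin(APP_SERVERS)

# Six incoming requests are spread evenly over the three servers.
assignments = [next_server() for _ in range(6)]
```

Because any request may land on any server, per-user state such as sessions has to live somewhere all servers can reach, which is the difficulty mentioned above.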

Scaling the application is generally very easy. Scaling the database is much more complex.

The first thing to do is usually to set up one or more replication servers (slaves) that hold the same data as the main database. They can be cascaded, but they have one serious disadvantage: their data is not always up to date. In general it is no more than a few seconds old, but it can be more under load. For many use cases this is fine. Big sites that just display information can replicate their database to some slave servers, set up some application servers, and everything is fine. (It's good practice to run one slave and one application server on the same machine and let that application server read from that slave.)
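One way to live with stale replication servers is to read from one only while its lag is acceptable and to fall back to the main database otherwise. The following is a minimal Python sketch under assumptions: the function name, the server names and the lag numbers are hypothetical, and in MySQL the lag value would typically come from the `Seconds_Behind_Master` field of `SHOW SLAVE STATUS`.

```python
import random


def choose_read_server(replica_lag, max_lag_seconds, master="master"):
    """Pick a replica whose lag is acceptable, else fall back to the master.

    replica_lag maps a replica name to its lag in seconds, or to None
    when the lag is unknown (for example because replication is broken).
    """
    fresh = [
        name
        for name, lag in replica_lag.items()
        if lag is not None and lag <= max_lag_seconds
    ]
    if not fresh:
        # No replica is fresh enough, so read from the master instead.
        return master
    return random.choice(fresh)


# Hypothetical lag measurements in seconds.
lag = {"replica1": 2, "replica2": 40, "replica3": None}
server = choose_read_server(lag, max_lag_seconds=5)
```

A page that only displays articles can use a generous threshold, while a page that shows a user their own freshly posted comment may need a threshold of zero, or a read from the master.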

Every OLAP query can be directed to a slave. In this context, OLAP queries are those that don't modify anything and don't need 100% up-to-date data.

Everything that modifies data, however, still has to be written to the one source (master) server from which every other server gets its copy, for example every comment on an article.
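This split between reads that may go to a replication server and writes that must go to the source server can be sketched as a small routing function. It is a simplified Python illustration, not a complete SQL classifier; the name `route_query` and the prefix list are assumptions made for the example.

```python
# Statement types treated as read-only in this sketch.
READ_PREFIXES = ("SELECT", "SHOW", "DESCRIBE", "EXPLAIN")


def route_query(sql, in_transaction=False):
    """Return 'master' for writes and 'replica' for plain reads."""
    statement = sql.lstrip().upper()
    if in_transaction:
        # Keep a whole transaction on one server so it sees its own writes.
        return "master"
    if statement.startswith(READ_PREFIXES):
        # SELECT ... FOR UPDATE takes row locks, so treat it like a write.
        if "FOR UPDATE" in statement:
            return "master"
        return "replica"
    # INSERT, UPDATE, DELETE, DDL and anything unknown go to the master.
    return "master"


read_target = route_query("SELECT title FROM articles WHERE id = 7")
write_target = route_query("INSERT INTO comments (article_id, body) VALUES (7, 'hi')")
```

Sending unknown statements to the master is the cautious default: a misrouted read only costs some load, while a misrouted write would be lost or rejected on a replica.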

If this bottleneck gets too tight, you can go in two directions:

  1. sharding
  2. master-master replication

Sharding means the application server decides where to store and where to fetch your data. For example, every comment that starts with "a" goes to server A, "b" to server B, and so on. That's a silly example, but it's basically how it works; in practice some internal IDs are usually involved. If possible, shard the data so that everything a query needs can be pulled from a single server again. In the example above, if I wanted all comments for an article I would have to ask every server from A to Z and merge the results. This is inefficient but possible, because those servers can be replicated. It is called mapping (look at Google's well-known MapReduce model, which basically does just this).
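The difference between a well-chosen and a badly chosen shard key can be sketched as follows. This is an illustrative Python model in which each shard is just an in-memory list; the shard names, the helper functions and the comment fields are all hypothetical.

```python
import zlib

# Hypothetical shard names; each one stands in for a separate database server.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key):
    """Map a key to a shard. crc32 is stable across processes and restarts."""
    return SHARDS[zlib.crc32(str(key).encode("utf-8")) % len(SHARDS)]


def store_comment(storage, comment, shard_key):
    """Store a comment on the shard chosen by the given field."""
    storage[shard_for(comment[shard_key])].append(comment)


def fetch_from_one_shard(storage, article_id):
    """Valid only when comments are sharded by article_id: one server is asked."""
    rows = storage[shard_for(article_id)]
    return [c for c in rows if c["article_id"] == article_id]


def fetch_from_all_shards(storage, article_id):
    """Needed when sharded by another key: ask every shard and merge."""
    merged = []
    for shard in SHARDS:
        merged.extend(c for c in storage[shard] if c["article_id"] == article_id)
    return sorted(merged, key=lambda c: c["id"])


comments = [
    {"id": 1, "article_id": 7, "text": "first"},
    {"id": 2, "article_id": 8, "text": "other article"},
    {"id": 3, "article_id": 7, "text": "second"},
]

by_article = {shard: [] for shard in SHARDS}
by_comment = {shard: [] for shard in SHARDS}
for comment in comments:
    store_comment(by_article, comment, "article_id")
    store_comment(by_comment, comment, "id")

one_shard_ids = [c["id"] for c in fetch_from_one_shard(by_article, 7)]
all_shard_ids = [c["id"] for c in fetch_from_all_shards(by_comment, 7)]
```

Both lookups return the same comments, but the first touches one server and the second touches all of them, which is the inefficiency described above.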

Master-master replication means that you write your data to different master servers and they synchronize with each other, so the data isn't stored separately as it is with sharding. This has to be done if your application can't decide on its own where to store and fetch data. You just store to any master server, every server gets everything, and everybody is happy? No, because this brings another serious problem: conflicts! Imagine two users each enter a comment. commentA gets stored on serverA, commentB on serverB. Which ID should each one get? Which one comes first? The best approach is to design the application so that it avoids these cases, for example by using keys that can't collide. What usually happens instead is conflict resolution, prioritizing, and so on. Oracle has a lot of features at this level, and MySQL is still behind. But the trend is towards much more complex structures like clouds anyway...
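One common way to keep the generated IDs of two masters from colliding is to give each server its own interleaved sequence. In MySQL this corresponds to the `auto_increment_increment` and `auto_increment_offset` system variables; the Python sketch below only simulates the resulting number sequences, and the function and variable names are made up for the example.

```python
def id_sequence(offset, increment):
    """Yield offset, offset + increment, offset + 2 * increment, ...

    This mimics a master configured with auto_increment_offset=offset
    and auto_increment_increment=increment.
    """
    current = offset
    while True:
        yield current
        current += increment


# Two masters: serverA hands out odd IDs, serverB hands out even IDs.
server_a = id_sequence(offset=1, increment=2)
server_b = id_sequence(offset=2, increment=2)

ids_a = [next(server_a) for _ in range(3)]
ids_b = [next(server_b) for _ in range(3)]
```

This only prevents primary-key collisions. It says nothing about which comment "comes first", and it does not help when both masters update the same existing row, so those cases still need conflict resolution or an application design that avoids them.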

Well, I don't think I explained this very well, but you should at least get some keywords from the text that you can investigate further.

Joe Hopfgartner
@Joe Hopfgartner, thanks a lot for the great answer. Can you please suggest a resource on this topic?
phpExe
Sure. My explanation is very poor, especially when it comes to master-master replication. I will fetch some good resources and add them.
Joe Hopfgartner