I would like to know how to benchmark a PHP/MySQL site.

We have a web app that is almost complete and ready to go live. We know how many people will be using it in a year's time, but we have absolutely no idea how much bandwidth the average user consumes or how much time they spend on the database, etc. We need to determine the correct servers to purchase.

Is there something server-side on Linux that can monitor these statistics per user, so that we can take the data and extrapolate from it?

If I am going about this completely wrong, please let me know, but I believe this is a common exercise for new web apps.

EDIT: I may have asked for the wrong information. We can see how long the database queries take and how long it takes to load the page, but we have no idea what load is placed on the server. The question I am really asking is: can we handle 100 users at once on average? 1,000? What kind of server is needed to reach 1M users? Etc.

Thanks for your help.

A: 

I don't have any experience with benchmarking tools, but in some cases I create a simple table with the fields id, ipaddress, parsetime, queries. Just insert a new row each time a page is refreshed or called (in AJAX situations), then analyze the data collected over a week/month/quarter/year. It's not the ideal setup, but it's a simple way to get some statistics on short notice.
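
A minimal sketch of that approach, assuming MySQL and the mysqli extension; the table layout mirrors the fields mentioned above, and the connection credentials are placeholders:

    <?php
    // One-time setup, matching the fields mentioned above:
    //   CREATE TABLE request_log (
    //       id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    //       ipaddress VARCHAR(45) NOT NULL,
    //       parsetime FLOAT NOT NULL,       -- seconds spent generating the page
    //       queries INT UNSIGNED NOT NULL,  -- number of queries the page ran
    //       created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    //   );

    $start      = microtime(true);
    $queryCount = 0;   // increment this wherever your code runs a query

    // ... normal page / AJAX handling goes here ...

    $parseTime = microtime(true) - $start;
    $ip        = $_SERVER['REMOTE_ADDR'];

    $mysqli = new mysqli('localhost', 'user', 'pass', 'myapp');  // placeholder credentials
    $stmt   = $mysqli->prepare(
        'INSERT INTO request_log (ipaddress, parsetime, queries) VALUES (?, ?, ?)'
    );
    $stmt->bind_param('sdi', $ip, $parseTime, $queryCount);
    $stmt->execute();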

Some results on PHP benchmarks: http://www.google.nl/search?hl=nl&source=hp&q=php+bencmark&meta=&aq=f&oq=

Ben Fransen
Yeah, I guess I'm looking for something more 'outside' of the app to track what it's doing.
I've updated my answer and provided a link; I hope it helps you further.
Ben Fransen
Why do I have a downvote? Could whoever gave it explain what's wrong with this answer?
Ben Fransen
+1  A: 

Unless you are using a heavyweight framework or something like that, the DB queries are likely to be the slowest part of your app.

What I do for monitoring is measure execution time for every query in my DB abstraction object. Then, for every query that takes longer than X milliseconds (fill in your own X), I write a line to my query log file that identifies the PHP script file and line number on which the query appeared (use debug_backtrace() to find that information) together with other relevant context data (e.g. user identity, date-time etc.).
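
A minimal sketch of such a wrapper, assuming a PDO-based abstraction object; the class name, the 100 ms threshold and the log path are illustrative:

    <?php
    // Hypothetical query method on a DB abstraction object: it times every query
    // and logs slow ones together with the calling file/line and some context.
    class Db
    {
        private $pdo;
        private $slowMs  = 100;                                // the "X milliseconds" threshold
        private $logFile = '/var/log/myapp/slow-queries.log';  // illustrative path

        public function __construct(PDO $pdo)
        {
            $this->pdo = $pdo;
        }

        public function query($sql, array $params = array(), $userId = null)
        {
            $start = microtime(true);
            $stmt  = $this->pdo->prepare($sql);
            $stmt->execute($params);
            $elapsedMs = (microtime(true) - $start) * 1000;

            if ($elapsedMs > $this->slowMs) {
                // debug_backtrace() tells us which script/line issued the query.
                $trace  = debug_backtrace();
                $caller = isset($trace[0]) ? $trace[0] : array();
                $entry  = sprintf("%s\t%s:%d\t%.1fms\t%s\t%s\n",
                    date('Y-m-d H:i:s'),
                    isset($caller['file']) ? $caller['file'] : '?',
                    isset($caller['line']) ? $caller['line'] : 0,
                    $elapsedMs,
                    $userId === null ? '-' : $userId,
                    $sql
                );
                file_put_contents($this->logFile, $entry, FILE_APPEND);
            }
            return $stmt;
        }
    }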

This log file can be statistically analyzed later for various info.

For example, you can find which of your queries are taking the greatest total time (relevant to server load). Or which are the slowest (relevant to user experience). Or which user is loading the system most (possibly abuse or robots).

I also plot Pareto charts to identify where best to spend my query optimization efforts.

fsb
Nice answer! I'm gonna implement that for some larger projects I'm working on!
Ben Fransen
Could you briefly explain how you use a Pareto chart to identify where time is best spent?
http://en.wikipedia.org/wiki/Pareto_chart. Make a list of the total and average query time for each of your queries and sort the list by one or the other metric. Plot the chart as described in the Wikipedia article to visualize how much of your time is spent in the slowest N queries and which queries they are, then start optimization work at the top of the list. (A sketch of that aggregation appears below.)
fsb
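
A rough sketch of the aggregation described in the comment above, assuming the tab-separated log format produced by the wrapper sketch earlier (timestamp, file:line, elapsed ms, user, SQL); the log path is illustrative:

    <?php
    // Sum total and average time per distinct query, then sort by total time:
    // the head of the sorted list is where optimization effort pays off most.
    $stats = array();
    foreach (file('/var/log/myapp/slow-queries.log') as $line) {
        $fields = explode("\t", rtrim($line, "\n"));
        if (count($fields) < 5) {
            continue;
        }
        $ms  = (float) $fields[2];   // "123.4ms" casts to 123.4
        $sql = $fields[4];
        if (!isset($stats[$sql])) {
            $stats[$sql] = array('total' => 0.0, 'count' => 0);
        }
        $stats[$sql]['total'] += $ms;
        $stats[$sql]['count']++;
    }

    $totals = array();
    foreach ($stats as $sql => $s) {
        $totals[$sql] = $s['total'];
    }
    arsort($totals);   // biggest total time first

    foreach ($totals as $sql => $total) {
        printf("total %.0fms  avg %.1fms  runs %d  %s\n",
            $total, $total / $stats[$sql]['count'], $stats[$sql]['count'], $sql);
    }
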
+1  A: 

Most importantly, you need to define what you want the performance to be: you can always find areas to optimize, but improving the response time from 750 ms to 650 ms may not be worth the effort.

As fsb said, your bottlenecks will probably be your database queries. However, I would also point out that your bottlenecks are not always (or even likely to be) where you think they are. I would suggest reading this first and doing a global test of your site.

If the bottleneck is in your application code, use Xdebug to profile your PHP, then use WinCacheGrind or KCacheGrind to analyze the output. The results may surprise you.

Addressing database issues is fairly database-specific. For MySQL, I turn on the slow query log, log queries that do not use indexes, enable general query logging, and use toolkits like Maatkit to analyze the queries and find the bottlenecks.

Mike Crowe
+1  A: 

You can use the ApacheBench tool (ab, usually part of the Apache web server package) to stress-test any script that you suspect might be slow (1,000 requests with 10 concurrent clients: ab -c 10 -n 1000 http://url). It will show you the distribution of response times (e.g. in 90% of cases the request was processed in less than 200 ms).

Then you can also grab the SQL queries executed by that particular script and run EXPLAIN on them to get a rough idea of how they will degrade when the tables hold 10, 100, or 10 million times more records.
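
For example, a throwaway snippet like the one below (the connection details and the query itself are made up) prints the plan MySQL would use, so you can check which index is chosen and roughly how many rows it expects to examine:

    <?php
    // Prefix a suspect query with EXPLAIN and inspect the result.
    $mysqli = new mysqli('localhost', 'user', 'pass', 'myapp');  // placeholder credentials
    $sql    = 'SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at DESC';

    if ($result = $mysqli->query('EXPLAIN ' . $sql)) {
        while ($row = $result->fetch_assoc()) {
            print_r($row);   // watch the "key", "rows" and "Extra" columns
        }
    }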

Regarding how many users it can serve: use your favourite browser to emulate a typical user visit, then take the access_log file and sum the bytes sent (one of the last numbers in each log line). Say, for example, it comes to 5 KB of text/html plus 50 KB of png/jpg/etc. = 55 KB per visit; with headers and so on, call it 60 KB per visit. 60 KB * 1M visits = 60 GB of traffic per day. Is your bandwidth good enough for that? (60 GB / 86.4 ksec ≈ 700 KB/sec.)
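
A quick way to do that sum, assuming Apache's stock combined LogFormat where the response size is the tenth whitespace-separated field (adjust the field index and the path for your setup):

    <?php
    // Add up the response-size field of an access log to estimate bytes sent.
    $total = 0;
    $fh = fopen('/var/log/apache2/access_log', 'r');   // path varies per distro
    while (($line = fgets($fh)) !== false) {
        $fields = preg_split('/\s+/', trim($line));
        if (isset($fields[9]) && ctype_digit($fields[9])) {   // "-" means no body sent
            $total += (int) $fields[9];
        }
    }
    fclose($fh);
    printf("%.1f MB sent\n", $total / 1048576);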

Pavel Reich
A: 

A tool that I find fairly useful is JMeter, which (at its most basic) lets you set your browser to use JMeter as a proxy; you then wander all around your website and it records everything you do.

Once you are happy that it is a decent test of most of your website, you can save the test plan in JMeter and tell it to run with a set number of threads and a number of loops per thread to simulate load on your website.

For example, you can run 50 clients, each running the test plan 10 times.

You can then ramp the numbers up and down to see the performance impact on the site; it graphs the response time for you.

This lets you tune different parameters, try different caching strategies and check the real world impact of those changes.

Matt Wheeler
+2  A: 

You can try using this: http://code.google.com/p/dbench

SeniorDev