Hi, I have a MySQL table called today_stats with the columns Id, date and clicks. I'm trying to write a script that reads these values and predicts the clicks for the next 7 days. How can I do this in PHP?
Start by connecting to the database and retrieving the data for the previous x days. Then you could fit a line of best fit through those days and extend it into the future. Depending on the application, though, a straight line of best fit may not be good enough.
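The fetch-then-extend-a-line idea could be sketched in PHP roughly like this. It is a minimal sketch, assuming one row per day with no gaps and an already open PDO connection ($pdo); fetchRecentClicks and linearTrendForecast are hypothetical names. The line is fitted by ordinary least squares with the day index (0, 1, 2, ...) as x.

```php
<?php
// Hypothetical helper: load the clicks of the last $days rows of today_stats,
// returned oldest first. Assumes an existing PDO connection to the MySQL database.
function fetchRecentClicks(PDO $pdo, int $days): array
{
    $stmt = $pdo->prepare(
        'SELECT clicks FROM (SELECT date, clicks FROM today_stats ORDER BY date DESC LIMIT :days) AS recent ORDER BY date ASC'
    );
    $stmt->bindValue(':days', $days, PDO::PARAM_INT);
    $stmt->execute();
    return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
}

// Fit y = intercept + slope * x by least squares (x = day index, y = clicks)
// and extend the line $horizon days past the last observed day.
function linearTrendForecast(array $clicks, int $horizon = 7): array
{
    $clicks = array_values($clicks);
    $n = count($clicks);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two days of data.');
    }

    $meanX = ($n - 1) / 2;              // mean of 0..n-1
    $meanY = array_sum($clicks) / $n;

    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($clicks as $x => $y) {
        $sxy += ($x - $meanX) * ($y - $meanY);
        $sxx += ($x - $meanX) ** 2;
    }

    $slope = $sxy / $sxx;
    $intercept = $meanY - $slope * $meanX;

    $forecast = [];
    for ($x = $n; $x < $n + $horizon; $x++) {
        $forecast[] = $intercept + $slope * $x;
    }
    return $forecast;
}

// Example with made-up numbers (oldest day first).
print_r(linearTrendForecast([120, 130, 125, 140, 150, 145, 160]));
```

In a real script the input would come from something like linearTrendForecast(fetchRecentClicks($pdo, 28)).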
A simple approach would be to group the rows by day of the week and average the clicks in each group. This can all be done in SQL.
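In MySQL the grouping could look something like the sketch below. It assumes the date and clicks columns from the question and an existing PDO connection; the 28-day window, the WEEKDAY_AVG_SQL constant and fetchWeekdayAverages are illustrative choices. MySQL's DAYOFWEEK() returns 1 for Sunday through 7 for Saturday, so each of the next 7 dates would simply be predicted with the average for its weekday.

```php
<?php
// Average clicks per weekday over the last 28 days (the window is an arbitrary choice).
const WEEKDAY_AVG_SQL = <<<SQL
SELECT DAYOFWEEK(date) AS dow, AVG(clicks) AS avg_clicks
FROM today_stats
WHERE date >= CURDATE() - INTERVAL 28 DAY
GROUP BY DAYOFWEEK(date)
ORDER BY dow
SQL;

// Hypothetical helper: run the query and return averages keyed by DAYOFWEEK value.
function fetchWeekdayAverages(PDO $pdo): array
{
    $averages = [];
    foreach ($pdo->query(WEEKDAY_AVG_SQL) as $row) {
        $averages[(int) $row['dow']] = (float) $row['avg_clicks'];
    }
    return $averages;
}
```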
This has less to do with PHP and more to do with math. The simplest way to calculate something like this is to take the average traffic for a given day of the week over the past X weeks. You don't want to use all of your historical data, because fads and changes to your page content make older traffic less representative.
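If you would rather do the averaging in PHP, a minimal sketch might look like this. It assumes the rows have already been loaded into an array mapping 'Y-m-d' date strings to click counts; weekdayAverageForecast is a hypothetical name, and $weeks plays the role of X.

```php
<?php
// Predict each of the next $days dates as the average clicks of the same
// weekday over the last $weeks weeks. Returns date => prediction
// (null when that weekday has no data in the window).
function weekdayAverageForecast(array $history, int $weeks = 4, int $days = 7): array
{
    ksort($history); // 'Y-m-d' keys sort chronologically
    $lastDate = new DateTimeImmutable(array_key_last($history));
    $cutoff = $lastDate->modify('-' . ($weeks * 7 - 1) . ' days');

    // Sum and count clicks per weekday (0 = Sunday ... 6 = Saturday).
    $sums = array_fill(0, 7, 0);
    $counts = array_fill(0, 7, 0);
    foreach ($history as $date => $clicks) {
        $day = new DateTimeImmutable($date);
        if ($day < $cutoff) {
            continue; // ignore data older than the sampling window
        }
        $w = (int) $day->format('w');
        $sums[$w] += $clicks;
        $counts[$w]++;
    }

    $forecast = [];
    for ($i = 1; $i <= $days; $i++) {
        $day = $lastDate->modify("+$i days");
        $w = (int) $day->format('w');
        $forecast[$day->format('Y-m-d')] = $counts[$w] > 0 ? $sums[$w] / $counts[$w] : null;
    }
    return $forecast;
}
```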
So, for example, take the average traffic for each weekday over the last month. You can tell how accurate your estimates are by comparing them with the actual traffic. If they aren't accurate at all, try adjusting the calculation (e.g., change the time period you sample from). Or maybe it's a good thing that your estimate is off: your site was just featured on the front page of the New York Times!
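One way to make "compare them with the actual traffic" concrete is to compute an error measure once the real numbers are in. The sketch below uses mean absolute error (in clicks) and mean absolute percentage error; these two measures and the forecastErrors name are my choice for illustration, not something from the answer above. Both arrays are assumed to be keyed by date.

```php
<?php
// Compare predicted and actual clicks on the dates both arrays share.
// MAE is in clicks; MAPE is a percentage and skips days with 0 actual clicks.
function forecastErrors(array $predicted, array $actual): array
{
    $absErrors = [];
    $pctErrors = [];
    foreach ($predicted as $date => $p) {
        if (!isset($actual[$date]) || $p === null) {
            continue; // no actual value yet, or no prediction for this date
        }
        $a = $actual[$date];
        $absErrors[] = abs($p - $a);
        if ($a != 0) {
            $pctErrors[] = abs($p - $a) / abs($a) * 100;
        }
    }
    return [
        'mae'  => $absErrors ? array_sum($absErrors) / count($absErrors) : null,
        'mape' => $pctErrors ? array_sum($pctErrors) / count($pctErrors) : null,
    ];
}
```

Tracking these numbers while you change the sampling window makes it easier to see which setting actually helps.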
Cheers.
Different types of curve fitting are described here: http://en.wikipedia.org/wiki/Curve_fitting
Also: http://www.qub.buffalo.edu/wiki/index.php/Curve_Fitting
The method you are looking for is called least squares.
What you need to do is minimize the sum of the differences between each data point and the function you will use to predict future values. So that every difference counts as positive, you use its square rather than its absolute value: the sum of the squared differences has to be a minimum. If you write out the function that gives this sum, differentiate it with respect to the function's parameters and solve the resulting equations, you get the parameters for which your function comes CLOSEST to the past values.
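For the simplest case, a straight line y = a + b t with t the day number and y the clicks (symbols chosen here for illustration), the procedure works out as follows: write the sum of squared differences, set both partial derivatives to zero, and solve for a and b.

```latex
S(a, b) = \sum_{i=1}^{n} \bigl(y_i - (a + b\,t_i)\bigr)^2

\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} \bigl(y_i - a - b\,t_i\bigr) = 0,
\qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} t_i \bigl(y_i - a - b\,t_i\bigr) = 0

\hat b = \frac{\sum_{i=1}^{n} (t_i - \bar t)(y_i - \bar y)}{\sum_{i=1}^{n} (t_i - \bar t)^2},
\qquad
\hat a = \bar y - \hat b\,\bar t
```

Here \bar t and \bar y are the means of the day numbers and of the clicks; the forecast for a future day t is \hat a + \hat b\,t.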
Programs like Excel (and possibly OpenOffice Calc too) have built-in functions that do this for you, including fitting polynomial functions to describe the dependence.
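If you want a polynomial fit like a spreadsheet trendline without leaving PHP, here is a rough sketch. It assumes x is the day index 0..n-1 and solves the least-squares normal equations with Gaussian elimination; polyFit and polyEval are hypothetical names, and this approach is only reasonable for low degrees and short series, since the normal equations become badly conditioned as the degree grows.

```php
<?php
// Least-squares polynomial fit of degree $degree to $y (x = 0..n-1).
// Returns coefficients [c0, c1, c2, ...] for c0 + c1*x + c2*x^2 + ...
function polyFit(array $y, int $degree): array
{
    $y = array_values($y);
    $n = count($y);
    $m = $degree + 1;
    if ($n < $m) {
        throw new InvalidArgumentException('Not enough points for this degree.');
    }

    // Augmented normal-equation matrix [X^T X | X^T y].
    $a = array_fill(0, $m, array_fill(0, $m + 1, 0.0));
    for ($i = 0; $i < $n; $i++) {
        for ($r = 0; $r < $m; $r++) {
            for ($c = 0; $c < $m; $c++) {
                $a[$r][$c] += $i ** ($r + $c);
            }
            $a[$r][$m] += $y[$i] * $i ** $r;
        }
    }

    // Gaussian elimination with partial pivoting.
    for ($col = 0; $col < $m; $col++) {
        $pivot = $col;
        for ($r = $col + 1; $r < $m; $r++) {
            if (abs($a[$r][$col]) > abs($a[$pivot][$col])) {
                $pivot = $r;
            }
        }
        [$a[$col], $a[$pivot]] = [$a[$pivot], $a[$col]];
        for ($r = $col + 1; $r < $m; $r++) {
            $factor = $a[$r][$col] / $a[$col][$col];
            for ($c = $col; $c <= $m; $c++) {
                $a[$r][$c] -= $factor * $a[$col][$c];
            }
        }
    }

    // Back substitution.
    $coef = array_fill(0, $m, 0.0);
    for ($r = $m - 1; $r >= 0; $r--) {
        $sum = $a[$r][$m];
        for ($c = $r + 1; $c < $m; $c++) {
            $sum -= $a[$r][$c] * $coef[$c];
        }
        $coef[$r] = $sum / $a[$r][$r];
    }
    return $coef;
}

// Evaluate the polynomial at $x using Horner's method.
function polyEval(array $coef, float $x): float
{
    $result = 0.0;
    foreach (array_reverse($coef) as $c) {
        $result = $result * $x + $c;
    }
    return $result;
}
```

To forecast, evaluate polyEval($coef, $x) for $x = n .. n + 6. Higher-degree polynomials can follow past data closely but tend to swing wildly once extended past the last day, so a low degree (1 or 2) is usually the safer choice for a 7-day forecast.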
Basically, you should treat time as the independent variable and the other values (here, the clicks) as dependent variables.
In economics this kind of modelling is part of econometrics, where it is widely used. If you have a lot of historical data, the prediction for the next day can be quite accurate, and you can also determine a confidence interval (the range of error you can expect). Predictions for the following days become less and less accurate.
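To see why later days are less accurate, here is a sketch of a straight-line forecast with an approximate prediction interval. It assumes a linear trend in time with independent, normally distributed errors; the Student-t critical value $t for n - 2 degrees of freedom is supplied by the caller (for example 2.571 for 5 degrees of freedom at 95%), and trendWithInterval is a hypothetical name.

```php
<?php
// Linear trend forecast with a prediction interval for each future day.
function trendWithInterval(array $clicks, int $horizon, float $t): array
{
    $y = array_values($clicks);
    $n = count($y);
    if ($n < 3) {
        throw new InvalidArgumentException('Need at least three points.');
    }
    $meanX = ($n - 1) / 2;
    $meanY = array_sum($y) / $n;

    $sxx = 0.0;
    $sxy = 0.0;
    foreach ($y as $x => $v) {
        $sxx += ($x - $meanX) ** 2;
        $sxy += ($x - $meanX) * ($v - $meanY);
    }
    $b = $sxy / $sxx;
    $a = $meanY - $b * $meanX;

    // Residual standard error s = sqrt(SSE / (n - 2)).
    $sse = 0.0;
    foreach ($y as $x => $v) {
        $sse += ($v - ($a + $b * $x)) ** 2;
    }
    $s = sqrt($sse / ($n - 2));

    $out = [];
    for ($x0 = $n; $x0 < $n + $horizon; $x0++) {
        $pred = $a + $b * $x0;
        // The (x0 - meanX)^2 term grows with distance, so later days get wider intervals.
        $half = $t * $s * sqrt(1 + 1 / $n + ($x0 - $meanX) ** 2 / $sxx);
        $out[] = ['x' => $x0, 'prediction' => $pred, 'low' => $pred - $half, 'high' => $pred + $half];
    }
    return $out;
}
```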
If you build a separate model for each day of the week and include holidays and special days as variables, you can get much higher precision.
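The "one model per weekday" part could be sketched as follows: split the history by weekday, fit a straight line through each weekday's own series, and predict the next occurrence of each weekday. This is a simplified sketch that assumes a 'Y-m-d' => clicks array with one row per day; holiday and special-day variables are left out, and fitLine and perWeekdayForecast are hypothetical names.

```php
<?php
// Least-squares line through $y (x = 0..n-1); returns [intercept, slope].
function fitLine(array $y): array
{
    $n = count($y);
    if ($n < 2) {
        return [$n ? $y[0] : 0.0, 0.0]; // too little data: flat line
    }
    $meanX = ($n - 1) / 2;
    $meanY = array_sum($y) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($y as $x => $v) {
        $sxy += ($x - $meanX) * ($v - $meanY);
        $sxx += ($x - $meanX) ** 2;
    }
    $b = $sxy / $sxx;
    return [$meanY - $b * $meanX, $b];
}

// Predict the next 7 days, each from the trend of its own weekday.
function perWeekdayForecast(array $history): array
{
    ksort($history);
    // Split the series by weekday: 0 = Sunday ... 6 = Saturday.
    $series = array_fill(0, 7, []);
    foreach ($history as $date => $clicks) {
        $w = (int) (new DateTimeImmutable($date))->format('w');
        $series[$w][] = $clicks;
    }

    $lastDate = new DateTimeImmutable(array_key_last($history));
    $forecast = [];
    for ($i = 1; $i <= 7; $i++) {
        $day = $lastDate->modify("+$i days");
        $w = (int) $day->format('w');
        [$intercept, $slope] = fitLine($series[$w]);
        // The next point of this weekday's series sits at x = count(past points).
        $forecast[$day->format('Y-m-d')] = $intercept + $slope * count($series[$w]);
    }
    return $forecast;
}
```

The loop is fixed at 7 days on purpose: within one week each weekday occurs exactly once, which keeps the "next point in the series" logic correct.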
This is the only RIGHT way to forecast future values mathematically. But all this raises a question: is it really worth it?