Hi, I have a MySQL table called today_stats with the columns Id, date and clicks. I'm trying to write a script that reads these values and predicts the number of clicks for the next 7 days. How can I do this in PHP?

A: 

Start by connecting to the database and retrieving the data for the previous x days. Then you could fit a line of best fit through those days and extend it into the future. Depending on the application, though, a line of best fit may not be good enough.
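A minimal sketch of the line-of-best-fit idea, written in Python (the arithmetic ports directly to PHP with array_sum() and count()). It assumes the clicks were already fetched in date order, for example with `SELECT clicks FROM today_stats ORDER BY date`, one row per day with no gaps. The function names `fit_line` and `forecast_linear` and the sample numbers are hypothetical.

```python
def fit_line(clicks):
    """Ordinary least-squares line clicks = a + b*t, where t = 0, 1, ..., n-1 is the day index."""
    n = len(clicks)
    if n < 2:
        raise ValueError("need at least two days of data")
    t_mean = (n - 1) / 2  # mean of 0..n-1
    y_mean = sum(clicks) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(clicks))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast_linear(clicks, days_ahead=7):
    """Extend the fitted line past the last observed day."""
    intercept, slope = fit_line(clicks)
    n = len(clicks)
    return [intercept + slope * t for t in range(n, n + days_ahead)]


# Hypothetical data: clicks per day, oldest first.
recent_clicks = [120, 135, 128, 150, 160, 155, 170]
print(forecast_linear(recent_clicks))
```

The forecast is only the straight-line trend; weekly patterns and one-off spikes are ignored, which is why the answer warns it may not be good enough.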

Kurru
A: 

A simple approach would be to group the rows by day and average the values. This can all be done in SQL.
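A sketch of the SQL-only version, reading "group by days" as grouping by day of the week (an interpretation, not stated in the answer). To keep it runnable it uses Python's built-in SQLite as a stand-in for MySQL, so the weekday function differs: SQLite uses `strftime('%w', date)`, while MySQL has `DAYOFWEEK(date)`. The sample rows are invented.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table from the question (not MySQL itself).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE today_stats (id INTEGER PRIMARY KEY, date TEXT, clicks INTEGER)"
)
# Hypothetical sample rows: two Mondays and two Tuesdays.
conn.executemany(
    "INSERT INTO today_stats (date, clicks) VALUES (?, ?)",
    [("2010-05-03", 100), ("2010-05-04", 80), ("2010-05-10", 140), ("2010-05-11", 90)],
)

# Group the rows by weekday and average the clicks in SQL.
# SQLite's strftime('%w', ...) yields 0 (Sunday) to 6 (Saturday);
# in MySQL you would use DAYOFWEEK(date) instead (1 = Sunday ... 7 = Saturday).
query = """
    SELECT CAST(strftime('%w', date) AS INTEGER) AS weekday, AVG(clicks) AS avg_clicks
    FROM today_stats
    GROUP BY weekday
    ORDER BY weekday
"""
averages = {weekday: avg_clicks for weekday, avg_clicks in conn.execute(query)}
print(averages)  # {1: 120.0, 2: 85.0}
conn.close()
```

The per-weekday averages can then be used directly as the estimate for the same weekday next week.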

used2could
+1  A: 

This has less to do with PHP and more to do with math. The simplest way to calculate something like this is to take the average traffic for a given day of the week over the past X weeks. You don't want to use all of the historical data, because fads come and go and page content changes.

So, for example, get the average traffic for each day of the week over the last month. You'll be able to tell how accurate your estimates are by comparing them to the actual traffic. If they aren't accurate at all, try adjusting the calculation (e.g., change the time period you're sampling from). Or maybe it's a good thing that your estimate is off: your site was just featured on the front page of the New York Times!
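A Python sketch of this answer's method: estimate each of the next 7 days as the average of the same weekday over the previous few weeks, then measure how far off the estimates were with a mean absolute error. The dict-of-dates input, the 4-week window, and the names `weekday_forecast` and `mean_absolute_error` are assumptions for illustration; the sample history is synthetic.

```python
from datetime import date, timedelta
from statistics import mean


def weekday_forecast(history, start, weeks=4, days_ahead=7):
    """Estimate each of the next `days_ahead` days (from `start`) as the average
    clicks on the same weekday over the previous `weeks` weeks.
    history: dict mapping datetime.date -> clicks; missing days are skipped."""
    forecast = {}
    for offset in range(days_ahead):
        day = start + timedelta(days=offset)
        samples = [
            history[day - timedelta(weeks=w)]
            for w in range(1, weeks + 1)
            if day - timedelta(weeks=w) in history
        ]
        forecast[day] = mean(samples) if samples else None
    return forecast


def mean_absolute_error(forecast, actual):
    """Average size of the miss, over days that have both a forecast and an actual value."""
    errors = [
        abs(forecast[day] - actual[day])
        for day in forecast
        if forecast[day] is not None and day in actual
    ]
    return mean(errors) if errors else None


# Synthetic history: the 28 days before 2010-05-31, each valued by how many days ago it was.
start = date(2010, 5, 31)
history = {start - timedelta(days=i): i for i in range(1, 29)}
estimates = weekday_forecast(history, start)
print(estimates[start])  # 17.5, the mean of 7, 14, 21 and 28
```

Changing `weeks` is one way to "change the time period you're sampling from" and compare the resulting errors.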

Cheers.

Sam Bisbee
So I have to compute the average, right? If possible, I'd also like to plot the graph in Excel. How do I get the value, and what do I have to do then? I don't know much math.
Francesc
The average is all of the values added together, divided by the number of values you're averaging. For example, if your data set is 6, 7 and 8, you would compute (6 + 7 + 8) / 3. If all of your values are in an array, add all of the items in the array together and divide by the array's length (sizeof($array)).
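The same calculation as a tiny Python function (the name `average` is hypothetical); in PHP the equivalent is `array_sum($array) / count($array)`.

```python
def average(values):
    """Sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("cannot average an empty list")
    return sum(values) / len(values)


print(average([6, 7, 8]))  # 7.0, i.e. 21 / 3
```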
Sam Bisbee
+2  A: 

Different types of curve fitting are described here: http://en.wikipedia.org/wiki/Curve_fitting

Also: http://www.qub.buffalo.edu/wiki/index.php/Curve_Fitting

zaf
This answer is as close as you can get without some underlying notion of what sort of trend you expect to see. You will not be able to fit any function to your data without knowing what family of functions are candidates. See this related question if a logarithmic function might be satisfactory for you: http://stackoverflow.com/questions/2768885/how-can-i-calculate-a-trend-line-in-php
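If a logarithmic trend is a plausible candidate, one common approach (assumed here, not taken from the linked question) is to fit y = a + b·ln(t) by ordinary least squares on the transformed x values ln(t). The day index starts at 1 because ln(0) is undefined; the function names are hypothetical.

```python
import math


def fit_log_trend(ys):
    """Least-squares fit of y = a + b*ln(t) with day index t = 1, 2, ..., n."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    us = [math.log(t) for t in range(1, n + 1)]  # transformed x values
    u_mean = sum(us) / n
    y_mean = sum(ys) / n
    sxy = sum((u - u_mean) * (y - y_mean) for u, y in zip(us, ys))
    sxx = sum((u - u_mean) ** 2 for u in us)
    b = sxy / sxx
    a = y_mean - b * u_mean
    return a, b


def predict_log_trend(a, b, t):
    """Evaluate the fitted curve at day index t (t >= 1)."""
    return a + b * math.log(t)


# Hypothetical clicks that grow quickly at first and then level off.
clicks = [50, 71, 83, 92, 98, 104, 109]
a, b = fit_log_trend(clicks)
next_week = [predict_log_trend(a, b, t) for t in range(8, 15)]
print(next_week)
```

Whether this family fits better than a straight line is exactly the question the comment raises; comparing the errors of both on past data is one way to decide.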
Geoff
A: 

The method you are looking for is called least squares.

What you need to do is minimize the sum of the distances between each data point and the function you will use to predict future values. So that every distance counts as positive, the calculation uses the square of each difference rather than its absolute value: the sum of the squared differences has to be minimal. By writing down the function that gives this sum, differentiating it, and solving the resulting equations, you find the parameters of your function that bring it CLOSEST to the statistical values from the past.
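For the simplest case, a straight line y = a + b t with t the day and y the clicks (an assumed model; the answer leaves the function open), the steps above work out as follows. The sum of squared differences is

$$
S(a, b) = \sum_{i=1}^{n} \left( y_i - a - b\,t_i \right)^2
$$

Setting both partial derivatives to zero:

$$
\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} \left( y_i - a - b\,t_i \right) = 0,
\qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} t_i \left( y_i - a - b\,t_i \right) = 0
$$

and solving gives, with $\bar{t}$ and $\bar{y}$ the means of the $t_i$ and $y_i$,

$$
b = \frac{\sum_{i=1}^{n} (t_i - \bar{t})(y_i - \bar{y})}{\sum_{i=1}^{n} (t_i - \bar{t})^2},
\qquad
a = \bar{y} - b\,\bar{t}
$$

The prediction for a future day $t$ is then $a + b\,t$.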

Programs like Excel (and maybe OpenOffice Calc too) have a built-in function that does this for you, using polynomial functions to model the dependence.
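To see what a polynomial least-squares fit does without a spreadsheet, here is a plain-Python sketch that solves the normal equations (XᵀX)c = Xᵀy by Gaussian elimination. This is not how Excel implements it; it is an illustration under the assumption of a low degree (2 or 3), since the normal equations become numerically unstable for high degrees. The names `polyfit` and `polyval` are hypothetical here.

```python
def polyfit(ts, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) c = X^T y.
    Returns coefficients [c0, c1, ..., c_degree] for y = c0 + c1*t + c2*t^2 + ..."""
    m = degree + 1
    # Normal-equation matrix A = X^T X and right-hand side r = X^T y.
    A = [[sum(t ** (i + j) for t in ts) for j in range(m)] for i in range(m)]
    r = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, m):
            factor = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= factor * A[col][k]
            r[row] -= factor * r[col]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        coeffs[i] = (r[i] - sum(A[i][k] * coeffs[k] for k in range(i + 1, m))) / A[i][i]
    return coeffs


def polyval(coeffs, t):
    """Evaluate c0 + c1*t + c2*t^2 + ... at t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


# Hypothetical daily clicks; fit a quadratic and extend it 7 days.
days = list(range(10))
clicks = [100, 104, 110, 113, 121, 126, 135, 141, 150, 158]
coeffs = polyfit(days, clicks, 2)
print([round(polyval(coeffs, t), 1) for t in range(10, 17)])
```

A higher degree always fits the past more closely, but extrapolating it a week ahead can swing wildly, so a low degree is usually the safer choice.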

Basically, you should take time as the independent variable and all the other values as dependent variables.

This field is called econometrics, because it is widespread in economics. If you have a lot of statistical data from the past, the prediction for the next day will be quite accurate, and you will also be able to determine the confidence interval (the range of error that may occur). Predictions for the days after that will be progressively less accurate.

If you build a separate model for each day of the week and include holidays and special days as variables, you will get much higher precision.
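A Python sketch of the "separate model for each day of the week" part: one least-squares line per weekday, using the date's ordinal as x, evaluated on each day of the coming week. Holiday and special-day variables would need a multiple-regression model, which this sketch does not attempt. The input format, function names, and synthetic data are assumptions.

```python
from datetime import date, timedelta


def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:  # a single observation: fall back to a flat line
        return y_mean, 0.0
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return y_mean - b * x_mean, b


def per_weekday_forecast(history, start, days_ahead=7):
    """Fit a separate trend line for each weekday and evaluate it on the coming days.
    history: dict mapping datetime.date -> clicks. The x value is the date's ordinal."""
    by_weekday = {}
    for day, clicks in history.items():
        by_weekday.setdefault(day.weekday(), []).append((day.toordinal(), clicks))
    forecast = {}
    for offset in range(days_ahead):
        day = start + timedelta(days=offset)
        points = by_weekday.get(day.weekday())
        if not points:
            forecast[day] = None  # no history for this weekday
            continue
        xs, ys = zip(*points)
        a, b = fit_line(xs, ys)
        forecast[day] = a + b * day.toordinal()
    return forecast


# Synthetic history: four weeks from Monday 2010-05-03, where each weekday has its own
# level (10 clicks apart) and every weekday grows by 5 clicks per week.
first_monday = date(2010, 5, 3)
history = {
    first_monday + timedelta(days=7 * week + d): 100 + 10 * d + 5 * week
    for week in range(4)
    for d in range(7)
}
next_week = per_weekday_forecast(history, date(2010, 5, 31))
print(round(next_week[date(2010, 5, 31)], 6))  # 120.0
```

Because Mondays are only compared with Mondays, a weekday that is always busier or quieter does not distort the trend for the other days.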

This is the only RIGHT way to mathematically forecast future values. But all this raises a question: is it really worth it?

Alexander
What is that Excel function called?
Francesc
It's LINEST: http://www.pge.com/includes/docs/pdfs/about/edusafety/training/pec/toolbox/tll/appnotes/regression_using_the_excel_linest_function.pdf
Alexander