views:

164

answers:

7

Consider a sales department that sets a sales goal for each day. The total goal isn't important, but the overage or underage is. For example, if Monday of week 1 has a goal of 50 and we sell 60, that day gets a score of +10. On Tuesday, our goal is 48 and we sell 46 for a score of -2. At the end of the week, we score the week like this:

[0,0]=10,[0,1]=-2,[0,2]=1,[0,3]=7,[0,4]=6

In this example, both Monday (0,0) and Thursday and Friday (0,3 and 0,4) are "hot"

If we look at the results from week 2, we see:

[1,0]=-4,[1,1]=2,[1,2]=-1,[1,3]=4,[1,4]=5

For week 2, the end of the week is hot, and Tuesday is warm.

Next, if we compare weeks one and two, we see that the end of the week tends to be better than the first part of the week. So, now let's add weeks 3 and 4:

[0,0]=10,[0,1]=-2,[0,2]=1,[0,3]=7,[0,4]=6
[1,0]=-4,[1,1]=2,[1,2]=-1,[1,3]=4,[1,4]=5
[2,0]=-8,[2,1]=-2,[2,2]=-1,[2,3]=2,[2,4]=3
[3,0]=2,[3,1]=3,[3,2]=4,[3,3]=7,[3,4]=9

From this, we see that the end of the week is better theory holds true. But we also see that end of the month is better than the start. Of course, we would want to next compare this month with next month, or compare a group of months for quarterly or annual results.

I'm not a math or stats guy, but I'm pretty sure there are algorithms designed for this type of problem. Since I don't have a math background (and don't remember any algebra from my earlier days), where would I look for help? Does this type of "hotspot" logic have a name? Are there formulas or algorithms that can slice and dice and compare multidimensional arrays?

Any help, pointers or advice is appreciated!

+2  A: 

This data isn't really multidimensional, it's just a simple time series, and there are many ways to analyse it. I'd suggest you start with the Fourier Transform, it detects "rhythms" in a series, so this data would show a spike at 7 days, and also around thirty, and if you extended the data set to a few years it would show a one-year spike for seasons and holidays. That should keep you busy for a while, until you're ready to use real multidimensional data, say by adding in weather information, stock market data, results of recent sports events and so on.

Beta
Ouch, faster by a few seconds.
fortran
I had the whole thing hotkeyed. I knew someday someone would ask a question whose answer was "the Fourier Transform". That frees up [f10], only two more to go.
Beta
xDDD I'd vote up you twice for that comment if I could :p
fortran
+1  A: 

What you want to do is quite simple - you just have to calculate the autocorrelation of your data and look at the correlogram. From the correlogram you can see 'hidden' periods of your data and then you can use this information to analyze the periods.

Here is the result - your numbers and their normalized autocorrelation.

10    1,000
-2    0,097
 1   -0,121
 7    0,084
 6    0,098
-4    0,154
 2   -0,082
-1   -0,550
 4   -0,341
 5   -0,027
-8   -0,165
-2   -0,212
-1   -0,555
 2   -0,426
 3   -0,279
 2    0,195
 3    0,000
 4   -0,795
 7   -1,000
 9  

I used Excel to get the values. But the sequence in column A and add the equation =CORREL($A$1:$A$20;$A1:$A20) to cell B1 and copy it then up to B19. If you the add a line diagram, you can nicely see the structure of the data.

Daniel Brückner
+1  A: 

It seems to me that an OLAP approach (like pivot tables in MS Excel) fit the problem perfectly.

Amro
I think this is the most useful suggestion for someone who's not trying to hack this data in R or something and get complex forecasting or statistical significance results. Pivot tables in Excel are incredibly useful, not that hard to set up or use, and you're almost certain to already have the appropriate tool.
Harlan
A: 

You can already make reasonable guesses about the periods of patterns - you're looking at things like weekly and monthly. To look for weekly patterns, for example, just average all the mondays together and so on. Same goes for days of the month, for months of the year.

Sure, you could use a complex algorithm to find out that there's a weekly pattern, but you already know to expect that. If you think there really may be patterns buried there that you'd never suspect (there's a strange community of people who use a 5-day week and frequent your business), by all means, use a strong tool -- but if you know what kinds of things to look for, there's really no need.

Jefromi
+2  A: 

The following might be relevant to you: Stochastic oscillators in technical analysis, which are used to determine whether a stock has been overbought or oversold.

I'm oversimplifying here, but essentially you have two moving calculations:

  • 14-day stochastic: 100 * (today's closing price - low of last 14 days) / (high of last 14 days - low of last 14 days)
  • 3-day stochastic: same calculation, but relative to 3 days.

The 14-day and 3-day stochastics will have a tendency to follow the same curve. Your stochastics will fall somewhere between 1.0 and 0.0; stochastics above 0.8 are considered overbought or bearish, below 0.2 indicates oversold or bullish. More specifically, when your 3-day stochastic "crosses" the 14-day stochastic in one of those regions, you have predictor of momentum of the prices.

Although some people consider technical analysis to be voodoo, empirical evidence indicates that it has some predictive power. For what its worth, a stochastic is a very easy and efficient way to visualize the momentum of prices over time.

Juliet
A: 

Daniel has the right idea when he suggested correlation but I don't think autocorrelation is what you want. Instead I would suggest correlating each week with each other week. Peaks in your correlation--that is values close to 1--suggest that the values of the weeks resemble each other (I.e. are peiodic) for that particular shift.

For example when you cross correlate

0 0 1 2 0 0

with

0 0 0 1 1 0

the result would be

 2 0 0 1 3 0

the highest value is 3, which corresponds to shifting (right) the second array by 4

0 0 0 1 1 0 -->  0 0 1 1 0 0

and thenn multiplying component wise

0   0   1   2   0   0
0   0   1   1   0   0
----------------------
0 + 0 + 1 + 2 + 0 + 0 = 3

Note that when you correlate you can create your own "fake" week and cross-correlate all your real weeks, the idea being that you are looking for "shapes" of your weekly values that correspond to the shape of your fake week by looking for peaks in the correlation result.

So if you are interested in finding weeks that are close near the end of the week you could use the "fake" week

 -1 -1 -1 -1  1  1

and if you get a high response in the first value of the correlation this means that the real week that you correlated with has roughly this shape.

ldog
A: 

This is probably beyond the scope of what you're looking for, but one technical approach that would give you the ability to do forecasting, look at things like statistical significance, etc., would be ARIMA or similar Box-Jenkins models.

Harlan