What I want to do is take a certain stock pattern (defined as a series of x and y coordinates) and compare it against historical stock prices. If I find anything in the historical prices similar to that pattern I defined, I would like to return it as a match.

I'm not sure how to determine how similar two curved lines are. I did some research, and you can find the similarity of two straight lines (with linear regression), but I haven't yet come across a good way to compare two curved lines.

My best approach right now is to get several high and low points from the historical data range I'm looking at, find the slopes of the lines between them, and compare those to the slopes of the pattern I'm trying to match to see if they're roughly the same.

Any better ideas? I'd love to hear them!

Edit: Thanks for the input! I considered the least squares approach before, but I wasn't sure where to go with it. After the input I received though, I think computing a least-squares fit of each line first to smooth out the data a little bit, then scaling and stretching the pattern like James suggested, should get me what I'm looking for.

I plan on using this to identify certain technical flags in the stock to determine buy and sell signals. There are already sites out there that do this to some degree (such as stockfetcher), but of course I'd like to try it myself and see if I can do any better.

A: 

One thought might be to take moving averages of varying time ranges (daily for weeks, months, years; weekly for months, years; etc) and compare them to moving averages now.

The individual averages would also give you an easier comparison. If consecutive items in the averages are in some normalized form (say scaled to 0..1 to account for splits, etc.), you can compare corresponding elements of the two vectors to each other within some range epsilon and get a potential match.
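A rough sketch of that idea in Python, in case it helps. The function names (`moving_average`, `normalize`, `match_fraction`), the window size and the epsilon value are all made up for illustration, and the price lists are toy data, not real quotes.

```python
def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


def normalize(values):
    """Rescale to the 0..1 range; a flat series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def match_fraction(a, b, epsilon=0.1):
    """Fraction of positions where two equal-length series differ by at most epsilon."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    hits = sum(1 for x, y in zip(a, b) if abs(x - y) <= epsilon)
    return hits / len(a)


# Toy data (assumption): the recent series has the same shape at 5x the price.
past = [10, 11, 13, 12, 14, 17, 16, 18]
recent = [50, 55, 65, 60, 70, 85, 80, 90]

past_ma = normalize(moving_average(past, 3))
recent_ma = normalize(moving_average(recent, 3))
score = match_fraction(past_ma, recent_ma, epsilon=0.05)
```

Here `score` is the share of points that agree within epsilon, so you could call it a potential match once it passes whatever cutoff you settle on.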

Just a thought.

Mathworld (http://mathworld.wolfram.com/) should also have some take on curve comparisons.

warren
A: 

One of the problems is that curve fitting using non-linear functions is not always going to work for some of your patterns, depending on how complex they are. You could use quadratic, cubic, or higher-order polynomials to get a more accurate result, but it's not going to work in all situations, particularly with any sharp changes in the data over time.

Honestly I think a reasonable and relatively simple solution is to 'scale' and 'stretch' your pattern so that it occurs over the same range as the historical data. You can use interpolation for the x axis and multiplication plus an offset for the y-axis. After that just look at the mean of the squared differences at each point and if that is lower than a threshold value then you can consider it a match. It will require a bit of tweaking to achieve predictable results but I think it's a nice approach that should allow you to define any sort of pattern without relying on regression producing a nicely fitted curve. Essentially it's just an application of statistics. You could also look at standard deviations or variance for a more comprehensive approach.
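Something like the following is one way to read the scale-and-stretch idea. It is only a sketch: the helper names are hypothetical, the y-axis fit simply maps the pattern's min/max onto the window's min/max (one possible choice of "multiplication plus an offset"), and the sample numbers are invented.

```python
def resample(ys, n):
    """Linearly interpolate ys onto n evenly spaced points (stretch the x axis)."""
    if n < 2 or len(ys) < 2:
        raise ValueError("need at least two points")
    step = (len(ys) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * step
        left = min(int(pos), len(ys) - 2)
        frac = pos - left
        out.append(ys[left] + frac * (ys[left + 1] - ys[left]))
    return out


def rescale_to(ys, target):
    """Multiply and offset ys so its min/max match those of target (scale the y axis)."""
    lo, hi = min(ys), max(ys)
    t_lo, t_hi = min(target), max(target)
    if hi == lo:
        return [(t_lo + t_hi) / 2] * len(ys)
    factor = (t_hi - t_lo) / (hi - lo)
    return [t_lo + (y - lo) * factor for y in ys]


def mean_squared_difference(pattern, window):
    """Fit the pattern onto the window, then average the squared point differences."""
    fitted = rescale_to(resample(pattern, len(window)), window)
    return sum((p - w) ** 2 for p, w in zip(fitted, window)) / len(window)


pattern = [0, 1, 0]                    # a simple peak
peak = [100, 105, 110, 105, 100]       # toy window with the same shape
uptrend = [100, 102, 104, 106, 108]    # toy window with a different shape

THRESHOLD = 1.0                        # assumed cutoff; needs tuning
peak_is_match = mean_squared_difference(pattern, peak) < THRESHOLD
uptrend_is_match = mean_squared_difference(pattern, uptrend) < THRESHOLD
```

Note that the raw mean squared difference grows with the price level, so a fixed threshold probably only makes sense if you divide by the window's range (or normalize both series) first.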

A: 

or perhaps look at the derivatives?

stock price movement in theory is usually modeled as brownian motion with a drift factor. (i know very little, but take a look here)
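For the derivatives idea, one crude reading (my assumption, nothing more) is to compare the direction of the day-to-day changes rather than the prices themselves. The names below are hypothetical and the series are toy data.

```python
def first_differences(values):
    """Discrete derivative: change between consecutive points."""
    return [b - a for a, b in zip(values, values[1:])]


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def direction_agreement(a, b):
    """Fraction of steps where two equal-length series move in the same direction."""
    da, db = first_differences(a), first_differences(b)
    if len(da) != len(db) or not da:
        raise ValueError("series must have the same length, at least two points")
    same = sum(1 for x, y in zip(da, db) if sign(x) == sign(y))
    return same / len(da)


pattern = [1, 2, 3, 2]
same_shape = direction_agreement(pattern, [10, 30, 35, 20])   # up, up, down
different = direction_agreement(pattern, [5, 4, 6, 5])        # down, up, down
```

This ignores how big each move is, so it is very tolerant of scaling but also easy to fool with noise.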

if you don't mind me asking, to what end might that be?

profit, of course! past actions are indicative of future behavior! :P
Jimmy
does brownian motion have anything to do with brown notes? :)
FryGuy
+2  A: 

Compute the sum of the squared residuals (y differences) over all points. This should give you a measure of the geometric fit (how similar the two curves look). You should then be able to set some tolerance for 'similar enough'.
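As a sketch of how that could be used to scan a price history: slide a window the length of the pattern across the data, normalize both to 0..1 so the price level doesn't matter, and keep the windows whose sum of squared residuals is under the tolerance. The function names, the normalization step and the tolerance value are my own assumptions, and the history is invented.

```python
def normalize(values):
    """Rescale to the 0..1 range; a flat series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def sum_squared_residuals(a, b):
    """Sum of squared y differences between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_matches(history, pattern, tolerance):
    """Return (start_index, ssr) for each window whose shape is within tolerance."""
    n = len(pattern)
    target = normalize(pattern)
    matches = []
    for start in range(len(history) - n + 1):
        window = normalize(history[start:start + n])
        ssr = sum_squared_residuals(window, target)
        if ssr <= tolerance:
            matches.append((start, ssr))
    return matches


history = [20, 21, 22, 23, 24, 26, 28, 26, 24, 25, 26, 27]   # toy prices
pattern = [0, 1, 2, 1, 0]                                     # a symmetric peak
matches = find_matches(history, pattern, tolerance=0.05)
```

With this toy data the only window that passes is the one starting at index 4, where the history actually contains the peak.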

See http://en.wikipedia.org/wiki/Curve_fitting

frankodwyer
+1  A: 

Math is not my strong point, however you might be able to use Correlation.

Calculate the correlation value between the two data sets, and if the correlation is greater than some value (.8?), then consider the sets similar enough.
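A minimal sketch of that check, assuming "correlation" here means the Pearson correlation coefficient and that both series have already been cut to the same length. The `pearson` helper and the sample data are illustrative only; the 0.8 cutoff is the guess from above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no defined correlation; treat as no match
    return cov / math.sqrt(var_x * var_y)


pattern = [1, 2, 3, 2, 1]
window = [105, 110, 115, 110, 105]   # toy prices with the same shape

r = pearson(pattern, window)
similar_enough = r > 0.8
```

A nice side effect is that the Pearson coefficient doesn't change when one series is multiplied by a positive factor or shifted by an offset, so the pattern doesn't have to be scaled to the price level first.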

monkut
A: 

Least squares isn't the best you could do here. Use the RANSAC algorithm instead. It tolerates outliers, which matters because this kind of data is very unpredictable and often noisy.
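To give an idea of what that might look like: the sketch below uses RANSAC to fit a scale and offset that map the pattern onto a price window, then counts how many points agree. The choice of model (window ~= a * pattern + b) is my assumption, as are the function name, threshold, iteration count and the toy data. A fuller version would usually refit on the inliers at the end, which is skipped here.

```python
import random


def ransac_fit(pattern, window, threshold, iterations=200, seed=0):
    """Robustly fit window ~= a * pattern + b; return (a, b, inlier_count)."""
    rng = random.Random(seed)
    points = list(zip(pattern, window))
    best = (0.0, 0.0, 0)
    for _ in range(iterations):
        # Minimal sample: two points are enough to define a line.
        (p1, w1), (p2, w2) = rng.sample(points, 2)
        if p1 == p2:
            continue  # cannot define a line from two equal pattern values
        a = (w2 - w1) / (p2 - p1)
        b = w1 - a * p1
        # Count the points this candidate model explains within the threshold.
        inliers = sum(1 for p, w in points if abs(a * p + b - w) <= threshold)
        if inliers > best[2]:
            best = (a, b, inliers)
    return best


pattern = [0, 1, 2, 3, 2, 1, 0, 1]
# Toy window: 10 * pattern + 50, except for one spike at index 5.
window = [50, 60, 70, 80, 70, 95, 50, 60]

a, b, inliers = ransac_fit(pattern, window, threshold=1.0)
inlier_ratio = inliers / len(pattern)
```

You could then treat the window as a match when `inlier_ratio` is high enough, so a single spike doesn't throw out an otherwise good fit the way it can with plain least squares.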

monksy