tags:

views:

117

answers:

5

If you plot the numbers below, you get a "volatility smile": the numbers follow one linear slope (the left slope), and then change to following another linear slope (the right slope).

I have several sets of data like this and want to know where the slope changes. Notes:

  • The slope change usually occurs between points

  • I don't know how many points have the left slope and how many have the right slope.

  • There's no guarentee about the sign/magnitude of either slope or the relation between the slopes. Each slope may be negative or positive, and either might be greater than the other.

  • If the slopes are identical, the program should report this as a special case.

0.1613 
0.1596 
0.1579 
0.1561 
0.1544 
0.1528 
0.1511 
0.1495 
0.1478 
0.1462 
0.1446 
0.1431 
0.1415 
0.1416 
0.1418 
0.1419 
0.1421 
0.1422 
0.1424 
0.1425 
0.1426 
0.1428 
0.1429 
0.1431 
+4  A: 

Create a new list of numbers that is the difference of consecutive pairs in that list. These differences are the "slope" from one point to the next. For constant slopes, these numbers will all be the same. This changes the problem from detecting a slope change to detecting a level change.

phkahler
+3  A: 
Slope(X) = f(x) - f(x-1)
Slope2(x) = Slope(x) - Slope(x-1)

you need second one. It shows the speed of changing of the slope itself. (acceleration in terms of physics) I plotted both graphs in excel and check what i have:

alt text

you see peak of slope2? this is the indicator and it can easily be found.

Andrey
Do you actually want the 2 points swapped? Something like: Slope(x) = f(x) - f(x-1) and the same adjustment for the calculation of Slope2?You can get a sense of that from the graph; when the blue Slope graph goes up, the red Slope2 goes down.
Dan
@Dan you are 100% rigth, but for slope2 we get the same graph, so it is not very serious mistake :)
Andrey
A: 

You should define what identical slopes means, as the data is/becomes rounded/truncated. Is a degree of diference too much?

If there is only one real slope change an it can happen between points then there are three cases cases:

  1. The slope significantly changes two consecutive times. For example, if the data is 6, 5, 4, 3, 3, 4, 5, 6, 7, the slopes become -1, -1, -1, 0, 1, 1, 1, 1. This means that the real slope change is between the points of the intermediate slope, the zero in the example, or between the 3's. To get the point on where the slope really changed you need to get the point on where the line defined by the last two points (4, 3 in the example) intersect the line defined by the next two points (3, 4 in the example). The example gives the solution [4.5, 2.5].

  2. The slope significantly changes one time only. For example, if the data is 7, 6, 5, 4, 3, 4, 5, 6, 7, the slopes become -1, -1, -1, -1, 1, 1, 1, 1. This means that the real slope change is in the point coomon to the both slopes (3 in the example).

  3. The slope does not significantly change. This is the special case you mentioned.

Wilhelm
A: 

The derivative strikes again. In case you or anyone else did not realize this.

A basic text on calculus should help you analyze your functions.

Derivatives yield the slope of the tangent line of a function.

Integrals yield the area under a line.

As implied by a previous answer, if f(x) is the "Position" the derivative of f [f'(x)] is the velocity, the derivative of that [f''(x)] is the acceleration.

You can also work backwards from data representing f''(x) and compute velocity and position with respect to time.

BitShifter
A: 

There are two parts to the problem:

  1. Partition the data points into two sets; those that lie on the left line, and those that lie on the right.
  2. Fit a line through each of the two sets.

The second problem is easy and has a standard solution: fit a line using linear least squares.

How to solve the first problem is going to depend on the specifics of your application. Here's one very simple algorithm that will work well provided the number of data points, n, isn't too large: just do linear least squares on the first i points and the last n-i points, for i from 2 to n-2, and keep the one with least sum of squared residuals.

If n is very large, and the above approach is inefficient, you'll have to look at discrete second derivatives, as suggested by other posted answers. Note that unlike least-squares fits, approximations of derivatives are extremely sensitive to even small amounts of noise.

Once you have your two fit lines, you can use some heuristic (ie, slopes differ by less than some tolerance) to determine if you have the single-line special case.