I have a process that consumes multiple sources of live price data from the forex market and produces 2 streams of time series data as its output. The output is noisy (i.e. not smooth like sin or cos), and both streams are bound between the values of 0 and 100.

Is there an approach in machine learning or AI that can help me identify when 1 signal is steeply positive and one is steeply negative? I have toyed around with simple moving averages and exponential moving averages to smooth out the lines a little, but I lose too much information that way.

+1  A: 

You could apply supervised machine learning.

Features: the values of the last n data points. Labels: -1 for decreasing, +1 for increasing.

Now you only need some labeled samples. This will give a table like this (here n=5):

# t-4, t-3, t-2, t-1, t, label
#-----------------------------
54, 43, 98, 1, 45, +1
21, 12,  5, 98, 4, +1
 6, 78, 45, 65, 37, -1
...

Now take some ready-to-use machine-learning library (like WEKA) and train a classifier on this. Depending on the algorithm, you'll also get a measure of the certainty of the result. How many labeled training samples you'll need is difficult to predict. Try 100 for a start, but it could also be more than 1000. There is also a lot of potential in the features. Maybe it works better if you use the differences between consecutive values instead of the absolute values.

The training data can be labeled by hand. But for market prediction it's often possible to do this automatically based on historical data.

There should be lots of literature about stock prediction. People have been (and still are) researching in this area for ages. The above is of course very primitive.
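As a rough sketch of how such a table could be built automatically from historical data: the function below slides a window of n values over a series and labels each row by whether the next value is higher. The name `make_samples`, the `horizon` parameter, the labeling rule, and the sample values are all assumptions made for illustration. Comparing against a single future point is itself noisy, so a real labeling rule would probably need to be more robust.

```python
def make_samples(series, n=5, horizon=1, use_differences=False):
    """Build (features, label) rows from a historical series.

    features: the last n values ending at time t, or their n-1 successive
              differences when use_differences is True
    label:    +1 if the value `horizon` steps after t is higher than the
              value at t, otherwise -1 (an assumed, very simple rule)
    """
    samples = []
    for t in range(n - 1, len(series) - horizon):
        window = list(series[t - n + 1 : t + 1])
        if use_differences:
            features = [b - a for a, b in zip(window, window[1:])]
        else:
            features = window
        label = 1 if series[t + horizon] > series[t] else -1
        samples.append((features, label))
    return samples


if __name__ == "__main__":
    history = [54, 43, 98, 1, 45, 60, 21, 12, 5, 98, 4]  # made-up values
    for features, label in make_samples(history, n=5):
        print(features, label)
```

The resulting rows can then be fed to whatever classifier you pick. Setting `use_differences=True` gives the difference-based features mentioned above.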


Another approach would be to fit a curve through your data and take its derivative. The more you know about the data, the stronger this approach will be. For example, if you have a good idea of the probability model that produces the noise, you can derive the 'optimal' fit (maximum likelihood and such). If you know something about your underlying signal (the one you are trying to measure), this helps too (is it linear? quadratic? Lipschitz continuous? bounded? ...).

This approach requires problem-specific knowledge that might not be available, and a good deal of math. But it can be very rewarding, since you don't end up with a black box like you would with machine learning, but with a mathematical model that you understand and can analyze.
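The simplest instance of this idea, sketched under two assumptions that may not hold for your data: the signal is locally a straight line, and the noise is independent Gaussian with equal variance (in which case the least-squares fit is also the maximum-likelihood fit). The derivative of the fitted line is just its slope. The function names and the window length are hypothetical.

```python
def window_slope(values):
    """Least-squares slope of a straight line fitted to equally spaced values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2          # x positions are 0, 1, ..., n-1
    mean_y = sum(values) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    return cov_xy / var_x


def rolling_slope(series, window=10):
    """Slope of the fitted line over each trailing window of the series."""
    return [
        window_slope(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]
```

Unlike picking two points on the raw signal, the slope here uses every point in the window, so a single outlier has less influence. A different signal or noise model would call for a different fit.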

Lawnmower
+1  A: 

Lawnmower's suggestion sounds nice, but there are several additional points you should consider:

  • First, as far as I understand the question, there are more than two possible labels:

    • signal 1 up; signal 2 down
    • signal 2 up; signal 1 down
    • etc
  • Second, manually marking the data will be very tedious, since you will most probably need a lot of data to train, test and validate your models. Mechanical Turk can help you with this task

  • Even if you take Lawnmower's advice, I would smooth the data a little bit before the training

  • Plot every input variable against another and against the output to get some idea on how the variables affect the outcome. If pair-wise plotting isn't practically possible, try PCA or another dimension reduction technique

bgbg
I agree. I'm not sure if this Mechanical Turk thing is required; it seems to me that all that's needed is some historical data of the signals. What I'm describing in my answer is just the simplest method I could think of. Simultaneously learning multiple signals (multi-task learning), data preprocessing, and other features are all promising improvements. But it doesn't end there. One could write a book full of possible improvements (many have). That's why I kept it as a simple introduction. For more, read the literature.
Lawnmower
+1  A: 

How about determining, on the raw data, when one is steeply positive and the other is steeply negative? Then only flag the result as significant if it persists in that state long enough, or for a large enough fraction of some small time period.
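A minimal sketch of that idea, with every number in it being a guess: "steep" is taken to mean a change of at least `steep` units per step over the last `lag` steps, and a state counts as significant when it holds for at least `min_fraction` of the last `period` samples. All names and thresholds are hypothetical and would need tuning on real data.

```python
def divergence_state(a, b, lag=3, steep=2.0):
    """Per-sample state: +1 if a rises steeply while b falls steeply,
    -1 for the opposite case, 0 otherwise. Slopes come from raw differences."""
    states = []
    for t in range(lag, min(len(a), len(b))):
        slope_a = (a[t] - a[t - lag]) / lag
        slope_b = (b[t] - b[t - lag]) / lag
        if slope_a >= steep and slope_b <= -steep:
            states.append(1)
        elif slope_a <= -steep and slope_b >= steep:
            states.append(-1)
        else:
            states.append(0)
    return states


def persistent_state(states, period=5, min_fraction=0.8):
    """Keep a nonzero state only if it fills at least min_fraction of the
    last `period` samples. Assumes min_fraction > 0.5, so at most one of
    the two nonzero states can qualify at a time."""
    out = []
    for i in range(len(states)):
        recent = states[max(0, i - period + 1) : i + 1]
        result = 0
        if len(recent) == period:
            for candidate in (1, -1):
                if recent.count(candidate) / period >= min_fraction:
                    result = candidate
        out.append(result)
    return out
```

Usage would be `persistent_state(divergence_state(stream_1, stream_2))`. The fraction-based test tolerates an occasional sample where the raw slope briefly flips because of noise.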

Mick
That's the trick now, isn't it? How exactly do you define "steeply negative"? It's simple enough for a human to look at a noisy signal and say "up" or "down", but teaching a computer to do that is a bit more difficult. If you pick the wrong 2 points on an uptrend, your algorithm might actually interpret the situation as a downtrend. This is not good when you have money on the line. The same thing can happen if you attempt to smooth the line out to make your job easier.
bostonBob