I'm building a semantic web project that gathers the maximum amount of historical data about a certain company and tries to predict its future stock market values.

For data I have the historic stock values (not normalized), news (0 to 1 polarity) and subjective content (also a 0 to 1 polarity).

What is the best AI system to train and use for this kind of objective? Is a simple NN with back-propagation training the best I can hope for?

update: Everyone is concerned about the quality of this system. Although I'm pretty sure the system is as good as a random prediction (or even worse), this is a school project around artificial intelligence and web semantics. Therefore I'm only concerned with picking the best kind of training method for the data I have (NN, RBF, SVM, Bayes, neuro-fuzzy, etc). It's not about making money.

+1  A: 

Using the current price as a prediction of future price is probably more accurate than any fancy system. That's the underlying assumption behind the Black-Scholes option pricing model, for example.

If you're just looking to play around, you might take a look at SQL Server Data Mining; it has some cool features for predicting against time sequences. It uses a hybrid of decision trees and autoregression ("Autoregressive Tree").
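As a rough sketch of the "current price as prediction" baseline mentioned above, here is one way to score it in Python. The function names and the price list are made up for illustration; any fancier model would need to beat this error on the same data to be worth the trouble.

```python
def persistence_forecast(prices):
    """Predict each next price as the current price (one step ahead)."""
    return prices[:-1]


def mean_absolute_error(predicted, actual):
    """Average absolute difference between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical closing prices, made up for illustration only.
prices = [100.0, 101.0, 99.0, 102.0, 103.0]
predicted = persistence_forecast(prices)  # forecasts for days 1..4
actual = prices[1:]                       # what actually happened on days 1..4
baseline_mae = mean_absolute_error(predicted, actual)
print(baseline_mae)
```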

RickNZ
Algorithm quality is generally measured by variance of predicted vs. actual. Even though this is a school project, if future values are random, then one AI algorithm is really just as good as any other--and none are better than a guess or a constant value.
RickNZ
+5  A: 

The problem is that you are trying to predict something where you don't have enough variables in your system.

You have a stock price that is without any context, so, did the stock go up or down because of a competitor? Perhaps there is a new competitor that stumbled, so the stock went up, but, should the competitor get their act together it could severely depress your company's stock.

Suppose your company does outsourcing. If you don't take into account how the market rules can change, then your prediction is going to be off: if companies have to pay extra taxes for outsourcing, for example, that will cause a shift in resources.

Then you have weather and natural disaster events that can cause the stock to change drastically.

What you may want to do is to create a simulator, and the more variables you can include in your simulator the better off you are.

For example, what are the chances that an NFL strike will happen? You may find that you sell products to companies that sell to NFL teams, in which case a strike may impact your sales, and so the stock price.

You can model with a neural network, and it can come up with some way to accurately predict the past stock valuations, given a point in time, but it will not be any more accurate for future prices than a random walk would do, as it is a guess.

A simulator will give a range under stated conditions; if those conditions are met, its predictions may be closer, IMO.

UPDATE:

I don't believe a NN would be a good choice, since there is no way to test, after training, to see if the results are correct, unless you train up until June 2009, then pick values after that to see how well it did.
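The "train up to a cutoff, then check against what came after" idea can be sketched as a walk-forward split. This is only an illustration in Python; the function name and the window sizes are assumptions, not part of any particular library.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices); the training window grows over time."""
    start = initial_train
    while start + test_size <= n_samples:
        # Train on everything before `start`, test on the block right after it.
        yield range(0, start), range(start, start + test_size)
        start += test_size


# Example: 10 time-ordered samples, first 4 for training, test blocks of 2.
splits = [(list(tr), list(te)) for tr, te in walk_forward_splits(10, 4, 2)]
for train_idx, test_idx in splits:
    print(train_idx, "->", test_idx)
```

The point of the design is that the model never sees data from after the period it is tested on.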

Using fuzzy logic may be your best bet, as it seems to deal with unknowns, but, you will probably want to get a range for the possible stock price.

If you are using web semantics, you may want to use some data mining, and see if you can determine what events may be the main predictor of a stock price change, then a neural network may be more useful.

James Black
Yes. Adding more variables would probably improve its prediction quality, but this is not my main concern.
mrlinx
@mrlinx - The problem is that as you add more variables, there are too many unknowns for a neural network to have any level of accuracy, but a simulator may give a spread given certain pre-conditions. You have no way to know that the neural network is trained well until you test on future results, whereas if you are testing whether it understands spoken words, you can train it and then test it on other speakers.
James Black
It is standard neural network training to give it portions of the known dataset and see how it does on the other portions. That is exactly how one can 'test' whether a neural network is 'correctly trained'. For the purposes stated in the OP, the test is to watch it for a few months and see how it does. You basically argued against neural networks on the basis that they use incomplete knowledge and can only be tested by matching them to empirical results, when this is *exactly* what the OP wants.
Clueless
@Clueless - I just think that there are better ways to have a chance to be more accurate, as a NN would be guessing, basically. But, if he found some variables that are better predictors then it may have a better chance of being closer to being correct.
James Black
"I don't believe a NN would be a good choice, since there is no way to test, after training, to see if the results are correct" - eh? Train it on 10 years of data to 2005 and then predict 2006-10. Add 2006 data and predict 2007-10, etc.
John
@John - There are too many unknowns as the market can change, and your program would be hard pressed to predict the avalanche effects due to sweeping regulation changes.
James Black
This is a school project. Testing it on a year it hadn't seen before is quite good enough.
John
+1  A: 
price = price * rand()/(float)RAND_MAX;

is also pretty good.

Martin Beckett
/me buys stock ;P
Alix Axel
fixed !- you can tell I worked on Wall St in the predictions dept!
Martin Beckett
/me sells short in your stock! *grin*
Chip Uni
A: 

Since your AI system has to 'learn' over a prolonged period, the only way you can do this is to take newer variables into account as they are introduced, each with a positive or negative bias. Some things, however, will always be one-off events, like you dumping all your stock for fun.

jvsingh
+1  A: 

Stock price changes might well be what AI people call "maximally perverse"; a function where just randomly trying things is as good a way to get results as anything.

As I understand it, there are two ways to make money in the stock market:

  1. Try to predict when a stock will go up a bit and try to hold it during that period.
  2. Do lots of research (a.k.a. calling people on the phone and talking to them) to find companies that will produce a profit in the near (months) or not so near (years) future.

I think it would be a bad idea to try and write a program to do either of these because:

  1. For the first option (pure speculation), you are in effect trying to get money while making NO contribution to society or the economy. (If you just put your money in a bag under your pillow, the world would be no worse off.) I believe that this makes option 1 unethical.
  2. For the second option (research) it should be clear that even if this is possible, it would be EXTREMELY hard to do. And I'm not talking about super-clever-people hard, but rather it might be harder than building the Google search index and might rack up a power bill of thousands of dollars a day just to feed the data gathering hardware.
  3. Also, if you do try either of these, you will be competing (yes, it is a competition, and the scorecards are totaled in $ with the losers losing money) with companies that have millions if not billions to dump into it and the ability to find the best programmers in the world. If you stood a fart in a hurricane's chance of making a cent by yourself, the people you would be competing with would already be offering you a job.
BCS
After reading your update: The above is now for people who find this and ARE trying to make money. OTOH, I still think you're sunk, because the bulk of the information that you need is one-off kinds of events, so looking at what caused old data will be of little value for predicting future events.
BCS
Exactly right. One root cause for the mortgage debacle in the US was using mortgage data from 20-50 years ago to predict future default rates. There were two problems: (1) This assumed that defaults weren't correlated in any way, and (2) It ignored the fact that the rules had changed substantially from the way it was done in the past, so the old data didn't apply. Extrapolation is a dangerous thing.
duffymo
"Stock price changes might well be what AI people call "maximally perverse"; a function where just randomly trying things is as good a way to get results as anything." -- ummmm... citation needed.
Juliet
This idea of predicting a "maximally perverse" function is somewhat akin to Rock-Paper-Scissors in that extreme short-term trading (like your option 1.) is not easily related to anything except the behaviors of your competitors. In Rock-Paper-Scissors there is no 'systematic' way to get better than guessing randomly and yet http://games.cs.ualberta.ca/~darse/rsbpc.html exists.
Clueless
+3  A: 

If you are just doing this as an experiment and don't plan on actually buying stocks, I would breed (GA) decision/equation trees or neural nets taking into account several stocks or categories of stocks. Things I would look for are where changes in the trend of one stock tend to affect other stocks.

(p.s. This will only be useful for short range speculation, and for that case it should be pointed out that actually buying the stock in question could have a large enough effect to kill any profit to be had from the market. I have no idea how purchase orders would affect the market and the only way to find out would be to buy and see.)

BCS
+4  A: 

Stock price prediction has more to do with behavioral psychology than with statistical analysis of the historical data. If you have an appropriate random sample of stock holders and ask them to rank the top 10 stocks in their portfolio constantly (in a Twitter-like manner), the relative ranking of each stock will give you the best possible basis for prediction.

0 and 1 polarity for news only tells if the news is good or bad not better or worse (or best or worst). That is why ranking is so important. There is an ancient Persian story which goes like this: Two philosophers were having a discourse about the definition of "intelligence". One suggested the "ability to distinguish between good and bad". The other pointed out that even animals distinguish between those who feed them and those who beat them and suggested that "intelligence is the ability to distinguish between 2 good and 2 bad in order to choose the better of 2 good and the lesser of 2 bad".

Even if we assume stock prices are the outcome of pure rational thought and free of emotions, it is the real-time comparative price in a portfolio that really counts.

I honestly think this problem does not compute.

Square Rig Master
I just loved it, +1.
Alix Axel
+10  A: 

It's not possible to tell you which machine learning technique will give the best performance because all of these techniques are very sensitive to the actual time series you're trying to train and how you train them. So for a school project I would instead recommend that you implement multiple techniques and compare their performance. (You should also implement more than one version of each technique, for example different numbers of layers and nodes in your neural models.) This will make a much better project, because you will be thinking and writing about the relative strengths of the models rather than just messing around with one of them.

In addition to predicting performance on both the trained data and later untrained data, what can you say about their predictive ability (within the historical data) when compared to the amount of information they are storing (i.e. the number of variables in the model)? Also, how do they do on more predictable sequences like sine waves and polynomials, or definitively unpredictable ones like random walks?
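One possible way to set up that comparison is to generate a predictable series and a random walk, then score the same predictors on both. This is a minimal Python sketch; the two toy predictors, the function names, and the series parameters are placeholders for whatever models you actually build.

```python
import math
import random


def sine_series(n, period=20.0):
    """A fully predictable series: a sampled sine wave."""
    return [math.sin(2.0 * math.pi * t / period) for t in range(n)]


def random_walk(n, step=1.0, seed=0):
    """An unpredictable series: cumulative sum of random +/- step moves."""
    rng = random.Random(seed)
    value, series = 0.0, []
    for _ in range(n):
        value += rng.choice((-step, step))
        series.append(value)
    return series


def one_step_mae(series, predict):
    """Mean absolute error of predict(history) against the next value."""
    errors = [abs(predict(series[:t]) - series[t]) for t in range(2, len(series))]
    return sum(errors) / len(errors)


def last_value(history):
    """Toy predictor: tomorrow equals today."""
    return history[-1]


def linear_extrapolation(history):
    """Toy predictor: continue the most recent change."""
    return 2.0 * history[-1] - history[-2]


for name, series in (("sine", sine_series(200)), ("walk", random_walk(200))):
    for predictor in (last_value, linear_extrapolation):
        print(name, predictor.__name__, one_step_mae(series, predictor))
```

On the sine wave the extrapolating predictor should come out ahead; on the random walk the last-value predictor's error is simply the step size, which gives you a floor to compare real models against.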

And can you implement a system where you don't learn at all, you just search for your test sequence (or some function of it) within the original historic data set? How does that algorithm perform?
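A very small sketch of that "no learning, just search" idea in Python: slide a window over the history, find the stretch closest to the test sequence, and return whatever came next. The function name, the squared-distance measure, and the sample data are my own assumptions for illustration.

```python
def predict_by_lookup(history, query):
    """Find the window in history closest to query; return the value that followed it."""
    k = len(query)
    if len(history) <= k:
        raise ValueError("history must be longer than the query window")
    best_next, best_dist = None, float("inf")
    for start in range(len(history) - k):
        window = history[start:start + k]
        # Squared Euclidean distance between the window and the query.
        dist = sum((w - q) ** 2 for w, q in zip(window, query))
        if dist < best_dist:  # ties keep the earliest match
            best_dist, best_next = dist, history[start + k]
    return best_next


# Made-up history, for illustration only.
history = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
print(predict_by_lookup(history, [3.0, 2.0]))
```

In practice you would probably match on price changes or normalized windows rather than raw prices, which is one reading of "some function of it".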

As an alternative to machine learning techniques, you might also consider something with more of an expert-system flavor. For example, you could implement one of the technical analysis strategies (candlestick charting say, or Fibonacci bands) and see how it performs.

Finally, for a more realistic project, you could apply any of these techniques to systems where it really is possible to make money -- for example model (or locate) two different time series that contain arbitrage opportunities, and get your program to recognize those.

BTW, The Misbehavior of Markets is another interesting book about why price prediction is hard.

EDIT: So to actually answer your question (!) no, NN w/ back prop is not necessarily the best you can hope for. But it will probably do fine.

EDIT: If your project is really more about getting the data than analyzing it, is there something else you could do instead of prediction? For example, you could detect when trading halts should come into play, or when a short sale rule applies, or something like that?

Eric
+1  A: 

What you're trying to do is called technical analysis: basically, trying to predict the future price of a stock regardless of the company's environment, focusing only on "the numbers." Taken to the extreme, technical analysis only considers movement of the stocks. If this is what you are trying to do (predict a stock's price based on past prices and such), then perhaps something closer to trend analysis is what you are looking for.

If you do a search for technical analysis you'll find a wealth of information on it, as well as some tutorials.

TskTsk
+2  A: 

Hi, I thought I would elaborate more on my feedback on your question.

  1. It doesn't really matter much what learning algorithm you use. You can use decision trees, neural nets, whatever. Doesn't matter much what algorithm it uses, so use something simple and basic.

  2. What IS important are the features. You probably need some deductive part to basically construct some good features for input. This is because all learning algorithms are kind of dumb, and aren't going to figure out complex relationships between data points. So, you have to think of what will be relevant, and then construct derived features for it.

  3. An example: suppose the P/E ratio is relevant. You can construct this feature easily from 2 basic features, price and earnings, by calculating PE = P/E.

  4. Suppose that you realize that the uptrend of a stock is significant. You might look at the last 4 days of prices, and if they are all monotonically ascending, forming roughly a line with a slope of x, then you might create a feature with value x as an input. If they fail the above criteria, you can just set x to 0.

Anyways, if you have good features, pretty much any of the learning algorithms will learn well. If not, none of them will. So, that is where you should invest your time.
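To make points 3 and 4 concrete, here is a rough Python sketch of the two derived features. The function names are made up, and I've simplified the "roughly a line" check to just "strictly rising", so treat it as a starting point rather than a definition.

```python
def pe_ratio(price, earnings_per_share):
    """Derived feature PE = P / E; returns None when earnings are not positive."""
    if earnings_per_share <= 0:
        return None
    return price / earnings_per_share


def uptrend_slope(prices, window=4):
    """Least-squares slope of the last `window` prices if strictly rising, else 0."""
    if window < 2:
        raise ValueError("window must be at least 2")
    recent = prices[-window:]
    if len(recent) < window:
        return 0.0
    # Assumption: "uptrend" means every price is higher than the one before it.
    if any(b <= a for a, b in zip(recent, recent[1:])):
        return 0.0
    n = len(recent)
    mean_x = (n - 1) / 2.0
    mean_y = sum(recent) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(recent))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


print(pe_ratio(50.0, 5.0))
print(uptrend_slope([10.0, 11.0, 12.0, 13.0]))
print(uptrend_slope([10.0, 11.0, 10.0, 13.0]))
```

Either value can then be fed to whatever learner you pick as one more input column.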

Larry Watanabe
+26  A: 
Juliet
"The race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favor to men of skill; but time and chance happeneth to them all" - The element of chance must be taken into account. While your argument is mathematically sound, it has an unspoken proviso, namely "all things being equal", but all things are never equal! For example, how does the newspaper headline "CEO charged with sexual assault" fit into the equations above?
Square Rig Master
@Square: an investor who was a professional tennis coach explained the problem above like this: the best tennis students are those who have regular form, stick with the fundamentals, and acknowledge the sport's "best practices". While there are rare times when a wild or improvised shot might get a point, you'll lose the game playing like that all the time. The trick is playing the game using a winning strategy that you can use 95% of the time rather than 5% of the time. Likewise, there are winning strategies in investing, where the mathematical approach works 95% of the time, and gossip blogs 5%.
Juliet
In other words, you can't model for it, nor should you need to, since events like that are rather extraordinary and will rarely have an impact on your long-term success as an investor. One day I might find $100 in the parking lot or I might get mugged and lose my purse -- it could happen, but I shouldn't balance my budget around these occurrences. The best strategy for budgeting and investing is to plan for the normal occurrences, and have a safety net (rainy day fund or diverse portfolio) on the side.
Juliet
While stock fluctuations might not be completely random, I would argue that to take advantage of them you need to be better than the next (very well funded) guy. --- As to trying to predict things mathematically: in doing that, what benefit do you generate for the world at large in exchange for the money you get? Put another way, how would the rest of the world be worse off by you not making those trades? Even if you *can* make money at it, it ends up being a something-for-nothing exchange (OTOH the ethics of it are a topic for a different forum).
BCS
The assumption behind the "industry standard" Black-Scholes model is that future prices will be a random variation from the current price -- and that the current price is therefore the best single prediction for future prices.
RickNZ
In addition, stochastics, Bollinger bands, etc., look at past prices only and do not provide predictions of future prices. You can of course provide an algorithm on top of those that attempts to predict future prices, but the result tends to be statistically poor.
RickNZ
A problem with dividends, is that typically the stock falls in value just as the dividend is due, so you can't simply sell the moment you get your dividend. At least that's what I am told.
John
Joel Hoff
A: 

Let me know when you find the perfect system. Cannot wait to make a fortune and retire!

fastcodejava
A: 

You should be using Weka (http://www.cs.waikato.ac.nz/ml/weka/); it's a machine learning library. It has various neural network implementations. It also has support vector machines, which in theory should outperform your neural networks.

It's very simple to use and is written in Java. You can build classifiers, train them on your data, save the model and run it against your test data.

steve
+2  A: 

Have you seen http://predictwallstreet.com/ ? It leverages a weighted average of the collective input of netizens to make predictions.

swidnikk
+2  A: 

I think you should use existing and well-understood technical and fundamental indicators (RSI, ADX, CCI, moving averages, etc.), and then optimize and combine them into a final trading system by using genetic programming or gene expression programming. This generates a complete computer program for you, like: if(rsi crossed 20% and ema(15) > ema(50)) then buy at market price. Looking at this program gives you a good understanding of what works and what doesn't, and why.
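To show what one such evolved rule might look like once written out, here is a hand-written Python sketch of the example rule. It does not do any genetic programming; it only evaluates one candidate rule. The RSI here uses simple averages of gains and losses (one of several common variants), "crossed" is assumed to mean crossing upward through the level, and all names and default parameters are illustrative.

```python
def ema(values, period):
    """Exponential moving average, smoothing 2 / (period + 1), seeded with the first value."""
    alpha = 2.0 / (period + 1)
    result = values[0]
    for v in values[1:]:
        result = alpha * v + (1.0 - alpha) * result
    return result


def rsi(values, period=14):
    """RSI over the last `period` price changes, using simple sums of gains and losses."""
    changes = [b - a for a, b in zip(values, values[1:])][-period:]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if gains + losses == 0:
        return 50.0  # flat series: treat as neutral
    return 100.0 * gains / (gains + losses)


def buy_signal(prices, rsi_level=20.0, fast=15, slow=50, rsi_period=14):
    """Hypothetical rule: RSI crosses up through rsi_level while EMA(fast) > EMA(slow)."""
    if len(prices) < 3:
        return False
    previous_rsi = rsi(prices[:-1], rsi_period)
    current_rsi = rsi(prices, rsi_period)
    crossed_up = previous_rsi < rsi_level <= current_rsi
    return crossed_up and ema(prices, fast) > ema(prices, slow)


# Made-up prices and deliberately tiny periods so the rule can fire on 5 points.
prices = [20.0, 10.0, 9.0, 8.0, 12.0]
print(buy_signal(prices, fast=1, slow=3, rsi_period=3))
```

A genetic programming search would then mutate and recombine the thresholds, periods, and operators in rules of this shape and keep the ones that score best on held-out data.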

iirekm