I'm trying to make a stock market simulator (perhaps eventually growing into a predicting AI), but I'm having trouble finding data to use. I'm looking for a (hopefully free) source of historical stock market data.

Ideally, it would be a very fine-grained (second or minute interval) data set with price and volume of every symbol on NASDAQ and NYSE (and perhaps others if I get adventurous). Does anyone know of a source for such info?

I found this question, which indicates Yahoo offers historical data in CSV format, but a cursory examination of the linked site didn't show me how to actually get it.

I also don't like the idea of downloading the data piecemeal in CSV files... I imagine Yahoo would get upset and shut me off after the first few thousand requests.

I also discovered another question that made me think I'd hit the jackpot, but unfortunately that OpenTick site seems to have closed its doors... too bad, since I think they were exactly what I wanted.

I'd also be able to use data that's just open/close price and volume of every symbol every day, but I'd prefer all the data if I can get it. Any other suggestions?

+1  A: 

Unfortunately, historical ticker data that is free is hard to come by. Now that OpenTick is dead, I don't know of any other provider.

In a previous lifetime I worked for a hedge fund that had an automated trading system, and we used historical data profusely.

We used TickData as our source. Their prices were reasonable, and the data had sub-second resolution.

Alan
+2  A: 

I'd crawl finance.google.com (for the quotes) or finance.yahoo.com.

Both of these will return HTML pages for most exchanges around the world, including historical data. Then it's just a matter of parsing the HTML to extract what you need.

I've done this in the past with great success. Alternatively, if you don't mind using Perl, there are several modules on CPAN that have already done this work for you, i.e. extracting quotes from Google/Yahoo.

For more, see Quote History
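If you go the crawling route in C#, a minimal fetch-and-parse sketch might look like the following. The URL and the regular expression are purely illustrative assumptions; finance pages change their markup often, so a real crawler needs sturdier parsing (or one of the CPAN modules mentioned above if you use Perl).

// A minimal fetch-and-parse sketch. The quote page URL and the regex are
// illustrative assumptions only, not a stable API.
using System;
using System.Net;
using System.Text.RegularExpressions;

class QuoteScraper
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // Hypothetical example page; substitute whatever page you decide to crawl.
            string html = client.DownloadString("http://finance.yahoo.com/q?s=MSFT");

            // Hypothetical pattern: grab the first thing that looks like a price.
            Match m = Regex.Match(html, @"\d+\.\d{2}");
            Console.WriteLine(m.Success ? "Found: " + m.Value : "No price found");
        }
    }
}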

Chaos
+7  A: 

A data set of every symbol on the NASDAQ and NYSE on a second or minute interval is going to be massive.

Let's say there are a total of 4,000 companies listed across both exchanges (probably on the very low side, since there are over 3,200 companies listed on the NASDAQ alone). For data at a one-second interval, assuming 6.5 trading hours in a day, that gives you 23,400 data points per day per company, or about 93,600,000 data points in total for that one day. Assuming 200 trading days in a year, that's about 18,720,000,000 data points for just one year.
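As a quick sanity check, here is that back-of-envelope arithmetic as a small C# program; the numbers are the answer's own estimates, not measured figures.

// Back-of-envelope version of the arithmetic above, using the answer's estimates.
using System;

class DataVolume
{
    static void Main()
    {
        int companies = 4000;                          // low estimate across NASDAQ + NYSE
        int secondsPerTradingDay = (int)(6.5 * 3600);  // 6.5 trading hours = 23,400 seconds
        int tradingDaysPerYear = 200;

        long pointsPerDay = (long)companies * secondsPerTradingDay;  // 93,600,000
        long pointsPerYear = pointsPerDay * tradingDaysPerYear;      // 18,720,000,000

        Console.WriteLine("{0:N0} points/day, {1:N0} points/year", pointsPerDay, pointsPerYear);
    }
}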

Maybe you want to start with a smaller set first?

matt b
I was operating under the assumption that most of the companies would not be traded every second, so the number of data points would be significantly less. Perhaps that's a bad assumption. Still, I was predicting on the order of tens of GB per year...
rmeador
Once, a couple of months of stock data for about 10 symbols came on 3 DVDs. The data was compressed text as well.
Alan
@rmeador That's true, but some stocks also have far more daily volume than there are seconds in a day, meaning they trade more than once a second, and not all of those trades are guaranteed to be at the same price. So you'd have to decide whether you're interested in the price at an interval or at each trade.
matt b
+1  A: 

You can use Yahoo to get daily data (a much more manageable dataset), but you have to structure the URLs. See this link. You are not making lots of little requests; you are making fewer, larger requests. Lots of free software uses this, so they shouldn't shut you down.

EDIT: This guy does it, maybe you can have a look at the calls his software makes.

jimconstable
At first I thought that link looked promising, but I can't seem to find how to specify historical data... it looks like it's all real-time. Am I missing something?
rmeador
You are right. I have added another link to someone with software that does the historical stuff, so I know it is possible. Maybe have a look at the calls his software makes.
jimconstable
+2  A: 

There shouldn't be a big problem downloading it from Yahoo. Here's a good site:

http://www.gummy-stuff.org/Yahoo-data.htm

rledley
A: 

Warning: if you are looking to do stock speculation, you can expect that your AI's actions will affect the stock prices, so the only way to test the system is to put real cash on the line.

BCS
I really don't think that trading in tens of shares (all I can afford) will make more than a few cents of difference on any reasonably large-volume stock. Anyway, AI trading is a far-off second stage to what I'm trying to accomplish presently, though I would like to get there eventually...
rmeador
AI trading bets on your AI being able to squeeze a few more cents out of the transaction than the next guy can, so even a few pennies of effect could kill it. Personally, I have never liked the idea of people trying to make money by just shuffling money around, as it doesn't help society at large.
BCS
"shuffling money around" is a common misconception of the stock market. You're actually shuffling ownership of companies around, which gives investment capital to those companies (helping society). If the companies do well, that's how the owners (shareholders) make money (helping themselves).
rmeador
Person A has money to invest: good. Person B needs money invested: good. Person C gets money for helping A invest in B: all good. But the kind of decisions needed to make money by doing any of those are well beyond any known AI and require things like talking to people. Using an AI to predict value fluctuations is pure speculation and is something totally different.
BCS
There is $3 trillion per day traded in Forex, trillions in bonds, and hundreds of billions in stocks. Are a few stock trades by a retail guy going to change that? Having said that, you're right but for the wrong reasons: the only way to avoid data-snooping bias is to trade in real time. The only way to avoid execution bias is to trade with real cash in real time.
Gravitas
@Gravitas: How much of that $Xe12 is spent on stocks that see movement on the order of $K/day?
BCS
@BCS Of course, highly illiquid penny stocks will be affected by orders of a few thousand dollars per day. Highly liquid stocks are not affected by the sort of cash that retail traders can throw at them.
Gravitas
@Gravitas: Even for much more general cases, I suspect that the percentage effect a purchase has on a stock price is on the same order (1/10x to 10x) as the percentage of the money spent on that stock in a fairly small window. At a guess, if someone bought 1% of the stock sold in a 1-hour window, I'd expect it to have affected the price by at least 0.1%. Given correct numbers, this would put a floor on how small a predicted change can be acted on without a good model of what those changes are. And checking that model requires spending cash.
BCS
+5  A: 

I know you wanted "free", but I'd seriously consider getting the data from csidata.com for about $300/year, if I were you.

It's what yahoo uses to supply their data.

It comes with a decent API, and the data is (as far as I can tell) very clean.

You get 10 years of history when you subscribe, and then nightly updates afterward.

They also take care of all sorts of nasty things like splits and dividends for you. If you haven't yet discovered the joy that is data-cleaning, you won't realize how much you need this, until the first time your ATS (Automated Trading System) thinks some stock is really really cheap, only because it split 2:1 and you didn't notice.
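To make the split problem concrete, here is a hypothetical sketch of back-adjusting raw closes for a split. The Bar type and the split data are made up for illustration and are not part of any vendor's API.

// Hypothetical sketch: after a 2:1 split, the raw close halves even though
// nothing "cheap" happened, so historical prices must be back-adjusted.
using System;
using System.Collections.Generic;

class Bar
{
    public DateTime Date;
    public double Close;
    public long Volume;
}

class SplitAdjuster
{
    // Divide all closes before the split date by the split ratio (and scale
    // volume up) so the series is continuous across the split.
    static void AdjustForSplit(List<Bar> bars, DateTime splitDate, double ratio)
    {
        foreach (Bar bar in bars)
        {
            if (bar.Date < splitDate)
            {
                bar.Close /= ratio;
                bar.Volume = (long)(bar.Volume * ratio);
            }
        }
    }

    static void Main()
    {
        var bars = new List<Bar>
        {
            new Bar { Date = new DateTime(2009, 6, 1), Close = 100.0, Volume = 1000 },
            new Bar { Date = new DateTime(2009, 6, 2), Close =  51.0, Volume = 2100 }, // day after a 2:1 split
        };
        AdjustForSplit(bars, new DateTime(2009, 6, 2), 2.0);
        Console.WriteLine("Adjusted prior close: {0}", bars[0].Close); // 50 -- no longer looks like a 50% crash
    }
}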

Eric H.
+1  A: 

A former project of mine was going to use freely downloadable data from EODData.

Shaggy Frog
+1  A: 

We have purchased 12 years of intraday data from Kibot.com and are pretty satisfied with the quality.

As for storage requirements: 12 years of 1-minute data for all USA equities (more than 8000 symbols) is about 100GB.

With tick-by-tick data the situation is a little different. If you record time and sales only, that would be about 30GB of data per month for all USA equities. If you want to store bid/ask changes together with transactions, you can expect about 150GB per month.

I hope this helps. Please let me know if there is anything else I can assist you with.

boe100
+2  A: 

Using Yahoo's CSV approach above you can also get historical data! You can reverse-engineer the following example (there is a small URL-building sketch after the parameter list below):

http://ichart.finance.yahoo.com/table.csv?s=YHOO&d=0&e=28&f=2010&g=d&a=3&b=12&c=1996&ignore=.csv

Essentially:

s = TICKER
a = fromMonth-1
b = fromDay (two digits)
c = fromYear
d = toMonth-1
e = toDay (two digits)
f = toYear
g = d for day, m for month, y for yearly
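Putting those parameters together, an illustrative (not official) C# helper that builds the URL and downloads the CSV might look like this:

// Builds the historical-quotes URL from the parameters listed above and
// downloads the CSV. The parameter mapping comes from the answer; the helper
// itself is just a sketch.
using System;
using System.Net;

class YahooHistory
{
    static string BuildUrl(string ticker, DateTime from, DateTime to, string interval)
    {
        // a/b/c = from month-1/day/year, d/e/f = to month-1/day/year, g = d|m|y
        return string.Format(
            "http://ichart.finance.yahoo.com/table.csv?s={0}&a={1}&b={2:00}&c={3}&d={4}&e={5:00}&f={6}&g={7}&ignore=.csv",
            ticker, from.Month - 1, from.Day, from.Year, to.Month - 1, to.Day, to.Year, interval);
    }

    static void Main()
    {
        // Reproduces the example URL above: YHOO, daily, 1996-04-12 to 2010-01-28.
        string url = BuildUrl("YHOO", new DateTime(1996, 4, 12), new DateTime(2010, 1, 28), "d");
        using (var client = new WebClient())
        {
            string csv = client.DownloadString(url);
            Console.WriteLine(csv.Substring(0, Math.Min(200, csv.Length))); // header plus the first rows
        }
    }
}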
eckesicle
+1  A: 

Intro:
From Yahoo you can get EOD (end of day) historical prices or real-time prices. The EOD prices are amazingly simple to download. See my blog for explanations of how to get the data and for C# code examples.

I'm in the process of writing a real-time data feed "engine" that downloads and stores the real-time prices in a database. The engine will initially be able to download historical prices from Yahoo and Interactive Brokers and it will be able to store the data in a database of your choice: MS SQL, MySQL, SQLite, etc. It's open source, but I'll post more information on my blog when I get closer to releasing it (within a couple of days).

Another option is Eclipse Trader... it allows you to record historical data with granularity as low as 1 minute, and it stores the prices locally in a text file. It basically downloads the real-time data from Yahoo with a 15-minute delay. Since I wanted a more robust solution and I'm working on a big school project for which we need data, I decided to write my own data feed engine (which I mentioned above).

Sample Code:
Here is sample C# code that demonstrates how to download real-time data:

// Requires: using System; using System.IO; using System.Net; using System.Threading;
public void Start()
{
    // f= selects which fields Yahoo returns (symbol, name, last price, date, time, etc.
    // -- see the gummy-stuff link below for the full list of format codes)
    string url = "http://finance.yahoo.com/d/quotes.csv?s=MSFT+GOOG&f=snl1d1t1ohgdr";
    //Get page showing the table with the chosen indices
    HttpWebRequest request = null;
    IDatabase database =
        DatabaseFactory.CreateDatabase(
        DatabaseFactory.DatabaseType.SQLite);

    //csv content
    try
    {
        while (true)
        {
            // Download the latest quotes into a local CSV file
            using (Stream file = File.Create("quotes.csv"))
            {
                request = (HttpWebRequest)WebRequest.CreateDefault(new Uri(url));
                request.Timeout = 30000;
                using (var response = (HttpWebResponse)request.GetResponse())
                using (Stream input = response.GetResponseStream())
                {
                    // CopyStream is a small helper (defined elsewhere) that copies
                    // the response stream into the file, e.g. via Stream.CopyTo.
                    CopyStream(input, file);
                }
            }
            Console.WriteLine("------------------------------------------------");

            // Push the freshly downloaded CSV into the database, then clean up
            database.InsertData(Directory.GetCurrentDirectory() + "/quotes.csv");

            File.Delete("quotes.csv");
            Thread.Sleep(10000); // 10 seconds between polls
        }
    }
    catch (Exception exc)
    {
        Console.WriteLine(exc.ToString());
        Console.ReadKey();
    }
}

Database:
On the database side, I use an OleDb connection to the CSV file to populate a DataSet, and then I update my actual database via the DataSet. This makes it possible to match all of the columns from the CSV file returned by Yahoo directly to your database (useful if your database does not support batch inserts of CSV data, like SQLite). Otherwise, inserting the data is a one-liner... just batch-insert the CSV into your database.
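For reference, a rough sketch of that OleDb-text-driver approach is below. The provider and connection-string details are assumptions that vary by machine (Jet vs. ACE, 32- vs. 64-bit), so treat it as a starting point only.

// Loads the downloaded CSV into a DataSet via the OleDb text driver,
// roughly as described above. Connection-string details are assumptions.
using System;
using System.Data;
using System.Data.OleDb;
using System.IO;

class CsvLoader
{
    static DataSet LoadCsv(string csvPath)
    {
        string dir = Path.GetDirectoryName(Path.GetFullPath(csvPath));
        string file = Path.GetFileName(csvPath);
        string connStr =
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + dir +
            ";Extended Properties=\"text;HDR=No;FMT=Delimited\"";

        var dataSet = new DataSet();
        using (var conn = new OleDbConnection(connStr))
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [" + file + "]", conn))
        {
            adapter.Fill(dataSet, "quotes");   // each CSV row becomes a DataRow
        }
        return dataSet;
    }

    static void Main()
    {
        DataSet quotes = LoadCsv("quotes.csv");
        Console.WriteLine("{0} rows loaded", quotes.Tables["quotes"].Rows.Count);
    }
}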

You can read more about the formatting of the url here: http://www.gummy-stuff.org/Yahoo-data.htm

Lirik
+1  A: 

Take a look at the Mergent Historical Securities Data API - http://www.mergent.com/servius

Eugene Osovetsky