tags:

views:

401

answers:

4

I have 2 tables:

1) table Masterdates which contains all dates since Jan 1, 1900

2) table Stockdata which contains stock data in the form date, symbol, open, high, low, close, volume (primary key = date, symbol)

This is what I'm looking to retrieve (presented here in CSV format)

MDate,SDate,Symbol,Open,High,...

6/4/2001,6/4/2001,Foo,47,49,...

6/5/2001,null,null,null,null,...

6/6/2001,6/6/2001,Foo,54,56,...

where MDate is from Masterdates and SDate is from Stockdata. I need to have the output start with the first (earliest) extent date for the desired symbol (in this example, Foo, starting on 6/4/2001) in Stockdata, and then include all dates in Masterdates up to and including the last (latest) available date for the desired symbol in Stockdata, outputting nulls where there is no corresponding Stockdata record for a given Masterdate record in the range described.

Is there a way to do this in a single query, a series of queries, and/or by adding auxiliary tables, that will yield fast results? Or will I have to dump out supersets of what I want, and then construct the final output using my (non-SQL) programing language?

TIA

+6  A: 

Tested in SQLITE3, your DB implementation may differ

SELECT   m.date, 
         s.symbol, 
         s.open, 
         s.high, 
         s.low, 
         s.close, 
         s.volume
FROM     masterdate AS m LEFT OUTER JOIN 
         stockdata AS s ON m.date = s.date
AND      s.symbol = 'Foo'
WHERE    m.date >= (SELECT MIN(date) FROM stockdata WHERE symbol = 'Foo')
AND      m.date <= (SELECT MAX(date) FROM stockdata WHERE symbol = 'Foo')

If this does not execute quicky enough then you could pronably improve performance by setting variables for minimum and maximum value in one query and then using those in the main query. This would save you at least one index hit.

So (In SQL Server Syntax)

SET @symbol = 'Foo'

SELECT @mindate = MIN(date),
       @maxdate = MAX(date)
FROM   stockdata
WHERE  stockdata.symbol = @symbol


SELECT   m.date, 
         s.symbol, 
         s.open, 
         s.high, 
         s.low, 
         s.close, 
         s.volume
FROM     masterdate AS m LEFT OUTER JOIN 
         stockdata AS s ON m.date = s.date
AND      s.symbol = @symbol
WHERE    m.date BETWEEN @mindate AND @maxdate

You will also need to be sure that you have an index on masterdate.date and a composite index on stockdata(date, symbol).

Steve Weet
22 seconds on a 6GB db. Faster than I thought it would be, but not where I need it to be. I think if I break out the data, I can chip that down to something much better (currently I have daily, intraday, stock, futures, index, you name it, data in a single table).Thank you!
+1  A: 

If you use SQLServer, you can use a TSQL stored procedure to get your results back. It will have basically two lines:

1) To get the first date for data available 2) A query with an outer join

If you use Oracle, you can use a PL/SQL to write a similar (although a little bit more complicated) stored procedure to compute the answer you want.

Pablo Santa Cruz
+3  A: 

This is a classic left join:

SELECT * FROM masterdates
    LEFT JOIN stockdata ON masterdates.date = stockdata.date;

Obviously this should be refined to only return the required columns.

Josef
But this is going to give me everything in Masterdates, no?
A: 

I would suggest filling out the missing entries from the Stockdata table and using an inner join. Should be much faster.

Ahmad