Hello -

I want to create a massive TimeSeries object which will hold 1000 different financial-market data series, each storing 1500 daily data points. I'm quite new to the TimeSeries module and am a little confused about how best to go about it. So, a few basic questions:

1) Should I use a huge numpy array of 1000x1500 and simply feed that to the time series constructor function time_series()?

2) If I do this, how will I index each series by name (e.g. "S&P500" or "GOLD")? I know I will be able to access the array by date, but will I need a separate data structure to link series names with their column numbers in the large array?

3) Or should I use a structured data type as per the example given in the docs (http://pytseries.sourceforge.net/core.timeseries.html)? If so, how do I append series one by one to the timeseries, since I don't want to build a massive non-numpy structure just to feed it to the time_series() constructor in one shot?

Advice on where I can get some good examples for financial markets and timeseries module in general would also be appreciated.

Thanks.

+1  A: 

1) I once implemented a PageRank algorithm for a small set (~10K) of linked documents, so a 10Kx10K matrix had to be handled during the calculation. As I recall, the numpy array implementation was blazingly fast. A 1000x1500 array is far smaller than that.

2) IMHO, storing metadata like the series names externally does not hurt that much.

3) I haven't worked with scikits.timeseries, but I would definitely look into it. As far as I can see, the project lives in the same scipy orbit as numpy.
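To make points 1) and 2) concrete, here is a minimal sketch using plain numpy only (not the scikits.timeseries API, which I haven't used). It assumes one column per series and one row per day, preallocates the array once, and keeps the name-to-column mapping in an ordinary dict. The names `data`, `column_of`, `set_series` and `get_series` are made up for illustration, as are the sample values.

```python
import numpy as np

# Dimensions taken from the question: 1000 series, 1500 daily points each.
# Orientation (rows = days, columns = series) is an assumption.
n_days, n_series = 1500, 1000

# Preallocate once; NaN marks values that have not been loaded yet.
data = np.full((n_days, n_series), np.nan)

# External metadata: series name -> column number (names are illustrative).
names = ["S&P500", "GOLD"]
column_of = {name: i for i, name in enumerate(names)}

def set_series(name, values):
    """Fill one column in place, registering the name if it is new."""
    col = column_of.setdefault(name, len(column_of))
    data[:, col] = values

def get_series(name):
    """Return a view (not a copy) of the column stored under `name`."""
    return data[:, column_of[name]]

# Series can be loaded one at a time without building a big intermediate list.
set_series("GOLD", np.linspace(900.0, 1100.0, n_days))
set_series("OIL", np.arange(n_days, dtype=float))  # new name gets the next free column

gold = get_series("GOLD")
```

Because the array is allocated up front, adding a series is just a column assignment, which sidesteps the "append one by one" problem from question 3). The filled array could then be handed to whatever time-series wrapper you end up using, with the dict kept alongside it.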

The MYYN
A: 

For help on this, have a look at Quantlib which is a useful library for financial work, and which has an active users mailing list.

In addition, read this book review for a book entitled Financial Modeling in Python.

Michael Dillon
Thanks Michael - I know Quantlib. Excellent for pricing stuff, not so great for doing stats on large datasets. As for the book - thank you ** 10 because I have been looking for such a book for a while!
Thomas Browne
Do have a look at the quantlib-users mailing list and try asking your question there as well. Even though quantlib doesn't solve this particular technical problem, it is likely that the quantlib user community has some experience wrestling with time-series data.
Michael Dillon