views:

113

answers:

4

Basically it's a financial database, with both daily and intraday data (date,symbol,open,high,low,close,vol,openinterest) -- very simple structure. Updates are just once a day. A typical query would be: date and close price of MSFT for all dates in DB. I was thinking that there's got to be something out there that's been optimized for lots of reads and not many writes, as opposed to a general-purpose RDBMS like MySQL. I searched rubyforge.org, and I didn't see anything that specifically addressed this (as far as I could tell).

+1  A: 

Look at MySQL, but run the database from memory instead of disk. Depends on the size of your dataset and your budget, but you could then update memory from disk once a day, and have a very, very fast read time afterwards.

Dean J
+3  A: 

MS SQL Server can be optimized like this with the fairly simple:

ALTER DATABASE myDatabase
SET READ_COMMITTED_SNAPSHOT ON

SQL Server will automatically cache your data in memory if it is being used heavily for reads.

John Gietzen
oops, failed to mention: needs to be free.
SQL Server Express is free (http://www.microsoft.com/express/sql/default.aspx) if you can live with single-CPU operation and have less than a 4GB database...
Mike Woodhouse
+1  A: 

You can always use a RAMdisk for your MySQL installation if your database footprint is small enough. One way to make your tables small enough to fit is to create them as MyISAM ARCHIVE tables. While they are very compact, compressed, they can only be appended to or read from, but not updated. (http://dev.mysql.com/tech-resources/articles/storage-engine.html)

Generally a properly indexed and well organized MySQL table is really fast, especially when using MyISAM, and even more so when loaded from memory. They key is in denormalizing the data as heavily as you can optimizing for your particular read scenarios.

For example, having a stock_id, date, price tuple is going to be fairly slow to sort and retrieve. If you have, instead, stock_id and a column with some serialized data, the retrieval time will be very quick.

Another solution that is likely faster is to push all the data into an alternative DBMS like Toyko Cabinet or something similar, especially if your data fits neatly into a key/value store.

tadman
I've used denormalizing before, but not to that extent. Great idea, thanks.
When absolute performance is an issue, denormalize everything you possibly can, and then denormalize some more. Get rid of all possible joins and trim it to direct index hits only and queries will be extremely fast.
tadman
A: 

The best-known (to me at least!) time series database is Fame but it's expensive and I strongly doubt that there's anything like, say, an ActiveRecord implementation for it. Unless it's changed a lot in the 10 or so years since I last touched it, it isn't SQL-friendly at all.

With a fairly tightly-focused application, you can take a more flexible view of your data. For example, consider what is the information that you're actually looking to store? Is it the atomic price/hi/lo/close/vol/whatever, or is it more appropriately a time series of such values? If you always want to view the series, store a series per row, not a value.

Throwing a few ideas out here...

How might it look if you stored a year or a month of a single value for a single stock in one row? Maybe as an XML string, or JSON or something more terse of your own devising. Compressed CSV, perhaps? That ought to fit a month's values into a 255-character column. (Use something like Huffman coding to do the encoding, perhaps - a single dictionary ought to work for all instances of such similar data).

You can still hold a horizontal view as well: with the extremely low update rate you'll have (should only be data fixes, I'd guess) you can probably stand to build that stuff.

There's an obvious downside to this: you'll have a bunch of extra work to do.

I don't have any personal experience, but MogoDB claims to offer relational-style flexibility with key-value performance.

As mentioned elsewhere key-value database might be worth looking at: Tokyo Cabinet, CouchDB or one of the others again, perhaps, with concatenated value for the time series.

Mike Woodhouse
re: Fame -- kdb is also famous for speedy time-series access, but it too is not cheap. Thanks for the other suggestions.