Hi,

I've started looking into NoSQL and was wondering what others think of the suitability of such solutions for storing and querying financial time series data.

For example, in a simple scenario, I would store the stock symbol, open, high, low, close, volume and a timestamp. I would then want to query for that data based on symbol and a timestamp range.

What do you think would be a good document structure for this scenario?

Thanks,

Tom

Edit: I'm mainly concerned about the read query performance of time-series data in a NoSQL solution vs. a traditional RDBMS solution.

+1  A: 

Take a look at ESENT.

For your scenario, I'd consider using a primary index over two columns: either symbol + timestamp (if you're going to look up individual symbols over some interval) or timestamp + symbol (if you're going to fetch all symbols over some interval).
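
ESENT itself is not shown here. As a rough illustration of the symbol + timestamp ordering, here is a minimal sketch using Python's built-in `sqlite3` module instead. The table name `bars`, the column names, and the helpers `build_db` and `query_range` are made up for the example, and timestamps are assumed to be ISO 8601 strings so that they sort chronologically as text.

```python
import sqlite3

def build_db(rows):
    """Create an in-memory table keyed on (symbol, ts) and load rows into it."""
    conn = sqlite3.connect(":memory:")
    # Composite primary key: rows are stored ordered by symbol, then timestamp,
    # so one symbol's bars over an interval form a contiguous key range.
    conn.execute("""
        CREATE TABLE bars (
            symbol TEXT NOT NULL,
            ts     TEXT NOT NULL,   -- assumed ISO 8601, e.g. '2010-01-04'
            open   REAL,
            high   REAL,
            low    REAL,
            close  REAL,
            volume INTEGER,
            PRIMARY KEY (symbol, ts)
        ) WITHOUT ROWID
    """)
    conn.executemany("INSERT INTO bars VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def query_range(conn, symbol, start, end):
    """Return bars for one symbol with start <= ts < end, oldest first."""
    cur = conn.execute(
        "SELECT ts, open, high, low, close, volume FROM bars "
        "WHERE symbol = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (symbol, start, end),
    )
    return cur.fetchall()

if __name__ == "__main__":
    sample = [
        ("AAPL", "2010-01-04", 30.5, 30.6, 30.3, 30.6, 1000),
        ("AAPL", "2010-01-05", 30.7, 30.8, 30.5, 30.6, 1200),
        ("MSFT", "2010-01-04", 30.6, 31.1, 30.6, 31.0, 900),
    ]
    conn = build_db(sample)
    print(query_range(conn, "AAPL", "2010-01-01", "2010-01-05"))
```

With the columns reversed (timestamp + symbol), the same range predicate on `ts` alone would cover all symbols in the interval.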

Soonts
+1  A: 

Tom, what exactly are you trying to achieve? RavenDB can certainly handle this scenario, but you have to be aware that RavenDB's indexes are updated in the background. Your scenario seems to be suited for an RDBMS, so I have to ask why you are looking for a NoSQL solution.

Ayende Rahien
Background updating of indexes is not an issue for this use case. My question is mainly about read performance. Will a NoSQL solution fare better in a "time series" (time range) query than a traditional RDBMS solution?
Tom Frey
Probably. With RavenDB, you can likely do most of the work directly on top of the built index, which would be _very_ fast.
Ayende Rahien
+2  A: 

Tom, financial data tends to have strict consistency and persistence requirements. At first glance, and without further knowledge of your application, I would expect you to need the ACID properties of an RDBMS as opposed to the BASE properties that usually define NoSQL solutions. Maybe if you describe your usage pattern and why you think you require a non-relational model, I will be able to find a more suitable solution for you.

As it stands, your data seems to be easily structured by the relational model and has a quite rigid schema, so I don't see a need for a schemaless DB (MongoDB, CouchDB, Riak...). Usually stock quotes need to have strong consistency (always be up to date), so I don't see any point in a Dynamo clone (Cassandra, Voldemort...). And unless you already have a tremendous amount of data and have hit a wall with regard to processing speed and resource usage, I wouldn't go for a column-based DB (HBase, Hypertable).

Asaf
ACID properties are not a requirement for me here. The stored data is updated only overnight in a batch job and receives read-only queries throughout the day. What I'm curious about is whether a NoSQL solution will fare better in a "time series" based query (selecting data within a time range) than traditional RDBMS solutions.
Tom Frey
It doesn't sound like you have an availability requirement here; you just want fast queries on a read-only database. That sounds like something pretty much any decent database can provide. All you really need is an index on the timestamp. I don't think a NoSQL solution would fare better, but it depends on the scale. Honestly, I would use a search engine such as Solr (or Lucene) and just tweak the caching. Since your data is read-only, they can be very fast.
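
To show why an index on the timestamp is enough for range reads, here is a minimal in-memory sketch in Python. It is not Solr or Lucene, just a sorted list per symbol searched with `bisect`. The class name `SeriesIndex` and its methods are invented for the example, and it assumes the data is loaded once (e.g. after the overnight batch) and only read afterwards.

```python
from bisect import bisect_left

class SeriesIndex:
    """Per-symbol sorted timestamps, searched by binary search."""

    def __init__(self):
        self._ts = {}    # symbol -> sorted list of timestamps
        self._bars = {}  # symbol -> bars in the same order as _ts

    def load(self, rows):
        """rows: iterable of (symbol, ts, bar). Replaces nothing, appends in sorted order."""
        # Sort once at load time; reads never have to sort again.
        for symbol, ts, bar in sorted(rows, key=lambda r: (r[0], r[1])):
            self._ts.setdefault(symbol, []).append(ts)
            self._bars.setdefault(symbol, []).append(bar)

    def range(self, symbol, start, end):
        """Return (ts, bar) pairs with start <= ts < end for one symbol."""
        ts = self._ts.get(symbol, [])
        bars = self._bars.get(symbol, [])
        lo = bisect_left(ts, start)  # first position with ts >= start
        hi = bisect_left(ts, end)    # first position with ts >= end
        return list(zip(ts[lo:hi], bars[lo:hi]))

if __name__ == "__main__":
    idx = SeriesIndex()
    idx.load([
        ("AAPL", "2010-01-05", {"close": 30.6, "volume": 1200}),
        ("AAPL", "2010-01-04", {"close": 30.6, "volume": 1000}),
        ("MSFT", "2010-01-04", {"close": 31.0, "volume": 900}),
    ])
    print(idx.range("AAPL", "2010-01-04", "2010-01-05"))
```

Finding the two boundaries costs O(log n) per query, and the result is then a contiguous slice, which is essentially what a B-tree index on (symbol, timestamp) gives you in any database.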
Asaf