views:

42

answers:

3

Is there some reliable (i.e. acknowledged) data on the execution times for various queries, such as different types of joins, scans, with different types and size of data? I am looking for the order of magnitude. This does not have to be vendor-specific data.

It should be similar in terms of presentation for latency for various access times. (Click View Response.)

The reason I am looking for it is to decide whether or not I should use a RDBMS at all for a project I'm working on. I do not need complex joins, and I may get away with local memcache or even disk access.

As per request, the part of the project that may require a RMDBS is all local on a single node. The node may store any kind of data, and any kind of read/write is key-based, in that it treats its storage component as a huge dictionary. It may try to iterate through objects, and there could be a single object that provides a list to those objects. This kind of iteration may be something like "Find all objects that have this attribute" (i.e. find all rows for which a column matches this value). This is the part I am unsure about - how much advantage would an RDBMS offer me, if at all? I do not foresee the need for joins.

Another reason I'd like this benchmark is so that I could decide how much to offload to individual nodes, given the difference between network latency and local execution times.

+1  A: 

The "No SQL" movement might make a decision to seek an alternative to relational databases a good idea.

But under no circumstances would I try to implement a persistence technology on my own.

Your question is impossible to answer in a generic way.

The best way to tackle it that I can think of is to write a generic persistence interface that allows you to swap implementations in and out. Write a relational implementation and another technology of your choosing. Benchmark the two and see which one works under realistic conditions for your situation. Be a scientist - get some data. Creating it yourself, rather than depending on a forum, seems like a smart approach to me.

duffymo
A: 

From your description it seems you do not need a RDMS.

Do you have a use for SQL language access to the data.... or are your uses fairly predictable.

An RDBMS has many layers of overhead such as parsing the SQL statements.

If you can just store the data in memory and on a filesystem without requiring SQL language access you have the opportunity to make the system faster.

Hope that helps. Ivan

Ivan Novick
A: 

The question about the time for each type of query is hard to answer directly, but I will try. I personally also likes to know absolute times of different operations. I think for the simple by key access you can expect 10-20 milliseconds per simple SQL. Another number - if you want to read a lot of data from one table, you can expect getting hundred of megabytes per second on the server with one modern CPU. Regarding the RDBMS server selection - for the simple queries, simple RDBMS engines like MySQL will do the best. If you queries will became complex - then more serious engines like Oracle, DB2, SQLServer will pay for themselves. Some more benchmark I have done for the google app engine datastore: dbaspects.blogspot.com

David Gruzman