By what factor does the performance (read queries/sec) increase when a machine is added to a cluster of machines running either:

  • a Bigtable-like database
  • MySQL?

Google's research paper on Bigtable suggests that "near-linear" scaling can be achieved with Bigtable. This page here, featuring MySQL's marketing jargon, suggests that MySQL is capable of scaling linearly.

Where is the truth?

+2  A: 

If you don't do that many writes to the database, MySQL may be a good and easy solution, especially if coupled with memcached in order to increase the read speed.

OTOH, if your data is constantly changing, you should probably look somewhere else:

These systems have been designed to scale linearly with the number of computers added to the system. A full list is available here.

the_void
Thanks, but I am not trying to find out which one to use; I need detailed statistics on how the systems scale under similar circumstances.
bjornl
Here are some interesting benchmarks:
- http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
- http://blog.medallia.com/2010/05/choosing_a_keyvalue_storage_sy.html
- http://voltdb.com/blog/key-value-benchmarking
The first, from Yahoo, compares the scalability of HBase and Cassandra. There isn't a standard methodology for testing those systems; each of them has special characteristics that make it more suitable for specific tasks.
the_void
+1  A: 

Having built and benchmarked several applications using VoltDB, I consistently measure between 90% and 95% of additional transactional throughput as each new server is added to the cluster. So if an application is performing 100,000 transactions per second (TPS) on a single server, I measure 190,000 TPS on 2 servers, 280,000 TPS on 3 servers, and so on. At some point we expect the server-to-server networking to become a bottleneck, but our largest cluster (30 servers) is still above 90%.
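
To make the arithmetic above concrete, here is a minimal sketch in Python. It assumes a simple model in which every added server contributes a fixed fraction of single-server throughput; the function names `projected_tps` and `marginal_efficiency` are illustrative and not part of any VoltDB API, and the 30-server figure is only what this model projects, not a measured result.

```python
def projected_tps(single_server_tps, servers, efficiency):
    """Projected cluster throughput, assuming each server after the first
    adds `efficiency` (e.g. 0.9 for 90%) of one server's throughput."""
    if servers < 1:
        raise ValueError("servers must be at least 1")
    return single_server_tps * (1 + efficiency * (servers - 1))


def marginal_efficiency(tps_before, tps_after, single_server_tps):
    """Fraction of one server's throughput gained by adding one server."""
    return (tps_after - tps_before) / single_server_tps


if __name__ == "__main__":
    # Hypothetical inputs taken from the example figures in this answer.
    for n in (1, 2, 3, 30):
        print(n, round(projected_tps(100_000, n, 0.90)))
    # prints:
    # 1 100000
    # 2 190000
    # 3 280000
    # 30 2710000
```

To check a real cluster against this model, run the same workload before and after adding a server and pass the two measured throughputs to `marginal_efficiency`; a result near 1.0 means close to linear scaling.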

tmcallaghan
A: 

We discuss exactly this topic and give specific numbers for Riak running on Joyent SmartMachines in the recording here:

http://blog.basho.com/2010/09/16/nosql-performance-in-the-cloud-webinar/

Justin Sheehy