I'm a newbie with Node.js and Riak, trying to use riak-js. I wrote the following CoffeeScript to create N entries holding the squares of the integers 1..N. The script works fine for N=10: if I pass a console.log() callback to db.get(), I can print the squares of 1..10.

db = require('riak-js').getClient({debug:false})

N = 10

for i in [1..N]
 db.save('Square', String(i), String(i*i))

for i in [1..N]
 db.get('Square', String(i))

My problem is that when I put N=1000 it takes about 10 seconds for my script to complete. Is this normal? I was expecting something well under 1 sec. I have a single riak node on my local machine, an Acer Aspire 5740, i3 CPU and 4GB RAM, with Ubuntu 10.04. For a RAM-only store, I have set storage_backend in $RIAK/rel/riak/etc/app.config to riak_kv_ets_backend. The riak-admin status command confirms this setting.

Q1: Perhaps riak-js is setting some default disk-based backend for my bucket? How do I find out/override this?

Q2: I don't think it's a node.js issue, but am I doing something wrong in asynchronous usage?

A: 

A1: riak-js does not apply any hidden settings. The storage backend is configured on the Riak node itself, so it is up to you to configure your Riak nodes.

A2: Your script seems fine, there's nothing you're doing wrong.

The truth is I haven't started benchmarking or seriously considering performance issues.

That said, each client queues its requests internally and issues them serially. That makes the API simpler and you don't run into race conditions, but it has its limitations. Ideally I want to build a wrapper around riak-js that will take care of:

  • Holding several instances to make requests in parallel
  • Automatically reconnecting to other nodes in the cluster when one goes down

Your example runs in ~5sec on my MBP (using Bitcask).

 =>  time coffee test.coffee 

real    0m5.181s
user    0m1.245s
sys     0m0.369s

Just as a proof of concept, take a look at this:

dbs = [require('riak-js').getClient({debug: false}), require('riak-js').getClient({debug: false})]

N = 1000

for i in [1..N]
  db = dbs[i % 2]
  db.save('sq', String(i), String(i*i))

for i in [1..N]
  db = dbs[i % 2]
  db.get('sq', String(i))

Results:

 =>  time coffee test.coffee 

real    0m3.341s
user    0m1.133s
sys     0m0.319s

This should improve further as more clients hit the DB in parallel.
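As a rough sketch of how the two-client experiment might generalize to any number of clients: `makePool` below is a hypothetical helper, not part of riak-js. It picks a client by hashing the key, so a save and a later get for the same key always land in the same client's serial queue (the `i % 2` trick above has the same property). It only assumes each client exposes `save` and `get` as used in this thread.

```coffeescript
# Hypothetical helper (not part of riak-js): spread requests over several
# clients, always routing a given key to the same client.
makePool = (clients) ->
  # Simple string hash used to choose a client for a key.
  pick = (key) ->
    h = 0
    for ch in key
      h = (h * 31 + ch.charCodeAt(0)) % 1000003
    clients[h % clients.length]

  # Forward calls unchanged, including any extra arguments such as callbacks.
  save: (bucket, key, args...) -> pick(key).save bucket, key, args...
  get: (bucket, key, args...) -> pick(key).get bucket, key, args...

# Assumed usage with riak-js (four clients is an arbitrary choice):
#   riak = require 'riak-js'
#   db = makePool (riak.getClient({debug: false}) for n in [1..4])
#   db.save 'sq', '7', '49'
#   db.get 'sq', '7'
```

Whether this helps in practice depends on how many concurrent connections the local Riak node handles well, so it is worth timing with a few different pool sizes.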

Otherwise the answer is the Protocol Buffers interface, no doubt about it. I couldn't get it running with your example so I'll have to dig into it. But that should be lightning fast.

Make sure you're running the latest Riak (there have been many performance improvements). Also take into account a little overhead for CoffeeScript compilation.

frank06
Thanks frank06. I noticed that Scalaris, for example, supports parallel requests via 'request lists'. I tried a weak version of this with riak-js by building one long string: I put s = s + String(i) + "," + String(i*i) + ";" in a for loop, and then did a single save/get against the Riak process. This crunched through N=100000 in a flash.
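For anyone wanting to try the same batching trick, here is a minimal sketch of packing and unpacking such a string. The names `pack` and `unpack` and the key 'all' are made up for illustration, and `pack` assumes n >= 1.

```coffeescript
# Pack the squares of 1..n into one string: "1,1;2,4;3,9;..."
pack = (n) ->
  ("#{i},#{i * i};" for i in [1..n]).join ''

# Turn the packed string back into an object mapping key -> value.
# Keys and values stay strings, as in the original script.
unpack = (s) ->
  result = {}
  for pair in s.split(';') when pair
    [k, v] = pair.split ','
    result[k] = v
  result

# Assumed usage: one save and one get instead of N of each.
#   db.save 'Square', 'all', pack(100000)
#   db.get 'Square', 'all'
```

The trade-off is that a single key can no longer be read or updated on its own, so this measures per-request overhead rather than offering a drop-in replacement for N separate entries.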
A: 

Here is my test file:

db = require('../lib').getClient({debug:false})

# Read N from the command line as a number, defaulting to 10
N = parseInt(process.argv[2], 10) or 10

for i in [1..N]
 db.save('Square', String(i), String(i*i))

for i in [1..N]
 db.get('Square', String(i))

After compiling, I get the following times:

$ time node test1.js 1000

real 0m3.759s
user 0m0.823s
sys  0m0.421s

After running many iterations, my times were similar at that volume regardless of backend. I tested ets and dets. The OS caches your disk blocks on the first run at a particular volume, so subsequent runs are faster.

Following up on frank06's answer, I would also look into connection handling. This is not so much an issue with Riak as with how riak-js sets up its connections. Also note that in Riak all nodes are the same, so with a three-node cluster you would create connections to all three nodes and round-robin requests across them in some fashion. The Protobuf API is the way to go, but it requires some extra care in setting up.
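A minimal sketch of the round-robin idea, under stated assumptions: `roundRobin` is a hypothetical helper, the host names are placeholders, and the `host` option to `getClient` should be checked against your riak-js version. Note that with strict rotation a get may go out on a different connection than the save for the same key, so it can be issued before that save has completed.

```coffeescript
# Hypothetical helper: returns a function that hands out the given
# items in rotation (first, second, ..., last, first, ...).
roundRobin = (items) ->
  next = 0
  ->
    item = items[next]
    next = (next + 1) % items.length
    item

# Assumed usage with riak-js, one client per node:
#   riak = require 'riak-js'
#   nodes = ['riak1.example', 'riak2.example', 'riak3.example']
#   nextClient = roundRobin (riak.getClient({host: h}) for h in nodes)
#   nextClient().get 'Square', '7'
```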

siculars
Thanks siculars. I know a bit of Erlang (I was a keen student until I stumbled upon Node) so I'll take a look at the Erlang PBC client.