How do I configure HBase so that the scanner retrieves only a limited number of records at a time? Or how can I improve scanner performance when the database contains a lot of records?

+1  A: 

I believe the scanner only requests one item at a time unless you set the caching. You can check with Scan#getCaching() just to be sure.

Each time you call ResultScanner#next(), it will retrieve the next item. You can also use ResultScanner#next(int) to retrieve a number of results at a time.

When setting up the scanner, you can use Scan#setCaching(int) to have results fetched in advance: http://hadoop.apache.org/hbase/docs/r0.20.4/api/org/apache/hadoop/hbase/client/Scan.html#setCaching(int)

Chances are your scanner is slow because you are reading only one record at a time (each read includes all of the back and forth of the RPC protocol and whatnot). So if you intend to read a lot, let the system cache a few results for you in advance.
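To make the round-trip argument concrete, here is a rough sketch in plain Java. It is a toy model, not the HBase client API: ToyScanner, rpcsForFullScan and the "row-N" keys are made-up names for illustration, and it assumes each buffer refill costs exactly one RPC (a real scan also has open/close overhead that is ignored here).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model of a caching scanner. Hypothetical class, NOT the HBase API. */
class ToyScanner {
    private final List<String> serverRows;  // stands in for rows held by the server
    private final int caching;              // rows fetched per simulated RPC
    private final Queue<String> buffer = new ArrayDeque<>();
    private int position = 0;               // index of the next row to fetch
    private int rpcCount = 0;               // simulated round trips so far

    ToyScanner(List<String> serverRows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        this.serverRows = serverRows;
        this.caching = caching;
    }

    /** Returns the next row, or null once the scan is exhausted. */
    String next() {
        // Refill the client-side buffer only when it is empty: one "RPC" per refill.
        if (buffer.isEmpty() && position < serverRows.size()) {
            int end = Math.min(position + caching, serverRows.size());
            buffer.addAll(serverRows.subList(position, end));
            position = end;
            rpcCount++;
        }
        return buffer.poll();
    }

    int rpcCount() {
        return rpcCount;
    }

    /** Scans rowCount rows and reports how many simulated RPCs were needed. */
    static int rpcsForFullScan(int rowCount, int caching) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            rows.add("row-" + i);
        }
        ToyScanner scanner = new ToyScanner(rows, caching);
        while (scanner.next() != null) {
            // consume every row
        }
        return scanner.rpcCount();
    }

    public static void main(String[] args) {
        System.out.println(rpcsForFullScan(1000, 1));   // prints 1000
        System.out.println(rpcsForFullScan(1000, 100)); // prints 10
    }
}
```

In this model, reading 1000 rows takes 1000 round trips with a caching value of 1 and only 10 with a caching value of 100. The trade-off is that a larger value means more rows held in client memory per fetch, so it is worth picking a value that fits your row size.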

juhanic
A: 

You may also want to examine the Filter API, which allows you to selectively return a subset of rows or cells to the client: http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/filter/package-summary.html.

Jeff Hammerbacher