In the "API usage example" on the "Getting started" page of the HBase documentation, there is an example of scanner usage:

Scanner scanner = table.getScanner(new String[]{"myColumnFamily:columnQualifier1"});
try {
  RowResult rowResult = scanner.next();
  while (rowResult != null) {
    // ... process rowResult ...
    rowResult = scanner.next();
  }
} finally {
  scanner.close();  // release the scanner when done
}
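The loop above can be mimicked without an HBase cluster. The following is a minimal, hypothetical plain-Java sketch (`FakeScanner` and `filterOnClient` are invented names, not HBase API) of the cursor pattern: the client pulls rows one at a time with `next()` until it gets `null`, and any filtering runs inside the client process.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical, HBase-free illustration of the cursor pattern: the client
 * pulls rows one by one and does all filtering itself.
 */
public class ClientSideScan {

    /** Stand-in for a scanner: next() returns the next row, or null when exhausted. */
    static final class FakeScanner {
        private final Iterator<String> rows;

        FakeScanner(List<String> rows) {
            this.rows = rows.iterator();
        }

        String next() {
            return rows.hasNext() ? rows.next() : null;
        }
    }

    /** Same loop shape as the documentation example; the predicate runs only in this process. */
    static List<String> filterOnClient(FakeScanner scanner, Predicate<String> keep) {
        List<String> matches = new ArrayList<>();
        String row = scanner.next();
        while (row != null) {
            if (keep.test(row)) {
                matches.add(row);
            }
            row = scanner.next();
        }
        return matches;
    }

    public static void main(String[] args) {
        FakeScanner scanner = new FakeScanner(List.of("apple", "banana", "avocado", "cherry"));
        System.out.println(filterOnClient(scanner, r -> r.startsWith("a"))); // prints [apple, avocado]
    }
}
```

Every row, matching or not, passes through the single client loop, which is why this pattern alone does not spread the filtering work across nodes.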

As I understand it, this code will be executed on one machine (the name node), so all the scanning and filtering work will not be distributed; only data storage and data loading will be. How can I use a distributed scanner that works separately on each node?

What is the best practice for fast data filtering? Thanks.

A:

This is old, but anyway: the scanner is just a cursor-like API for retrieving computed results. For the computation itself, you use MapReduce jobs (hbase.mapred).
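To illustrate the difference, here is a minimal, hypothetical sketch in plain Java. It assumes the table is modeled as a list of partitions ("regions"), uses a thread pool as a stand-in for the map tasks of a MapReduce job, lets each task filter only its own partition, and gathers the partial results at the end. `PerRegionFilter` and `filter` are invented for illustration and are not part of hbase.mapred; a real job would use the Hadoop/HBase MapReduce classes and run its tasks on the cluster nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of the MapReduce idea: each partition ("region") is
 * filtered by its own task, and only matching rows are gathered at the end.
 * Local threads stand in for map tasks running on different nodes.
 */
public class PerRegionFilter {

    static List<String> filter(List<List<String>> regions, Predicate<String> keep)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            // "Map" phase: one task per region, each scanning only its own rows.
            List<Future<List<String>>> partials = new ArrayList<>();
            for (List<String> region : regions) {
                Callable<List<String>> task = () -> {
                    List<String> out = new ArrayList<>();
                    for (String row : region) {
                        if (keep.test(row)) {
                            out.add(row);
                        }
                    }
                    return out;
                };
                partials.add(pool.submit(task));
            }
            // Gather phase: concatenate per-region results in region order.
            List<String> result = new ArrayList<>();
            for (Future<List<String>> partial : partials) {
                result.addAll(partial.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> regions = List.of(
                List.of("apple", "banana"),
                List.of("avocado", "cherry"));
        System.out.println(filter(regions, r -> r.startsWith("a"))); // prints [apple, avocado]
    }
}
```

Only the matching rows leave each task, so the filtering work is split across partitions instead of streaming every row through one client-side cursor.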

Tobu