hbase

How does Hive compare to HBase?

I'm interested in finding out how the recently released (http://mirror.facebook.com/facebook/hive/hadoop-0.17/) Hive compares to HBase in terms of performance. The SQL-like interface used by Hive is very much preferable to the HBase API we have implemented. ...

Hbase / Hadoop Query Help

I'm working on a project with a friend that will utilize HBase to store its data. Are there any good query examples? I seem to be writing a ton of Java code to iterate through lists of RowResults when, in SQL land, I could write a simple query. Am I missing something? Or is HBase missing something? ...
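
For context, the sketch below is roughly the kind of boilerplate the question describes: an SQL-ish "SELECT info:name FROM mytable" becomes a scanner loop in the old (pre-0.20) Java client. The table and column names here are hypothetical, and the detail that a RowResult can be walked as a map of column name to Cell is assumed from the 0.19-era API.

    import java.util.Map;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Scanner;
    import org.apache.hadoop.hbase.io.Cell;
    import org.apache.hadoop.hbase.io.RowResult;

    public class ScanExample {
      public static void main(String[] args) throws Exception {
        // Roughly "SELECT info:name FROM mytable" in SQL terms.
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        Scanner scanner = table.getScanner(new String[] { "info:name" });
        try {
          RowResult rowResult = scanner.next();
          while (rowResult != null) {
            // Each RowResult maps column names (as bytes) to Cells.
            for (Map.Entry<byte[], Cell> entry : rowResult.entrySet()) {
              System.out.println(new String(rowResult.getRow()) + " "
                  + new String(entry.getKey()) + "="
                  + new String(entry.getValue().getValue()));
            }
            rowResult = scanner.next();
          }
        } finally {
          scanner.close();
        }
      }
    }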

Ruby on Rails/Merb as a frontend for a billions-of-records app

I am looking for a backend solution for an application written in Ruby on Rails or Merb to handle data with several billion records. I have a feeling that I'm supposed to go with a distributed model, and at the moment I've looked at HBase with Hadoop and CouchDB. Problems with the HBase solution as I see it -- Ruby support is not very strong, a...

How to design an HBase schema?

Hi all, suppose that I have this RDBMS table (Entity-attribute-value_model): col1: entityID col2: attributeName col3: value and I want to use HBase due to scaling issues. I know that the only way to access an HBase table is by its primary key (cursor); you can get a cursor for a specific key and iterate the rows one by one. The issue...
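
One common way to map an EAV table onto HBase is to make the entityID the row key and the attributeName the column qualifier inside a single family, so all attributes of an entity land in one row and can be fetched together. The sketch below assumes the pre-0.20 BatchUpdate/commit write API; the "entities" table and "attrs" family are made-up names.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;

    public class EavWriter {
      // Store one (entityID, attributeName, value) triple as a cell:
      // row key = entityID, column = "attrs:" + attributeName.
      public static void putTriple(HTable table, String entityId,
                                   String attributeName, String value) throws Exception {
        BatchUpdate update = new BatchUpdate(entityId);
        update.put("attrs:" + attributeName, value.getBytes());
        table.commit(update);
      }

      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "entities");
        putTriple(table, "entity42", "color", "blue");
        putTriple(table, "entity42", "size", "large");
        // Reading all attributes of entity42 back is then a single-row get,
        // and ranges of entities can be read with a scanner over the row keys.
      }
    }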

Any feedback / comments on the pigi project?

The pigi project is a framework for creating different indexes on top of HBase (Apache's BigTable implementation). In my use case I need to query the data by different attributes, so it looks like it is going to fit my needs. Have you guys ever tried it? What do you think of it? When I googled pigi and hbase I got ~12 results, which l...

Hadoop Hbase: Spreading column families across tables or not

The HBase documentation makes it clear that you should group similar columns into column families, because the physical storage is done by column family. But what does it mean to put two column families into the same table, as opposed to having separate tables per column group? Are there specific cases when "partitioning" tables this w...
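
For what it's worth, each column family gets its own store files per region, so two families in one table behave much like two tables that share a row key and are always split together; the practical difference is that a single get or scan on one table can return data from both families for the same row. Below is a sketch of the two layouts at table-creation time, assuming the 0.19-era admin API (table and family names are invented).

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateTables {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

        // Option 1: one table, two column families sharing the same row key space.
        HTableDescriptor combined = new HTableDescriptor("users");
        combined.addFamily(new HColumnDescriptor("profile:"));
        combined.addFamily(new HColumnDescriptor("activity:"));
        admin.createTable(combined);

        // Option 2: two separate tables, one family each.
        HTableDescriptor profiles = new HTableDescriptor("user_profiles");
        profiles.addFamily(new HColumnDescriptor("profile:"));
        admin.createTable(profiles);

        HTableDescriptor activity = new HTableDescriptor("user_activity");
        activity.addFamily(new HColumnDescriptor("activity:"));
        admin.createTable(activity);
      }
    }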

Write php array in HBase using thrift

I have a Thrift PHP client and I want to write to an HBase table, and I'm doing the following: $mutations = array( new Mutation( array( 'column' => 'entry:num', 'value' => array('a','b','c') ) ), ); $client->mutateRow( $t, $row, $mutations ); The problem is that when inserting into HBase the value, which is an ar...

Is HBase stable and production-ready?

For folks who have deployed HBase on their own clusters, do you feel that it's sufficiently stable for production use? What types of troubles or issues have you run into? I do see a bunch of companies listed as using HBase in production (http://wiki.apache.org/hadoop/Hbase/PoweredBy), but I'm curious as to whether a lot of maintenance,...

Help needed with HBase schema design

Hi, I am trying to design a high-scale key-value storage system. The HBase schema for it is outlined below: { "userid1" : { "update" : { t3 : "some update1", t2 : "some update2", t1 : "some update3" }, "sender" : { t3 : "sender3", t2 : "sender2", t1 : "sender1" }, "...

HBase distributed scanner

In the "API usage example" on "Getting started" page in HBase documentation there is an example of scanner usage: Scanner scanner = table.getScanner(new String[]{"myColumnFamily:columnQualifier1"}); RowResult rowResult = scanner.next(); while (rowResult != null) { //... rowResult = scanner.next(); } As I understand, t...

secondary index on column store dbs

Hi, Is there any column-store database that supports secondary indexes? I know HBase does, but it's not there yet. Haggai. ...

Writing an ActiveRecord adapter

I'd like to write my own ActiveRecord adapter for the HBase database since none currently exist. However, I've been searching for a while online and can't find any good resources on how to write an ActiveRecord adapter. How would you go about doing this, or are there any links you can recommend? ...

Thrift C# getRows

I'm having trouble implementing the Thrift API in my C# program. The libs are built and it seems to run like it should, but one function is giving me trouble. As I understand it, getRows() is supposed to return a list of TRowResult, however it's only returning the first row in my table. My foreach loop only runs once. Anyone have experi...

Feed aggregator using hbase. How to design the schema?

I am working on a project involving monitoring a large number of RSS/Atom feeds. I want to use HBase for data storage and I have some problems designing the schema. For the first iteration I want to be able to generate an aggregated feed (last 100 posts from all feeds in reverse chronological order). Currently I am using two tables: F...
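
One pattern that fits the "last 100 posts in reverse chronological order" requirement is to bake the ordering into the row key, since HBase sorts rows only by key: append Long.MAX_VALUE minus the post timestamp (zero-padded) so the newest posts sort first, and the first 100 rows a scanner returns are the aggregated feed. The key layout below is only an illustration, not something the question prescribes.

    public class FeedRowKeys {
      // Newest-first key: a fixed-width reversed timestamp so lexicographic
      // order over the keys matches reverse chronological order.
      public static String aggregatedFeedKey(long postTimestampMillis, String postId) {
        long reversed = Long.MAX_VALUE - postTimestampMillis;
        // Zero-pad to 19 digits so "9" does not sort after "10".
        return String.format("%019d-%s", reversed, postId);
      }

      public static void main(String[] args) {
        String older = aggregatedFeedKey(1000L, "post-a");
        String newer = aggregatedFeedKey(2000L, "post-b");
        System.out.println(newer.compareTo(older) < 0);  // true: newer sorts first
      }
    }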

Can OLAP be done in BigTable?

In the past I used to build WebAnalytics using OLAP cubes running on MySQL. Now an OLAP cube, the way I used it, is simply a large table (ok, it was stored a bit smarter than that) where each row is basically a measurement or an aggregated set of measurements. Each measurement has a bunch of dimensions (i.e. which pagename, useragent, ip,...

Advanced queries in HBase

Given the following HBase schema scenario (from the official FAQ)... How would you design an HBase table for a many-to-many association between two entities, for example Student and Course? I would define two tables: Student: student id student data (name, address, ...) courses (use course ids as column qualifiers h...
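
With that layout, "which courses does this student take" becomes a read of the student's row restricted to the courses family, and the course ids come back as column qualifiers rather than as cell values. Here is a rough sketch of that read, reusing the scanner style from the documentation excerpt quoted in an earlier question above; the table and family names are hypothetical, and treating RowResult as a map of column names is assumed from the 0.19-era client.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Scanner;
    import org.apache.hadoop.hbase.io.RowResult;

    public class StudentCourses {
      public static void main(String[] args) throws Exception {
        HTable students = new HTable(new HBaseConfiguration(), "student");
        // Scan the whole "courses" family; each column qualifier is a course id.
        Scanner scanner = students.getScanner(new String[] { "courses:" });
        try {
          for (RowResult row = scanner.next(); row != null; row = scanner.next()) {
            StringBuilder line = new StringBuilder(new String(row.getRow())).append(" takes:");
            for (byte[] column : row.keySet()) {
              // Column name is "courses:<courseId>"; strip the family prefix.
              line.append(' ').append(new String(column).substring("courses:".length()));
            }
            System.out.println(line);
          }
        } finally {
          scanner.close();
        }
      }
    }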

Which is the most suitable Key-Value Store for an RDBMS background person?

Is there a distinct winner among all the key-value stores? Cassandra, MongoDB, CouchDB? And do they all follow some central guidelines, or do they each have their own say in defining their APIs? I'm asking this question especially from the perspective of an RDBMS-skilled person who is new to key-value stores. Which one should we follow to bes...

storing massive ordered time series data in bigtable derivatives

I am trying to figure out exactly what these new-fangled data stores such as BigTable, HBase and Cassandra really are. I work with massive amounts of stock market data, billions of rows of price/quote data that can add up to 100s of gigabytes every day (although these text files often compress by at least an order of magnitude). This d...

Is HBase meaningful if it's not running in a distributed environment?

I'm building an index of data, which will entail storing lots of triplets in the form (document, term, weight). I will be storing up to a few million such rows. Currently I'm doing this in MySQL as a simple table. I'm storing the document and term identifiers as string values rather than as foreign keys to other tables. I'm re-writing the software...

Is this a suitable (or possible) use of HBase?

I want to use HBase as a store where I can push in a few million entries of the format {document => {term => weight}}, e.g. "Insert term X into document Y with weight Z", and then issue a command like "Select the top 1000 terms for this document" or "Select the top 1000 terms for each document". This works in my current MySQL implementation...
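
Worth noting for the "top 1000 terms" part: HBase has no ORDER BY or LIMIT, so with a row-per-document, column-per-term layout the ranking typically happens client-side after fetching the document's row (or the weight is encoded into the column qualifier so the server hands back terms already sorted). Below is a minimal sketch of the client-side variant; how the term-to-weight map is read out of HBase is left abstract, since the question doesn't show a particular client API.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;

    public class TopTerms {
      // Given the term -> weight map read out of one document's row,
      // return the n terms with the highest weights.
      public static List<String> topTerms(Map<String, Double> termWeights, int n) {
        List<Map.Entry<String, Double>> entries =
            new ArrayList<Map.Entry<String, Double>>(termWeights.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Double>>() {
          public int compare(Map.Entry<String, Double> a, Map.Entry<String, Double> b) {
            return b.getValue().compareTo(a.getValue());  // descending by weight
          }
        });
        List<String> top = new ArrayList<String>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
          top.add(entries.get(i).getKey());
        }
        return top;
      }
    }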