Q:

Hadoop MR source: HDFS vs HBase. Benefits of each?

If I understand the Hadoop ecosystem correctly, I can run my MapReduce jobs sourcing data from either HDFS or HBase. If so, why would I choose one over the other? Does using HBase as an MR source offer any benefit in performance, reliability, cost, or ease of use?

The best I've been able to find is this quote: "HBase is the Hadoop application to use when you require real-time read/write random-access to very large datasets." (Tom White, 2009, Hadoop: The Definitive Guide, 1st Edition)

+1 A:

Using straight-up Hadoop Map/Reduce over HDFS, your inputs and outputs are typically stored as flat text files or Hadoop SequenceFiles, which are binary files of serialized key/value pairs. These files are effectively immutable: HDFS is built for write-once, read-many access, so you replace or append to a file rather than updating individual records in place. This makes plain Hadoop well suited to batch processing tasks.

HBase is a full-fledged database (albeit not relational) which uses HDFS as storage. This means you can run interactive queries and updates on your dataset.

What's nice about HBase is that it integrates well with the rest of the Hadoop ecosystem: an HBase table can serve as the input or the output of a MapReduce job. So if you need both batch processing and interactive, granular, record-level operations on huge datasets, HBase can handle both.
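To make the difference in access patterns concrete, here is a minimal sketch in plain Java. It is a toy model only and does not use the Hadoop or HBase APIs: the class names `FlatFile` and `KeyedTable` and all of their methods are hypothetical, invented for illustration. `FlatFile` stands in for an append-only file on HDFS that a batch job reads from start to finish, and `KeyedTable` stands in for an HBase-style table whose rows are kept sorted by key, so single rows can be read, overwritten, or scanned by key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SourceComparison {

    // Hypothetical stand-in for a flat file on HDFS: records are appended,
    // never updated in place, and processed by reading the whole file.
    public static final class FlatFile {
        private final List<String> lines = new ArrayList<>();

        public void append(String line) {
            lines.add(line);
        }

        // Batch-style aggregation: visits every record, as a full scan would.
        // Each line is assumed to be "key<TAB>value".
        public Map<String, Integer> countByKey() {
            Map<String, Integer> counts = new TreeMap<>();
            for (String line : lines) {
                String key = line.split("\t", 2)[0];
                counts.merge(key, 1, Integer::sum);
            }
            return counts;
        }
    }

    // Hypothetical stand-in for an HBase-style table: rows sorted by row key,
    // with point reads, in-place overwrites, and key-range scans.
    public static final class KeyedTable {
        private final TreeMap<String, String> rows = new TreeMap<>();

        public void put(String rowKey, String value) {
            rows.put(rowKey, value); // overwrites any existing row
        }

        public String get(String rowKey) {
            return rows.get(rowKey); // null if the row does not exist
        }

        // Returns rows with startRow <= key < stopRow.
        public Map<String, String> scan(String startRow, String stopRow) {
            return rows.subMap(startRow, stopRow);
        }
    }

    public static void main(String[] args) {
        FlatFile log = new FlatFile();
        log.append("user1\tclick");
        log.append("user2\tview");
        log.append("user1\tview");
        System.out.println("Batch counts: " + log.countByKey());

        KeyedTable table = new KeyedTable();
        table.put("user1", "clicks=1");
        table.put("user2", "clicks=0");
        table.put("user1", "clicks=2"); // record-level update
        System.out.println("Point read: " + table.get("user1"));
        System.out.println("Range scan: " + table.scan("user1", "user2"));
    }
}
```

In this model, answering any question about the flat file means reading all of it, which is fine for batch jobs, while the keyed table can answer a single-row lookup or update without touching the rest of the data. A real HBase table adds distribution, persistence on HDFS, and versioning on top of this idea.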

bajafresh4life 2010-09-23 13:29:06

Thanks, that's what I was looking for.

Andre 2010-09-24 12:27:45
