hbase

Caching of Map applications in Hadoop MapReduce?

Looking at the combination of MapReduce and HBase from a data-flow perspective, my problem seems to fit. I have a large set of documents which I want to Map, Combine and Reduce. My previous SQL implementation was to split the task into batch operations, cumulatively storing what would be the result of the Map into a table and then performi...

Hadoop mapreduce streaming from HBase

I'm building a Hadoop (0.20.1) mapreduce job that uses HBase (0.20.1) as both the data source and data sink. I would like to write the job in Python which has required me to use hadoop-0.20.1-streaming.jar to stream data to and from my Python scripts. This works fine if the data source/sink are HDFS files. Does Hadoop support streaming...

Looking for a good HBase tutorial

I'm looking for a good, tested HBase tutorial. Where can I find one? ...

Scalable Image Storage

Hi, I'm currently designing an architecture for a web-based application that should also provide some kind of image storage. Users will be able to upload photos as one of the key features of the service. Viewing these images (via the web) will also be one of the primary uses. However, I'm not sure how to realize such a scalable image sto...

Any scalable OLAP database (web app scale)?

I have an application that requires analytics at different levels of aggregation, which is an OLAP workload. I also want to update my database fairly frequently. For example, here is what an update looks like (the schema is: time, dest, source ip, browser -> visits): (15:00-1-2-2010, www.stackoverflow.com, 128.19.1.1, safari) --> 10...
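A rough sketch of how an update like the one above might feed several aggregation levels at once. This is plain Python with an in-memory dict standing in for the store; `apply_update`, the choice of rollup levels, and the second sample row are assumptions made for illustration, not something the question specifies.

```python
from collections import defaultdict


def apply_update(cube, time, dest, source_ip, browser, visits):
    """Add `visits` to the base cell and to two coarser (hypothetical) rollups."""
    keys = (
        (time, dest, source_ip, browser),  # base cell, as in the question's schema
        (time, dest, None, None),          # rollup per time and destination
        (None, dest, None, None),          # rollup per destination only
    )
    # None marks a dimension that has been aggregated away.
    for key in keys:
        cube[key] += visits


cube = defaultdict(int)
# First row comes from the question; the second row is made up for the example.
apply_update(cube, "15:00-1-2-2010", "www.stackoverflow.com", "128.19.1.1", "safari", 10)
apply_update(cube, "15:00-1-2-2010", "www.stackoverflow.com", "10.0.0.7", "firefox", 5)
```

Precomputing rollups on write keeps reads cheap, at the cost of one extra increment per aggregation level on every update.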

How to insert data into HBase tables using PHP Stargate client

Hello! I'm playing around with an install of an HBase cluster and am trying to access the data via the Stargate REST interface. Most of the read-only functions (i.e. listing tables, getting version, meta data, etc.) work nicely. However, I'm having trouble with actually inserting data into any tables I've created. Here's what I've g...

I got this exception while running the HBase client...

import java.io.IOException; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.client.Get; import org.apache.hadoop.hbase.client.HTable; import org.apache.hadoop.hbase.client.Put; import org.apache.hadoop.hbase.client.Result; import org.apache.hadoop.hbase.client.ResultScanner; import org.apache.hadoop.hba...

Is this architecture possible in Hadoop MR?

Is the following architecture possible in Hadoop MapReduce? A distributed key-value store (HBase) is used, so each value has an associated timestamp. Map and Reduce tasks are executed iteratively. Map, in each iteration, should take in values which were added in the previous iteration to the store (perhaps...

How to build a scalable (relational) database for petabytes+ of data?

More of a general brainstorming question: what is the state of the art in designing a (relational) database which scales to very large amounts of data? And given today's technology trends, how do we expect to design them in 5-10 years? By scalability, I mean in particular the ability to increase capacity with linear cost by adding hardware...

Using HBase with C#

How can I use an HBase database with C#/VB.NET? (use = connect, query, get the result, insert, update, delete) I haven't found useful answers with Google. ...

How to read data from HBase?

Hi there, I'm used to SQL, but I need to read data from an HBase table. Any help on this would be great: a book, or maybe just some sample code to read from the table. Someone said using a scanner would do the trick, but I do not know how to use it. ...

Difference between Document-oriented-DB and Bigtable clones

Can someone give a head-to-head comparison between them? We are looking for a suitable storage engine for our weblog history data. We looked at the Bigtable paper and understand that it suits us well. However, I also understand that document-oriented DBs such as MongoDB seem to provide somewhat more schema power -- i.e., it c...

How to pick random (small) data samples using Map/Reduce?

I want to write a map/reduce job to select a number of random samples from a large dataset based on a row-level condition. I want to minimize the number of intermediate keys. Pseudocode: for each row, if the row matches the condition, put the row.id in the bucket if the bucket is not already large enough. Have you done something like th...
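The pseudocode above can be sketched in plain Python roughly as follows. `collect_samples`, the `matches` predicate, and the sample rows are hypothetical names and data chosen for illustration; this is a single-process sketch, not a MapReduce job.

```python
def collect_samples(rows, matches, bucket_size):
    """Fill a bounded bucket with the ids of rows that satisfy `matches`."""
    bucket = []
    for row in rows:
        if len(bucket) >= bucket_size:
            break  # bucket is already large enough, stop scanning
        if matches(row):
            bucket.append(row["id"])
    return bucket


# Hypothetical input: ten rows, and a condition that keeps even values.
rows = [{"id": i, "value": i} for i in range(10)]
sample = collect_samples(rows, lambda r: r["value"] % 2 == 0, bucket_size=3)
```

Note that, taken literally, this keeps the first matching rows it sees rather than a uniform random subset. Reservoir sampling would be one way to make the selection uniform while keeping the bucket bounded.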

HBase schema design -- to make sorting easy?

I have 1M words in my dictionary. Whenever a user issues a query on my website, I will see if the query contains words from my dictionary and increment the counter corresponding to each of them. Here is an example: say a user types in "Obama is a president" and "Obama" and "president" are in my dictionary, then I should incremen...
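A minimal sketch of the counting step described above, using an in-memory `Counter` in place of any real store. `count_dictionary_hits` is a hypothetical helper, and whitespace splitting is an assumed tokenization; the question does not say how queries are tokenized.

```python
from collections import Counter


def count_dictionary_hits(query, dictionary, counters):
    """Increment the counter of every dictionary word that occurs in `query`."""
    for word in query.split():  # assumption: words are separated by whitespace
        if word in dictionary:
            counters[word] += 1  # one increment per occurrence
    return counters


dictionary = {"Obama", "president"}
counters = Counter()
count_dictionary_hits("Obama is a president", dictionary, counters)
```

Keeping the dictionary as a set makes each membership check constant time on average, so the cost per query depends on the query length rather than on the 1M dictionary entries.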

Cassandra/HBase or just MySQL: Potential problems doing the next thing

Say I have "user". It's the key. And I need to keep "user count". I am planning to have a record with key "user" and value "0" to "9999+ ;-)" (as many as I'll have). What problems will I run into if I use Cassandra, HBase or MySQL for that? Say I have thousands of new updates to this "user" key, where I need to increment the value. Am I i...

CouchDB, HDFS, HBase or which is right for my situation?

Hello all, This question is regarding data storage systems such as CouchDB, HDFS and HBase, specifically, which is right. I am looking at making a simple and customized Document Management System for my organization. Basically, we need the ability to store some Word Documents, PDFs and other similar files. I also want to store metada...

Searches (and general querying) with HBase and/or Cassandra (best practices?)

I have a User model object with quite a few fields (properties, if you wish) in it. Say "firstname", "lastname", "city" and "year-of-birth". Each user also gets a "unique id". I want to be able to search by them. How do I do that properly? How do I do that at all? My understanding (will work for pretty much any key-value storage -- first goes ...

Is there a good library for accessing HBase from Python?

I spent some time looking around, and all I could find is Jython. It's an option, but is there something that could be used in a more pythonesque (simpler) way? ...

How to improve the HBase scanner?

How do I configure HBase so that the scanner only retrieves a number of records at a time? Or how do I improve the scanner when the database contains a lot of records? ...

HBase schema help

Coming from a SQL Server background, I'm a newbie with regard to HBase, but the technology looks to be a good fit for what we're doing and the cost is definitely right! I need to maintain a list of log entries which normally I would create in an RDBMS as: create table Log ( UserID int, SiteID int, Page varchar(50), Date smalldatetim...