Looking at the combination of MapReduce and HBase from a data-flow perspective, my problem seems to fit. I have a large set of documents which I want to Map, Combine and Reduce. My previous SQL implementation was to split the task into batch operations, cumulatively storing what would be the result of the Map into table and then performi...
I'm building a Hadoop (0.20.1) mapreduce job that uses HBase (0.20.1) as both the data source and data sink. I would like to write the job in Python which has required me to use hadoop-0.20.1-streaming.jar to stream data to and from my Python scripts. This works fine if the data source/sink are HDFS files.
Does Hadoop support streaming...
I'm looking for a good and tested HBase tutorial, where I can find one?
...
Hi,
I'm currently designing an architecture for a web-based application that should also provide some kind of image storage. Users will be able to upload photos as one of the key feature of the service. Also viewing these images will be one of the primary usages (via web).
However, I'm not sure how to realize such a scalable image sto...
I have an application that requires analytics for different level of aggregation, and that's the OLAP workload. I want to update my database pretty frequently as well.
e.g., here is what my update looks like (schema looks like: time, dest, source ip, browser -> visits)
(15:00-1-2-2010, www.stackoverflow.com, 128.19.1.1, safari) --> 10...
Hello!
I'm playing around with an install of HBase cluster, and am trying to access the data via the Stargate REST interface. Most of the read-only functions (i.e. listing tables, getting version, meta data, etc) are work nicely. However, I'm having trouble with actually inserting data into any tables I've created. Here's what I've g...
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hba...
Is the following architecture possible in Hadoop MapReduce?
A distributed key-value store is used (HBase). So along with values, there would be a timestamp associated with the values. Map & Reduce tasks are executed iteratively. Map, in each iteration should take in values which were added in the previous iteration to the store (perhaps...
More a general brainstorming question: what is the state-of-the-art in designing a (relational) database which scales to very large amounts of data? And given today's technology trends, how do we expect to design them in 5-10 years?
By scalabiliy, I mean in particualar the ability to increase capacity with linear cost by adding hardware...
How can I use a Hbase database with C#/VB.NET ?
(use=connect, query, get the result, insert, update, delete)
I don't find useful answers with google.
...
Hi there I'm use to SQL, but I need to read data from a HBase table. Any help on this would be great. A book or maybe just some sample code to read from the table. Someone said using a scanner would do the trick, but I do not know how to use it.
...
Can someone give a head-to-head comparison between them?
We are looking for a suitable storage engine for our weblog history data. We looked at Bigtable's paper and understand it is suitable to us well.
However, I also understand that Document-oriented-DB such as MongoDB seems to provide a little more powerful schema power -- i.e, it c...
I want to write a map/reduce job to select a number of random samples from a large dataset based on a row level condition. I want to minimize the number of intermediate keys.
Pseudocode:
for each row
if row matches condition
put the row.id in the bucket if the bucket is not already large enough
Have you done something like th...
I have 1M words in my dictionary. Whenever a user issue a query on my website, I will see if the query contains the words in my dictionary and increment the counter corresponding to them individually. Here is the example, say if a user type in "Obama is a president" and "Obama" and "president" are in my dictionary, then I should incremen...
Say I have "user". It's the key. And I need to keep "user count".
I am planning to have record with key "user" and value "0" to "9999+ ;-)" (as many as I'll have).
What problems I will drive in if I use Cassandra, HBase or MySQL for that?
Say, I have thousand of new updates to this "user" key, where I need to increment the value.
Am I i...
Hello all,
This question is regarding data storage systems such as CouchDB, HDFS and HBase, specifically, which is right.
I am looking at making a simple and customized Document Management System for my organization. Basically, we need the ability to store some Word Documents, PDFs and other similar files. I also want to store metada...
I have User model object with quite few fields (properties, if you wish) in it. Say "firstname", "lastname", "city" and "year-of-birth". Each user also gets "unique id".
I want to be able to search by them. How do I do that properly? How to do that at all?
My understanding (will work for pretty much any key-value storage -- first goes ...
I spent some time looking around, and all I could find is Jython. It's an option, but is there something that could be used in a more pythonesque (simpler) way?
...
Ho do I configure HBase so that the scanner only retrieves a number of records at a time? Or how do I improve the scanner when the database contains a lot of records/
...
Coming from a SQL Server background, I'm a newbie with regard to HBase, but the technology looks to be a good fit for what we're doing and the cost is definitely right!
I need to maintain a list of log entries which normally I would create in an RDBS as:
create table Log
(
UserID int, SiteID int, Page varchar(50), Date smalldatetim...