Questions about Hadoop

Experience with Hadoop?

Hi, have any of you tried Hadoop? Can it be used without the distributed filesystem that goes with it, in a shared-nothing architecture? Would that make sense? I'm also interested in any performance results you have... ...

performance

distributed

hadoop

shared-nothing

How does Hive compare to HBase?

I'm interested in finding out how the recently-released (http://mirror.facebook.com/facebook/hive/hadoop-0.17/) Hive compares to HBase in terms of performance. The SQL-like interface used by Hive is very much preferable to the HBase API we have implemented. ...

hadoop

hbase

hive

HBase / Hadoop Query Help

I'm working on a project with a friend that will use HBase to store its data. Are there any good query examples? I seem to be writing a ton of Java code to iterate through lists of RowResults when, in SQL land, I could write a simple query. Am I missing something? Or is HBase missing something? ...

hadoop

hbase

Is it possible to perform arbitrary data analysis in Erlang?

I want to answer questions about data in Erlang: count things, correlate messages, provide arbitrary statistics. I had thought about resorting to Hadoop for this, but is it possible to build a solution in raw Erlang to do fairly arbitrary data analysis, not necessarily via map/reduce but by some other means? I have seen some hints of people doing t...

data

erlang

hadoop

Ruby on Rails/Merb as a frontend for an app with billions of records

I am looking for a backend solution for an application written in Ruby on Rails or Merb to handle data with several billion records. I have a feeling that I am supposed to go with a distributed model, and so far I have looked at HBase with Hadoop and at CouchDB. Problems with the HBase solution as I see it: Ruby support is not very strong, a...

How do you use MapReduce/Hadoop?

I'm looking for some general information about how other people are using Hadoop or other MapReduce-like technologies. In general, I am curious as to whether you are writing MR applications to process existing data sets (like web server log files), or writing applications that generate and process new data sets. Edit: Follow-up Que...

hadoop

mapreduce

Is there a .NET equivalent to Apache Hadoop?

So, I've been looking at Hadoop with keen interest, and to be honest I'm fascinated; things don't get much cooler. My only minor issue is that I'm a C# developer and it's in Java. It's not so much that I don't understand the Java as that I'm looking for the Hadoop.net or NHadoop or the .NET project that embraces the Google MapReduce approach. Do...

Large data - storage and query

We have a large data set of about 300 million records, which will be updated every 3-6 months. We need to query this data (continuously, in real time) to get some information. What are the options: an RDBMS (MySQL), or some other option like Hadoop? Which will be better? ...

How to design an HBase schema?

Hi all, suppose that I have this RDBMS table (entity-attribute-value model) with columns col1: entityID, col2: attributeName, col3: value, and I want to use HBase due to scaling issues. I know that the only way to access an HBase table is using a primary key (cursor). You can get a cursor for a specific key and iterate the rows one by one. The issue...

Any feedback / comments on the pigi project?

The pigi project is a framework to create different indexes on top of HBase (Apache's Bigtable implementation). In my use case I need to query the data by different attributes, so it looks like it is going to fit my needs. Have you guys ever tried it? What do you think of it? When I googled pigi and hbase I got ~ 12 results which l...

hadoop

hbase

How to implement eigenvalue calculation with MapReduce/Hadoop?

It should be possible, because PageRank is a form of eigenvalue calculation and that is why MapReduce was introduced. But there seem to be problems in an actual implementation: for example, does every slave computer have to maintain a copy of the matrix? ...

Hadoop on Windows Server

Hello, I'm thinking about using Hadoop to process large text files on my existing Windows 2003 servers (about 10 quad-core machines with 16 GB of RAM). The questions are: Is there any good tutorial on how to configure a Hadoop cluster on Windows? What are the requirements? Java + Cygwin + sshd? Anything else? HDFS, does it play nice ...

Amazon S3 architecture

While the post at http://highscalability.com/amazon-architecture explains Amazon's architecture in general, I am interested in knowing how Amazon S3 is implemented. Some of my guesses are: a distributed file system like HDFS, http://hadoop.apache.org/core/docs/current/hdfs_design.html; a non-relational persistent DB like CouchDB, http://co...

Hadoop HBase: Spreading column families across tables or not

The HBase documentation makes it clear that you should group similar columns into column families, because the physical storage is done by column family. But what does it mean to put two column families into the same table, as opposed to having separate tables per column group? Are there specific cases when "partitioning" tables this w...

database-design

hadoop

hbase

MapReduce on AWS

Anybody played around with MapReduce on AWS yet? Any thoughts? How's the implementation? ...

amazon-web-services

hadoop

Hadoop examples?

I'm examining Hadoop as a possible tool with which to do some log analysis. I want to analyze several kinds of statistics in one run. Each line of my log files has all sorts of potentially useful information that I'd like to aggregate. I'd like to get all sorts of data out of the logs in a single Hadoop run, but the example Hadoop programs I see ...

hadoop

Framework for running distributed computations in .NET cloud

I'm thinking about developing a framework to simplify running distributed computations in the .NET cloud environment of Windows Azure. Azure currently (and by the time of the release, most likely) is completely unsuited to simply running distributed queries in the cloud (details). Simple for me is something like DryadLINQ where you can...

java.io.IOException: Job failed! when running a sample app on OS X with hadoop-0.19.1

bash-3.2$ echo $JAVA_HOME
/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
bash-3.2$ bin/hadoop dfs -copyFromLocal conf /user/yokkom/input2
bash-3.2$ bin/hadoop jar hadoop-*-examples.jar grep input2 output 'dfs[a-z.]+'
09/04/17 10:09:32 INFO mapred.FileInputFormat: Total input paths to process : 10
09/04/17 10:09:33 INFO ma...

What is the use of the 'key K1' in the org.apache.hadoop.mapred.Mapper?

I'm learning Apache Hadoop and I was looking at the WordCount example org.apache.hadoop.examples.WordCount. I understand this example; however, I can see that the variable LongWritable key was not used in (...) public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, ...

hadoop

mapreduce

Hadoop: map/reduce from HDFS

I may be wrong, but all(?) examples I've seen with Apache Hadoop take as input a file stored on the local file system (e.g. org.apache.hadoop.examples.Grep). Is there a way to load and save the data on the Hadoop file system (HDFS)? For example, I put a tab-delimited file named 'stored.xls' on HDFS using hadoop-0.19.1/bin/hadoop dfs -put...