hive

How does Hive compare to HBase?

I'm interested in finding out how the recently-released (http://mirror.facebook.com/facebook/hive/hadoop-0.17/) Hive compares to HBase in terms of performance. The SQL-like interface used by Hive is very much preferable to the HBase API we have implemented. ...

C#.NET Importing a registry hive and parsing its contents

I have been given a .Hive file from a registry which I have to parse and use the contents as part of an HTML report (from this I assume I have to convert it to text somehow). The whole thing must be done within the program, so I can't just convert the hive file and then run it through my program. I currently have no idea how to even start this ...

How to design an HBase schema?

Hi all, suppose that I have this RDBMS table (entity-attribute-value model): col1: entityID col2: attributeName col3: value and I want to use HBase due to scaling issues. I know that the only way to access an HBase table is using a primary key (cursor). You can get a cursor for a specific key and iterate the rows one by one. The issue...

hadoop hive question

I'm trying to create tables programmatically using JDBC. However, I can't see the table I created from the hive shell. What's worse, when I access the hive shell from different directories, I see different contents of the database. Is there any setting I need to configure? Thanks in advance. ...

copy resultSet without using cachedRowSet

I'm trying to close the connection after executing a query. Before, I just created a CachedRowSetImpl instance and it would take care of releasing the resources for me. However, I am using the Hive database driver from the Hadoop project. It doesn't support CachedRowSetImpl.execute(). I'm wondering whether there is any other way that allows me to copy the re...

Registry hive question...

Hello, does anyone have a small example of how to programmatically, in C/C++, load a user's registry hive? I would like to load a hive, set some values and close the hive. Thanks in advance for any help. Tony ...

Can OLAP be done in BigTable?

In the past I used to build WebAnalytics using OLAP cubes running on MySQL. Now an OLAP cube the way I used it is simply a large table (ok, it was stored a bit smarter than that) where each row is basically a measurement or an aggregated set of measurements. Each measurement has a bunch of dimensions (i.e. which pagename, useragent, ip,...

Building Apache Hive - impossible to resolve dependencies

I am trying out Apache Hive as per http://wiki.apache.org/hadoop/Hive/GettingStarted and am getting this error from Ivy: Downloaded file size doesn't match expected Content Length for http://archive.apache.org/dist/hadoop/core/hadoop-0.19.0/hadoop-0.19.0.tar.gz. Please retry. This error repeats 4 times for 4 different versions of ...

How do I make an async call to Hive in Java?

I would like to execute a Hive query on the server in an asynchronous manner. The Hive query will likely take a long time to complete, so I would prefer not to block on the call. I am currently using Thrift to make a blocking call (blocks on client.execute()), but I have not seen an example of how to make a non-blocking call. Here is the...
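
One general way to avoid blocking on a long-running call is to hand it to a worker thread and collect the result later. The sketch below only illustrates that pattern in Python; `run_query` is a hypothetical stand-in for a blocking client call and does not talk to Hive or Thrift, and the query text is made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(query: str) -> str:
    # Hypothetical stand-in for a blocking client call; it only sleeps
    # and echoes the query, it does not contact any server.
    time.sleep(0.1)
    return f"finished: {query}"


# Submit the blocking call to a worker thread; submit() returns at once.
executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(run_query, "SELECT COUNT(*) FROM logs")

# The caller is free to do other work here while the query runs.

# Collect the result when it is needed; this waits only if still running.
result = future.result()
executor.shutdown()
```

The same shape applies in other languages: the blocking call runs on a separate thread and the caller keeps a handle it can poll or wait on.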

Hadoop Hive web interface options

I've been experimenting with Hive for some data mining activities and would like to make it easily available to less command line orientated colleagues. Hive does now ship with a web interface (http://wiki.apache.org/hadoop/Hive/HiveWebInterface) but it's very basic at this stage. My question is does a visually polished and fully featu...

Using Hadoop, are my reducers guaranteed to get all the records with the same key?

I'm running a hadoop job (using hive actually) which is supposed to uniq lines in a lot of text files. More specifically it chooses the most recently timestamped record for each key in the reduce step. Does hadoop guarantee that every record with the same key, output by the map step, will go to a single reducer, even if there are many r...
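
The job described above can be pictured with a small simulation. This is a sketch in plain Python, not Hadoop code: it assumes records are routed by a deterministic hash of the key modulo the reducer count (the idea behind hash partitioning), and the names `partition` and `run_job` and the sample records are made up for illustration.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    # Deterministic hash of the key, modulo the reducer count:
    # equal keys always map to the same reducer index.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(records, num_reducers=4):
    # Shuffle: route every (key, timestamp, line) record by its key.
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, timestamp, line in records:
        reducers[partition(key, num_reducers)][key].append((timestamp, line))

    # Reduce: keep the most recently timestamped record for each key.
    latest = {}
    for groups in reducers:
        for key, values in groups.items():
            latest[key] = max(values, key=lambda v: v[0])[1]
    return latest


records = [
    ("a", 1, "old a"),
    ("b", 5, "only b"),
    ("a", 9, "new a"),
    ("c", 2, "old c"),
    ("c", 3, "new c"),
]
result = run_job(records)
```

Because the reducer index depends only on the key, every record for a given key lands in the same group, so picking the maximum timestamp within one reducer is enough in this model.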

Hive Based Registry in Flash

To start with I'll say I've read the post here and I'm still having trouble. I'm trying to create a CE6 image with a hive-based registry that actually stores results through a reboot. I've ticked the hive settings in the catalog items. In common.reg, I've set the location of the hive ([HKEY_LOCAL_MACHINE\init\BootVars] "SystemHive") ...

How can I use Hive on top of Amazon Elastic Mapreduce to process data in Amazon Simple DB?

I have a lot of data in an Amazon Simple DB Domain. I want to start Hive on Elastic Map Reduce (on top of hadoop) and somehow either import data from simpledb or connect to simpledb and run hiveql queries on it. I am having issues importing the data. Any pointers? ...

Even data distribution on hadoop/hive

I am trying a small hadoop setup (for experimentation) with just 2 machines. I am loading about 13GB of data, a table of around 39 million rows, with a replication factor of 1 using Hive. My problem is hadoop always stores all this data on a single datanode. Only if I change the dfs_replication factor to 2 using setrep, hadoop copies dat...

Difference between Pig and Hive? Why have both?

Hi. My background: 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM. Have read Google's paper on Map-Reduce and GFS. I understand that Pig's language, Pig Latin, is a shift from the SQL-like declarative style of programming (it suits the way programmers think), and Hive's query language closely ...

How to connect to Hadoop/Hive from .NET

I am working on a solution where I will have a Hadoop cluster with Hive running and I want to send jobs and hive queries from a .NET application to be processed and get notified when they are done. I can't find any solutions for interfacing with Hadoop other than directly from a Java app, is there an API I can access that I am just not f...

Spring-Batch for a massive nightly / hourly Hive / MySQL data processing

I'm looking into replacing a bunch of Python ETL scripts that perform a nightly / hourly data summary and statistics gathering on a massive amount of data. What I'd like to achieve is Robustness - a failing job / step should be automatically restarted. In some cases I'd like to execute a recovery step instead. The framework must be ab...

Combine multiple rows into one space separated string

So I have 5 rows like this:

userid, col
--------------
1, a
1, b
2, c
2, d
3, e

How would I do a query so it will look like this:

userid, combined
1, a b
2, c d
3, e
...
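
The transformation being asked for is a group-by followed by a string join. The sketch below shows it in plain Python on the five sample rows from the question; it is not a Hive query, and `combine_rows` is a made-up helper name.

```python
def combine_rows(rows, sep=" "):
    # Group values by userid, preserving first-seen order of users
    # and the original order of values within each user.
    grouped = {}
    for userid, col in rows:
        grouped.setdefault(userid, []).append(col)
    # Join each user's values into one separator-delimited string.
    return [(userid, sep.join(values)) for userid, values in grouped.items()]


rows = [(1, "a"), (1, "b"), (2, "c"), (2, "d"), (3, "e")]
combined = combine_rows(rows)
for userid, text in combined:
    print(f"{userid}, {text}")
```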

Using Hive with Pig

Hi, My hive query has multiple outer joins and takes very long to execute. I was wondering if it would make sense to break it into multiple smaller queries and use pig to work the transformations. Is there a way I could query hive tables or read hive table data within a pig script? Thanks ...

writing custom functions that use external java classes on Hive

I've been thinking of how to do it in Hive. For example, I have a specific field in a log file that I want to extract (this is already possible in Hive) and then I want to map this field's value to something else. This mapping is determined by my own custom business logic that is coded up in a Java class. How can I use this Java class in Hive...