Can someone tell me what I am doing wrong?
2009/08/10 11:33:07 [INFO] - Copying local:/X/Y/Z.txt to DFS:/X/Y/Z.txt
2009/08/10 11:33:07 [INFO] - put: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=superman, access=WRITE, inode="":big-build:supergroup:rwxr-xr-x
2009/08/10 11:33:08 [FATAL] - DFS error...
2009/08/11 13:25:39 [INFO] - put: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=yskhoo, access=WRITE, inode="":bad-boy:supergroup:rwxr-xr-x
Why do I keep getting this error? Also, is it bad that I am writing to a blank inode?
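For what it's worth, inode="" in this message appears to be how HDFS prints the root directory, not a literal blank inode: the check fails because the writing user has no WRITE permission on a directory owned by big-build:supergroup with mode rwxr-xr-x. A typical remedy is sketched below (run as the HDFS superuser, i.e. the user that started the namenode; the paths and user names are taken from the log above):

```shell
# Give the writing user a directory it owns under the target path:
hadoop fs -mkdir /X/Y
hadoop fs -chown superman:supergroup /X/Y
# ...or, more bluntly, open up write access on the target directory:
hadoop fs -chmod 775 /X/Y
```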
...
I'm the administrator of a company intranet and I'd like to start producing videos. However, we have a very small bandwidth tunnel between our locations, and I'd like to avoid having multiple users hog it by streaming videos.
I'd like to synchronize the files to servers at each of the locations. Then I'd like the browser (or the intran...
I've been working with the Apache Mahout machine learning libraries in my free time a bit over the past few weeks. I'm curious to hear about how others are using these libraries.
...
Assume I have the following input in Pig:
some
And I would like to convert that into:
s
so
som
some
I've not (yet) found a way to iterate over a chararray in Pig Latin. I have found the TOKENIZE function, but that splits on word boundaries.
So can Pig Latin do this, or is this something that requires a Java class?
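As far as I know this does come down to a small Java UDF; the prefix expansion itself is simple. Below is a plain-Java sketch of the logic such a UDF would wrap (the class and method names are my own, not part of Pig):

```java
import java.util.ArrayList;
import java.util.List;

public class Prefixes {
    // Return every non-empty prefix of the input, shortest first:
    // "some" -> ["s", "so", "som", "some"]
    static List<String> prefixes(String s) {
        List<String> out = new ArrayList<String>();
        for (int i = 1; i <= s.length(); i++) {
            out.add(s.substring(0, i));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String p : prefixes("some")) {
            System.out.println(p);
        }
    }
}
```

A Pig EvalFunc returning a bag of these prefixes, followed by FLATTEN, would then produce one row per prefix.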
...
How do I wipe out the DFS in Hadoop?
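One common recipe, sketched below under the assumption of a default single-node setup (the storage directory is whatever dfs.name.dir/dfs.data.dir point at in your hadoop-site.xml; the path shown is only the usual default):

```shell
# Stop HDFS, remove its storage directories, and reformat the namenode.
bin/stop-all.sh
rm -rf /tmp/hadoop-$USER/dfs   # example default location; check your config
bin/hadoop namenode -format
bin/start-all.sh
```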
...
Can somebody outline the differences between the various Hadoop distributions available:
Cloudera - http://www.cloudera.com/hadoop
Yahoo - http://developer.yahoo.net/blogs/hadoop/
using the Apache Hadoop distro as a baseline.
Is there a good reason to use one of these distributions over the standard Apache Hadoop distro?...
In the past I used to build WebAnalytics using OLAP cubes running on MySQL.
Now an OLAP cube, the way I used it, is simply a large table (ok, it was stored a bit smarter than that) where each row is basically a measurement or an aggregated set of measurements. Each measurement has a bunch of dimensions (i.e. which pagename, useragent, ip,...
What is the most efficient way to look up values in a BDB for several files in parallel? If I had a Perl script which did this for one file at a time, would forking/running the process in the background with an ampersand in Linux work?
How might Hadoop be used to solve this problem?
Would threading be another solution?
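Threading is one plausible shape for this; here is a minimal Java sketch of a thread-pool version, with an in-memory Map standing in for a read-only BDB handle (an assumption for illustration — real lookups would go through the BDB API, which does allow concurrent readers):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLookup {
    // One lookup task per key; the Map is a stand-in for a read-only BDB.
    static Map<String, Integer> lookupAll(final Map<String, Integer> db,
                                          List<String> keys) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Map<String, Future<Integer>> futures = new HashMap<String, Future<Integer>>();
        for (final String k : keys) {
            futures.put(k, pool.submit(new Callable<Integer>() {
                public Integer call() { return db.get(k); }
            }));
        }
        // Collect results, blocking until each lookup completes.
        Map<String, Integer> results = new HashMap<String, Integer>();
        for (Map.Entry<String, Future<Integer>> e : futures.entrySet()) {
            results.put(e.getKey(), e.getValue().get());
        }
        pool.shutdown();
        return results;
    }
}
```

Whether forking or threading actually helps depends on whether the workload is disk-bound or CPU-bound; for a disk-bound BDB on one spindle, parallelism may buy little.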
...
Given the following HBase schema scenario (from the official FAQ)...
How would you design an HBase table for many-to-many association between two entities, for example Student and Course?
I would define two tables:
Student: student id, student data (name, address, ...), courses (use course ids as column qualifiers h...
I am working on a project using Hadoop, and it seems to natively incorporate Java and provide streaming support for Python. Is there a significant performance impact to choosing one over the other? I am early enough in the process that I can go either way if there is a significant performance difference.
...
Does anyone have familiarity with both CloudStore and HDFS? I am interested to see how far CloudStore has been scaled and how heavily it has been used in production. CloudStore seems to be more full-featured than HDFS. When thinking about these two filesystems, what practical trade-offs are there?
...
The Task Side-Effect Files section of the Hadoop tutorial mentions using the "attemptid" of the task as a unique name. How do I get this attempt ID in my mapper or reducer?
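For reference: in the classic mapred API the attempt ID is available from the job configuration under the key "mapred.task.id" (e.g. conf.get("mapred.task.id") inside configure()), and in the newer mapreduce API via context.getTaskAttemptID(). Obtaining the real ID requires a running job, so the sketch below only demonstrates the string format and a hypothetical helper (my own, not a Hadoop API) for building a unique side-effect file name from it:

```java
public class AttemptIdDemo {
    // Inside a mapper/reducer you would get the id from the framework, e.g.
    //   String id = conf.get("mapred.task.id");          // classic mapred API
    //   TaskAttemptID id = context.getTaskAttemptID();   // newer mapreduce API
    // The string form looks like: attempt_200908101133_0001_m_000000_0
    // (job timestamp, job number, "m"ap or "r"educe, task number, attempt number)

    // Hypothetical helper: build a per-attempt unique side-effect file name.
    static String sideEffectName(String base, String attemptId) {
        return base + "-" + attemptId;
    }

    public static void main(String[] args) {
        String id = "attempt_200908101133_0001_m_000000_0";
        System.out.println(sideEffectName("part", id));
    }
}
```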
...
Do you need to set up a Linux cluster first in order to set up a Hadoop cluster?
...
I'm thinking about building a small test application in Hadoop to get the hang of the system.
The application I have in mind will be in the realm of doing statistics.
I want my reducer function to produce "the 10 worst values for each key" (where I must assume the possibility of a huge number of values for some keys).
What I have pla...
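One common shape for this is a bounded min-heap of size 10 per key, so memory stays constant however many values stream past. Sketched below in plain Java, treating "worst" as "largest" (an assumption — flip the comparator if worst means smallest); this is the logic a reducer would run over the values for one key:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {
    // Keep only the N largest ("worst") values seen, in O(N) memory,
    // no matter how many values are streamed in.
    static List<Integer> worstN(Iterable<Integer> values, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(n); // min-heap
        for (int v : values) {
            heap.offer(v);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest, keeping the N largest
            }
        }
        List<Integer> out = new ArrayList<Integer>(heap);
        Collections.sort(out, Collections.reverseOrder());
        return out;
    }
}
```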
Lots of "BAWs" (big-ass websites) are using data storage and retrieval techniques that rely on huge tables with indexes, and use queries that won't/can't use JOINs (BigTable, HQL, etc.) to deal with scalability and sharding databases. How does that work when you have lots and lots of data that is very related?
I can on...
I need to write data into Hadoop (HDFS) from external sources like a Windows box. Right now I have been copying the data onto the namenode and using HDFS's put command to ingest it into the cluster. In my browsing of the code I didn't see an API for doing this. I am hoping someone can show me that I am wrong and that there is an easy way to ...
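There is in fact a client API: org.apache.hadoop.fs.FileSystem can run on any machine that can reach the namenode and datanodes, including a Windows box. An untested sketch (it needs the Hadoop client jars on the classpath; the host, port, and paths here are placeholders, not taken from any real setup):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the cluster's namenode (placeholder host/port):
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        FileSystem fs = FileSystem.get(conf);
        // Equivalent of `hadoop fs -put C:/data/input.txt /user/me/input.txt`:
        fs.copyFromLocalFile(new Path("C:/data/input.txt"),
                             new Path("/user/me/input.txt"));
        fs.close();
    }
}
```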
Hi,
I want to know what Hadoop is. I have gone through Google and Wikipedia, but I am still not clear on what Hadoop actually is or what its goal is.
Any useful information would be highly appreciated.
Note: Please do not just provide a link to the wiki, as I have already read it; I am looking for a detailed explanation.
Thanks.
...
I have written a stochastic simulation in Java, which loads data from a few CSV files on disk (totaling about 100MB) and writes results to another output file (not much data, just a boolean and a few numbers). There is also a parameters file, and for different parameters the distribution of simulation outputs would be expected to change. T...
Is it possible to run Hadoop so that it only uses spare CPU cycles? I.e., would it be feasible to install Hadoop on people's work machines so that number crunching can be done when they are not using their PCs, without them experiencing an obvious performance drain (whirring fans aside!)?
Perhaps it's just a case of setting the JVM...