views: 52
answers: 4

Hello,

I get multiple small files in my input directory which I want to merge into a single file, without using the local file system or writing MapReduce jobs. Is there a way to do this using hadoop fs commands or Pig?

Thanks!

A: 
hadoop fs -getmerge <dir_of_input_files> <mergedsinglefile>
Harsha Hulageri
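
A concrete invocation might look like the following (the paths here are hypothetical). Note that the destination is a path on the local file system:

hadoop fs -getmerge /user/hadoop/input /tmp/merged.txt
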
A: 

Thanks Harsha, but can the destination file for getmerge be within the DFS? From what I understand, the destination has to be on the local file system.

uHadoop
A: 

Okay... I figured out a way using hadoop fs commands:

hadoop fs -cat [dir]/* | hadoop fs -put - [destination file]

It worked when I tested it...any pitfalls one can think of?

Thanks!

uHadoop
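
For illustration, the same pipeline with hypothetical paths filled in; the trailing "-" tells hadoop fs -put to read from stdin, so the merged data never lands on the local disk:

hadoop fs -cat /user/hadoop/input/* | hadoop fs -put - /user/hadoop/merged/all_parts.txt
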
A: 

You can use the tool HDFSConcat, new in HDFS 0.21, to perform this operation without incurring the cost of a copy.

Jeff Hammerbacher
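
For anyone trying this, the invocation is roughly along these lines (the class name, argument order, and paths are assumptions on my part; check the tool's usage message in your build):

hadoop org.apache.hadoop.hdfs.tools.HDFSConcat /user/hadoop/merged/target.txt /user/hadoop/input/part-00000 /user/hadoop/input/part-00001
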
Thanks Jeff, I will look into HDFSConcat. We are currently on 0.20.2, so for now I am creating a HAR of all the files and then reading it from Pig. This way the data stays in HDFS.
uHadoop
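
A sketch of that approach, with hypothetical paths (the exact archive options vary somewhat between releases, so check hadoop archive's usage message); Pig can then LOAD the files through a har:// URI into the resulting archive:

hadoop archive -archiveName input.har /user/hadoop/input /user/hadoop/archives
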
I should note that this tool has limitations highlighted at https://issues.apache.org/jira/browse/HDFS-950. Files must have the same block size and be owned by the same user.
Jeff Hammerbacher