Getting data in and out of Hadoop

tags:

hadoop

I need a system to analyze large log files. A friend directed me to Hadoop the other day and it seems perfect for my needs. My question revolves around getting data into Hadoop:

Is it possible to have the nodes on my cluster stream data into HDFS as they get it? Or would each node need to write to a local temp file and submit the temp file after it reaches a certain size? And is it possible to append to a file in HDFS while running queries/jobs on that same file?

HDFS does not support appends (yet?)

What I do is run the map-reduce job periodically and output the results to a "processed_logs_#{timestamp}" folder. Another job can later take these processed logs and push them to a database etc. so they can be queried online.
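A rough sketch of that periodic run, in Python. One reason a fresh timestamped folder per run helps is that the stock file output formats refuse to start if the output directory already exists. The jar name, main class, and paths below are hypothetical placeholders, and the job is assumed to take its input and output paths as its two arguments.

```python
from datetime import datetime, timezone


def output_dir(base, now=None):
    """Return a fresh folder name such as <base>/processed_logs_20090711T024418."""
    now = now or datetime.now(timezone.utc)
    return f"{base.rstrip('/')}/processed_logs_{now.strftime('%Y%m%dT%H%M%S')}"


def job_command(jar, main_class, input_path, base_output, now=None):
    """Build the argv for one run of the job.

    jar and main_class are placeholders for your own job; the job is
    assumed to read its input and output paths from its arguments.
    """
    return ["hadoop", "jar", jar, main_class, input_path,
            output_dir(base_output, now)]
```

From cron or a similar scheduler, something like `subprocess.run(job_command("logjob.jar", "LogJob", "/logs/incoming", "/logs/out"), check=True)` would launch one run; the follow-up job can then pick up whichever `processed_logs_*` folders it has not exported yet.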

Eran Kampf 2009-07-11 02:44:18

A Hadoop job can run over multiple input files, so there's really no need to keep all your data in one file. You won't be able to process a file until its file handle is properly closed, however.
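One way to act on this is the "local temp file" approach from the question: each node buffers log lines locally, closes the file once it passes a size threshold, and only then copies it into HDFS, so jobs only ever see complete files. A minimal sketch in Python follows; the class name, file naming scheme, and 64 MB default are illustrative assumptions, and the upload step shells out to `hadoop fs -put`.

```python
import os
import subprocess
import time


class RollingLogBuffer:
    """Buffer log lines locally and ship each closed file to HDFS."""

    def __init__(self, local_dir, hdfs_dir, max_bytes=64 * 1024 * 1024,
                 uploader=None):
        self.local_dir = local_dir
        self.hdfs_dir = hdfs_dir
        self.max_bytes = max_bytes          # arbitrary default, tune as needed
        self.uploader = uploader or self._put
        self._seq = 0
        self._open_new()

    def _open_new(self):
        # Timestamp plus a sequence number keeps local file names unique.
        self._seq += 1
        name = f"log_{int(time.time())}_{self._seq}.txt"
        self.path = os.path.join(self.local_dir, name)
        self.fh = open(self.path, "wb")

    @staticmethod
    def _put(local_path, hdfs_dir):
        # Copy the closed local file into the HDFS directory.
        subprocess.run(["hadoop", "fs", "-put", local_path, hdfs_dir],
                       check=True)

    def write(self, line):
        self.fh.write(line.encode("utf-8") + b"\n")
        if self.fh.tell() >= self.max_bytes:
            self.roll()

    def roll(self):
        # Close first, so the uploaded file is complete before a job sees it.
        self.fh.close()
        if os.path.getsize(self.path) > 0:
            self.uploader(self.path, self.hdfs_dir)
        os.remove(self.path)
        self._open_new()
```

Calling `roll()` on a timer as well as on size would bound how stale the data in HDFS can get on quiet nodes. Pointing the job at the whole HDFS directory then processes every uploaded file as one input set.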

toluju 2009-07-21 04:51:07

I'd recommend using Flume to collect the log files from your servers into HDFS.

Jeff Hammerbacher 2010-10-04 11:54:48
