tags:
views: 155
answers: 3

I need a system to analyze large log files. A friend directed me to Hadoop the other day, and it seems perfect for my needs. My question revolves around getting data into Hadoop:

Is it possible to have the nodes in my cluster stream data into HDFS as they receive it? Or would each node need to write to a local temp file and submit that temp file once it reaches a certain size? And is it possible to append to a file in HDFS while also running queries/jobs against that same file at the same time?

A: 

HDFS does not support appends (yet?).

What I do is run the MapReduce job periodically and output the results to a 'processed_logs_#{timestamp}' folder. Another job can later take these processed logs and push them into a database, etc., so they can be queried online.
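
A minimal sketch of that periodic-job pattern, assuming the org.apache.hadoop.mapreduce API; the paths, class name, and timestamp format are placeholders, and the mapper/reducer would be whatever processing your logs actually need:

    import java.text.SimpleDateFormat;
    import java.util.Date;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ProcessLogs {
        public static void main(String[] args) throws Exception {
            // e.g. processed_logs_20100101-1200
            String timestamp = new SimpleDateFormat("yyyyMMdd-HHmm").format(new Date());

            Job job = Job.getInstance(new Configuration(), "process-logs");
            job.setJarByClass(ProcessLogs.class);
            // setMapperClass/setReducerClass for the actual log processing go here.

            FileInputFormat.addInputPath(job, new Path("/logs/incoming"));
            FileOutputFormat.setOutputPath(job, new Path("/processed_logs_" + timestamp));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Because each run writes to its own timestamped folder, the job never has to read a file that is still being appended to.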

Eran Kampf
+1  A: 

A Hadoop job can run over multiple input files, so there's really no need to keep all your data as one file. You won't be able to process a file until its file handle is properly closed, however.
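
For example (the paths and class name are made up), a single job can be pointed at many closed log files, either one path at a time or with a glob:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class MultiFileInput {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "multi-file-input");

            // Each closed log file is simply another input.
            FileInputFormat.addInputPath(job, new Path("/logs/node1/access-2010-01-01.log"));
            FileInputFormat.addInputPath(job, new Path("/logs/node2/access-2010-01-01.log"));

            // Or let a glob pick up everything that matches across directories.
            FileInputFormat.addInputPaths(job, "/logs/*/access-*.log");

            // Mapper, reducer, and output path would be configured as usual.
        }
    }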

toluju
A: 

I'd recommend using Flume to collect the log files from your servers into HDFS.
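
As a rough illustration only, here is what a minimal agent might look like in the properties-style configuration used by later Flume (NG) releases; the agent, source, channel, and sink names, the tailed file, and the HDFS URL are all assumptions:

    # One agent per server: tail the application log and land events in HDFS.
    agent.sources  = applog
    agent.channels = mem
    agent.sinks    = tohdfs

    agent.sources.applog.type     = exec
    agent.sources.applog.command  = tail -F /var/log/app/access.log
    agent.sources.applog.channels = mem

    agent.channels.mem.type     = memory
    agent.channels.mem.capacity = 10000

    agent.sinks.tohdfs.type          = hdfs
    agent.sinks.tohdfs.channel       = mem
    agent.sinks.tohdfs.hdfs.path     = hdfs://namenode:8020/flume/logs
    agent.sinks.tohdfs.hdfs.fileType = DataStream

Started with something like flume-ng agent -n agent -f <config file>, an agent like this streams events off each server as they are written, which sidesteps the write-a-temp-file-then-upload approach from the question.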

Jeff Hammerbacher