I want to process the logs from my web server as they come in, using Hadoop (Amazon Elastic MapReduce). I searched for help but found nothing useful. I would like to know whether this can be done, or whether there is an alternative way to do it.
A:
Hadoop is usually used in an offline manner, so I would rather process the logs periodically.
In a project I was involved with previously, we made our servers produce log files that were rotated hourly (every hour at x:00). We had a script that ran hourly (every hour at x:30) and uploaded the files into HDFS (only those that weren't already there). You can then run jobs in Hadoop as often as you like to process these files.
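The upload step described above could look roughly like the sketch below. This is not the script we actually used, just an illustration: the function names, the directory layout, and the idea of skipping the file the server is still writing to are all assumptions. It shells out to the standard `hadoop fs -ls` and `hadoop fs -put` commands.

```python
import subprocess
from pathlib import Path


def pending_uploads(local_names, remote_names, current_name=None):
    """Return rotated log file names that are not yet in HDFS, sorted by name.

    current_name (hypothetical) is the file the server is still writing to;
    it is skipped so that only fully rotated files are uploaded.
    """
    already = set(remote_names)
    return sorted(
        name for name in local_names
        if name not in already and name != current_name
    )


def put_command(local_dir, hdfs_dir, name):
    """Build the `hadoop fs -put` argument list for one file."""
    return ["hadoop", "fs", "-put", f"{local_dir}/{name}", f"{hdfs_dir}/{name}"]


def list_hdfs_names(hdfs_dir):
    """List base names in hdfs_dir by parsing `hadoop fs -ls` output.

    Assumes each entry line ends with the full path of the file; the
    "Found N items" header line contains no "/" and is ignored.
    """
    out = subprocess.run(
        ["hadoop", "fs", "-ls", hdfs_dir],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.rsplit("/", 1)[-1] for line in out.splitlines() if "/" in line]


def upload_pending(local_dir, hdfs_dir, current_name=None):
    """Upload every rotated log file that is not already in HDFS."""
    local_names = [p.name for p in Path(local_dir).iterdir() if p.is_file()]
    remote_names = list_hdfs_names(hdfs_dir)
    for name in pending_uploads(local_names, remote_names, current_name):
        subprocess.run(put_command(local_dir, hdfs_dir, name), check=True)
```

To get the x:30 schedule, something like `upload_pending("/var/log/web", "/logs", current_name="access.log")` (paths and file name made up here) could be called from a cron entry such as `30 * * * *`. Checking what is already in HDFS before uploading means a missed or failed run is simply caught up by the next one.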
I am sure there are better real-time alternatives too.
mojbro
2010-03-08 20:09:51
A:
Hadoop is not used for live, real-time processing. But it can be used to process logs on an hourly basis, perhaps one hour behind, which is near real time. I wonder what the need is for processing logs as they come in.
Harsha Hulageri
2010-07-11 04:18:14