views: 220

answers: 5
I want to export log files from multiple nodes (in my case Apache access and error logs) and aggregate that data in batch, as a scheduled job. I have seen multiple solutions that work with streaming data (e.g. Scribe). I would like a tool that gives me the flexibility to define the destination; this requirement comes from the fact that I want to use HDFS as the destination.

I have not been able to find a tool that supports this in batch. Before reinventing the wheel I wanted to ask the Stack Overflow community for their input.

If a solution already exists in Python, that would be even better.
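
To illustrate, here is a rough sketch of the kind of scheduled batch job I have in mind (the paths, the HDFS layout and the reliance on the hadoop CLI are placeholder assumptions, not a working solution):

    #!/usr/bin/env python
    # Sketch of a cron-driven batch upload of rotated Apache logs to HDFS.
    # Assumes the 'hadoop' CLI is installed on the node; the paths and the
    # per-day/per-host HDFS layout below are placeholders.
    import glob
    import os
    import socket
    import subprocess
    import time

    LOCAL_LOG_GLOB = "/var/log/apache2/access.log.*"   # rotated logs to ship
    HDFS_BASE_DIR = "/logs/apache"                      # destination in HDFS

    def ship_logs():
        day = time.strftime("%Y-%m-%d")
        host = socket.gethostname()
        hdfs_dir = "%s/%s/%s" % (HDFS_BASE_DIR, day, host)

        # Create the per-day, per-host directory (-p needs a newer Hadoop release).
        subprocess.call(["hadoop", "fs", "-mkdir", "-p", hdfs_dir])

        for path in glob.glob(LOCAL_LOG_GLOB):
            target = "%s/%s" % (hdfs_dir, os.path.basename(path))
            rc = subprocess.call(["hadoop", "fs", "-put", path, target])
            if rc != 0:
                print("failed to upload %s" % path)

    if __name__ == "__main__":
        ship_logs()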

A: 

PiCloud may help.

UsAaR33
A: 

Take a look at Zohmg, it's an aggregation/reporting system for log files using HBase and HDFS: http://github.com/zohmg/zohmg

alex
A: 

You could develop a custom application, if that is viable; see the ApacheLog project to help you parse the log lines.
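
If you go that route, parsing the standard combined log format in Python is straightforward; here is a small regex-based sketch (it only covers the stock combined LogFormat, custom formats would need an adjusted pattern):

    import re

    # Apache "combined" LogFormat:
    # %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
    COMBINED_RE = re.compile(
        r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
        r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
    )

    def parse_line(line):
        """Return a dict of fields, or None if the line doesn't match."""
        m = COMBINED_RE.match(line)
        return m.groupdict() if m else None

    if __name__ == "__main__":
        sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
                  '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
                  '"http://www.example.com/start.html" "Mozilla/4.08"')
        print(parse_line(sample))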

Felipe Cardoso Martins
A: 

Scribe can meet your requirements; there's a version (link) of Scribe that can aggregate logs from multiple sources and, after reaching a given threshold, store everything in HDFS. I've used it and it works very well. Compilation is quite complicated, so if you run into any problems, ask a question.
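
For reference, pushing a log line to a running scribed from Python looks roughly like this (it relies on the Thrift Python package and the generated scribe bindings; the host, port and category are placeholders, and the exact LogEntry constructor can differ between versions of the bindings):

    # Minimal sketch of sending one message to a local scribed over Thrift.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from scribe import scribe

    socket = TSocket.TSocket(host='localhost', port=1463)
    transport = TTransport.TFramedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocol(trans=transport,
                                               strictRead=False,
                                               strictWrite=False)
    client = scribe.Client(iprot=protocol, oprot=protocol)

    transport.open()
    entry = scribe.LogEntry(category='apache_access',
                            message='example log line\n')
    result = client.Log(messages=[entry])
    transport.close()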

Wojtek
+1  A: 

We use http://mergelog.sourceforge.net/ to merge all our Apache logs.

Doon