views: 242
answers: 5
I have the same problem as somebody described in another post. My application's log files are huge (~1 GB), and grep is tedious for correlating information across them. Right now I use the 'less' tool, but it is also slower than I would like.

I am thinking of speeding up the search. I see the following ways to do this: first, generate the logs in XML and use some XML search tool. I am not sure how much speedup XML search would give (not much, I guess, since a non-indexed file search will still take ages).

Second, use an XML database. This would be better, but I don't have much background here.

Third, use a (non-XML) database. This would be somewhat tedious, since a table schema has to be written (does that have to be done for the second option above too?). I also expect the schema to change a lot at the start as it grows to cover the common use cases. Ideally, I would like something lighter than a full-fledged database for storing the logs.

Fourth, use Lucene. It seems to fit the purpose, but is there a simple way to specify the indexes for this use case? For example, I want to say "index whenever you see the word 'iteration'".
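
To make that concrete, here is a rough sketch of the kind of keyword index I have in mind; this is plain Python rather than Lucene's actual API, and the file name and keyword are just placeholders:

    from collections import defaultdict

    def build_index(path, keywords):
        """Map each keyword to the byte offsets of the log lines containing it."""
        index = defaultdict(list)
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                for kw in keywords:
                    if kw in line:
                        index[kw].append(offset)
                offset += len(line)
        return index

    idx = build_index("app.log", [b"iteration"])   # "app.log" is a placeholder

    # Later, jump straight to the matching lines instead of grepping the whole file.
    with open("app.log", "rb") as f:
        for off in idx[b"iteration"][:10]:
            f.seek(off)
            print(f.readline().decode(errors="replace").rstrip())

Building the index is still one full pass over the file, but after that, lookups for 'iteration' no longer require scanning the whole gigabyte.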

What is your opinion?

+7  A: 

The problem is that using XML will make your log files even bigger. I would suggest either splitting up your log files by date or by number of lines; otherwise, use a file-based database engine such as SQLite.
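
A rough sketch of the SQLite route, using Python's built-in sqlite3 module (the file names and the single-column schema are only assumptions; split out real columns such as time and severity if your log format allows it):

    import sqlite3

    conn = sqlite3.connect("logs.db")               # placeholder database file
    conn.execute("CREATE TABLE IF NOT EXISTS log (lineno INTEGER, line TEXT)")
    with open("app.log", errors="replace") as f:    # placeholder log file
        conn.executemany(
            "INSERT INTO log VALUES (?, ?)",
            ((i, line.rstrip("\n")) for i, line in enumerate(f, 1)),
        )
    conn.commit()

    # Queries can now be repeated without re-parsing the raw file.  Note that
    # LIKE '%...%' still scans the table; indexes on real columns (or an FTS5
    # full-text table, if your SQLite build includes it) give the real speedup.
    for lineno, line in conn.execute(
            "SELECT lineno, line FROM log WHERE line LIKE ?", ("%iteration%",)):
        print(lineno, line)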

oykuo
+1  A: 

If you can check your logs on Windows, or using Wine, LogParser is a great tool for mining data out of logs. It practically allows you to run SQL queries on any log, with no need to change any code or log formats, and it can even be used to generate quick HTML or Excel reports.

Also, a few years ago, when XML was all the hype, I was using XML logs and XSLT stylesheets to produce views. It was actually kind of nice, but it used way too much memory and would choke on large files, so you probably DON'T want to use XML.

Robert Gould
I saw the MS Log Parser. It would be ideal if it were open source / available on Linux (i.e., without Wine).
Amit Kumar
I agree, I'd love it to be open source, but alas it's not. Anyway, considering what it can do, Wine could actually be viable; at least it was in my case.
Robert Gould
+3  A: 

A gigabyte isn't that big, really. What kind of "correlation" are you trying to do with these log files? I've often found it's simpler to write a custom program (or script) to handle a log file in a particular way than it is to try to come up with a database schema to handle everything you'll ever want to do with it. Of course, if your log files are hard to parse for whatever reason, it may well be worth trying to fix that aspect.

(I agree with kuoson, by the way - XML is almost certainly not the way to go.)
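
As an illustration of the custom-script idea, here is a minimal sketch in Python; it assumes log lines like "iteration 42" and that you want to group errors and warnings by iteration, so the patterns and path are placeholders to adapt to your actual format:

    import re

    ITER_RE = re.compile(r"iteration\s+(\d+)")           # placeholder pattern
    EVENT_RE = re.compile(r"ERROR|WARN", re.IGNORECASE)   # placeholder pattern

    current_iter = None
    events = {}                       # iteration number -> matching lines
    with open("app.log", errors="replace") as f:          # placeholder path
        for line in f:
            m = ITER_RE.search(line)
            if m:
                current_iter = int(m.group(1))
            elif current_iter is not None and EVENT_RE.search(line):
                events.setdefault(current_iter, []).append(line.rstrip())

    for it in sorted(events):
        print(f"iteration {it}: {len(events[it])} event(s)")

A single streaming pass like this handles a 1 GB file comfortably, and the script is easy to change as the questions you ask of the log change.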

Jon Skeet
About correlation: the AI application uses heuristics based on random bits. I often have to find out where a certain type of change happened, and why. Then I go about correcting/improving that 'why'.
Amit Kumar
A: 

Maybe you could load your log into Emacs (provided you have sufficient memory) and use the various Emacs features such as incremental search and M-x occur.

Disclaimer: I haven't tried this on files > 100MB.

starblue
I do this with Vim, and I have to wait about a minute just for the file to open.
Amit Kumar
A: 

XML could be even worse in many cases; it is only a good solution for smaller volumes of data. I think you could try to structure the log data coming from your application so that it can be used for fast searches. Try using logFaces to store the logs; it will then allow instant queries on the data. Of course, it depends on the nature of your application, but most of the common use cases are covered. To name just a few: query by time range, severity, class name, message text, host name, or application name. Besides, you can move to a stronger database if your storage is going to grow. If queries are not enough, you can at least narrow down the set of interesting events and then use text crunchers further down the line.
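
As a rough illustration of what "structured" output means, here is a plain Python logging setup that writes fixed, easily parsed fields on each line; the format string, path, and logger name are just one possibility and are not specific to logFaces (which normally collects records through a log4j-style appender rather than a text file):

    import logging

    # Illustration only: one record per line with fixed fields
    # (time, level, logger name, message), separated by tabs.
    logging.basicConfig(
        filename="app.log",      # placeholder path
        format="%(asctime)s\t%(levelname)s\t%(name)s\t%(message)s",
        level=logging.INFO,
    )
    log = logging.getLogger("search.heuristics")   # placeholder logger name
    log.info("iteration %d started", 42)
    log.warning("heuristic changed: %s", "tie-break rule")

Once every record carries the same fields in the same order, tools like those suggested above (SQLite, a custom script, or a log server) can query by severity, time, or logger without guessing at the line format.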

Dima