I have some huge log files (50 MB; ~500K lines) and I need to start filtering the noise out of them. The log files are produced by log4j and follow this basic pattern:
[log-level] date-time class etc, etc
log-message
I'm looking for a way to specify a start regex and an end regex (or something similar) so that matching entries are filtered out of the file and I can more easily wade through these massive files. My thinking is that the start regex would match the log level and the end regex would match something in the log message. I'm sure I could write a Java program to accomplish this, but I thought I'd ask the community before going down that path. Thanks in advance.
Let me expand on my question. Let's assume I have the following snippet in my log file:
[DEBUG] date-time class etc, etc
log-message-1
[WARN] date-time class etc, etc
log-message-2
[DEBUG] date-time class etc, etc
log-message-3
[DEBUG] date-time class etc, etc
log-message-1
[WARN] date-time class etc, etc
log-message-2
[DEBUG] date-time class etc, etc
log-message-6
I'd like a way to filter out every entry containing log-message-1 or log-message-2 (call these logEntry1 and logEntry2) so I end up with:
[DEBUG] date-time class etc, etc
log-message-3
[DEBUG] date-time class etc, etc
log-message-6
I would hope to accomplish this by defining a set of regex pattern pairs. In my example above, I'd define one pair for logEntry1 and another for logEntry2.
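To make the idea concrete, here is a rough sketch of the behaviour I have in mind, written in Python only for brevity. It assumes that every entry begins with a header line starting with a bracketed log level and that everything up to the next header belongs to the same entry. The names `HEADER`, `EXCLUDE_PAIRS`, `entries`, `is_excluded` and `filter_log` are all made up for illustration, and the patterns are placeholders for my real ones.

```python
import re

# Assumption: an entry starts at any line beginning with a bracketed log level.
HEADER = re.compile(r"^\[(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\]")

# Hypothetical (start, end) pairs: the start regex is tested against the
# header line, the end regex against the message line(s) of the same entry.
EXCLUDE_PAIRS = [
    (re.compile(r"^\[DEBUG\]"), re.compile(r"log-message-1$", re.MULTILINE)),
    (re.compile(r"^\[WARN\]"), re.compile(r"log-message-2$", re.MULTILINE)),
]


def entries(lines):
    """Group lines into entries; a new entry begins at each header line."""
    current = []
    for line in lines:
        if HEADER.match(line) and current:
            yield current
            current = []
        current.append(line)
    if current:
        yield current


def is_excluded(entry):
    """True if any pair matches both the header and the message body."""
    header, body = entry[0], "".join(entry[1:])
    return any(start.search(header) and end.search(body)
               for start, end in EXCLUDE_PAIRS)


def filter_log(lines):
    """Yield only the lines of entries that no exclude pair matches."""
    for entry in entries(lines):
        if not is_excluded(entry):
            yield from entry
```

Since it works line by line, something like `dst.writelines(filter_log(src))` over two open file handles should not need to hold the whole 50 MB file in memory. What I'm really asking is whether an existing tool can already do this kind of entry-level filtering.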
I hope that helps clarify my question.