I'm trying to write a log parsing script to extract failed events. I can pull these with grep:

$ grep -A5 "FAILED" log.txt

2008-08-19 17:50:07 [7052] [14] DEBUG:      data: 3a 46 41 49 4c 45 44 20 20 65 72 72 3a 30 32 33   :FAILED  err:023
2008-08-19 17:50:07 [7052] [14] DEBUG:      data: 20 74 65 78 74 3a 20 00                            text: .
2008-08-19 17:50:07 [7052] [14] DEBUG:    Octet string dump ends.
2008-08-19 17:50:07 [7052] [14] DEBUG: SMPP PDU dump ends.
2008-08-19 17:50:07 [7052] [14] DEBUG: SMPP[test] handle_pdu, got DLR
2008-08-19 17:50:07 [7052] [14] DEBUG: DLR[internal]: Looking for DLR smsc=test, ts=1158667543, dst=447872123456, type=2
--
2008-08-19 17:50:07 [7052] [8] DEBUG:      data: 3a 46 41 49 4c 45 44 20 20 65 72 72 3a 30 32 34   :FAILED  err:024
2008-08-19 17:50:07 [7052] [8] DEBUG:      data: 20 74 65 78 74 3a 20 00                            text: .
2008-08-19 17:50:07 [7052] [8] DEBUG:    Octet string dump ends.
2008-08-19 17:50:07 [7052] [8] DEBUG: SMPP PDU dump ends.
2008-08-19 17:50:07 [7052] [8] DEBUG: SMPP[test] handle_pdu, got DLR
2008-08-19 17:50:07 [7052] [8] DEBUG: DLR[internal]: Looking for DLR smsc=test, ts=1040097716, dst=447872987654, type=2

What I'm interested in is, for each block, the error code (i.e. the "023" part of ":FAILED err:023" on the first line) and the dst number (i.e. "447872123456" from "dst=447872123456" on the last line).

Can anyone help with a shell one-liner to extract those two values, or provide some hints as to how I should approach this?

+2  A: 
grep -A 5 FAILED log.txt |              # Get FAILED and dst and other lines
    egrep '(FAILED|dst=)' |             # Just the FAILED/dst lines
    egrep -o "err:[0-9]*|dst=[0-9]*" |  # Just the err: and dst= phrases
    cut -d':' -f 2 |                    # Strip "err:" from err: lines
    cut -d '=' -f 2 |                   # Strip "dst=" from dst= lines
    xargs -n 2                          # Combine pairs of numbers

023 447872123456
024 447872987654

As with all shell "one"-liners, there is almost certainly a more elegant way to do this. However, I find the iterative approach very successful for getting what I want: start with too much information (your grep), then narrow down the lines I want (with grep), then snip out the parts of each line that I want (with cut).

While using the Linux toolbox takes more lines, you only have to know the basics of a few commands to do just about anything you want. An alternative is to use awk, Python, or another scripting language, which requires more specialized programming knowledge but takes less screen space.
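For comparison, here is a minimal Ruby sketch that keeps the same stages as the pipeline above, one step per line: keep each FAILED line plus the five lines after it, keep only the FAILED/dst= lines, cut out the numbers, then pair them up. The method names `after_context` and `failed_pairs` are made up for this illustration, and like the pipeline it assumes every FAILED block has its err: line before exactly one dst= line.

```ruby
#!/usr/bin/env ruby
# The pipeline's stages, one Ruby step each. Method names are hypothetical.

# Like `grep -A n PATTERN`: keep each matching line plus the n lines after it.
def after_context(lines, pattern, n)
  remaining = 0
  lines.select do |line|
    if line.match?(pattern)
      remaining = n # a new match restarts the window, as grep does
      true
    elsif remaining > 0
      remaining -= 1
      true
    else
      false
    end
  end
end

def failed_pairs(lines)
  context = after_context(lines, /FAILED/, 5)                      # grep -A 5 FAILED
  kept    = context.grep(/FAILED|dst=/)                            # egrep '(FAILED|dst=)'
  phrases = kept.flat_map { |line| line.scan(/err:\d+|dst=\d+/) }  # egrep -o "err:...|dst=..."
  numbers = phrases.map { |phrase| phrase.split(/[:=]/, 2).last }  # the two cut commands
  numbers.each_slice(2).map { |pair| pair.join(" ") }              # xargs -n 2
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  puts failed_pairs(File.readlines(ARGV.first))
end
```

Each intermediate variable corresponds to one stage of the pipeline, so you can inspect (`p kept`, `p phrases`) while narrowing things down, much like running the pipeline one stage at a time.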

Michael Gundlach
Be aware that the grep command used is not standard - it uses the GNU-only feature '-A 5'; likewise, the egrep -o option is not standard. That's a warning - not a huge problem (unless you don't use GNU grep/egrep).
Jonathan Leffler
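If GNU grep isn't available, one way around both `-A` and `-o` is to skip grep entirely and do the pairing in a single pass. Below is a minimal Ruby sketch that assumes the error code always appears as `:FAILED  err:NNN` and that its `dst=` line turns up within the next five lines (the same window as `grep -A 5`). The name `stream_failed_pairs` is hypothetical. It reads one line at a time, so the log never has to fit in memory.

```ruby
#!/usr/bin/env ruby
# One-pass alternative to `grep -A 5 | egrep -o`: remember the most recent
# FAILED error code and report it with the next dst= number seen within
# `window` lines. `io` can be anything with each_line (a File, ARGF, a String).

def stream_failed_pairs(io, window = 5)
  pending_err = nil # error code still waiting for its dst= line
  distance = 0      # lines read since that FAILED line
  io.each_line do |line|
    if (m = line.match(/:FAILED\s+err:(\d+)/))
      pending_err = m[1]
      distance = 0
    elsif pending_err
      distance += 1
      if distance > window
        pending_err = nil # no dst= within the window; drop this error code
      elsif (m = line.match(/\bdst=(\d+)/))
        yield pending_err, m[1]
        pending_err = nil
      end
    end
  end
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  File.open(ARGV.first) do |log|
    stream_failed_pairs(log) { |err, dst| puts "#{err} #{dst}" }
  end
end
```

Saved as, say, `failed_stream.rb`, it would be run as `ruby failed_stream.rb log.txt` and needs nothing beyond a Ruby interpreter.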
A: 

A simple solution in Ruby. Here is filter.rb:

#! /usr/bin/env ruby
File.read(ARGV.first).scan(/:FAILED\s+err:(\d+).*?, dst=(\d+),/m).each do |err, dst|
  puts "#{err} #{dst}"
end

Run it with:

ruby filter.rb my_log_file.txt

And you get:

023 447872123456
024 447872987654
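The `m` flag at the end of the regular expression is what makes this work: in Ruby, `/m` lets `.` match newlines too (it is not Perl's multiline-anchor flag), so `.*?` can run from the FAILED line down to the `dst=` line a few lines later. A small check on the first and last line of the first block from the question:

```ruby
#!/usr/bin/env ruby
# The answer's pattern, with and without the trailing /m flag.
FAILED_WITH_DST      = /:FAILED\s+err:(\d+).*?, dst=(\d+),/m
FAILED_WITH_DST_NO_M = /:FAILED\s+err:(\d+).*?, dst=(\d+),/

# First and last line of the first FAILED block in the question's log.
sample = <<~LOG
  2008-08-19 17:50:07 [7052] [14] DEBUG:      data: 3a 46 41 49 4c 45 44 20 20 65 72 72 3a 30 32 33   :FAILED  err:023
  2008-08-19 17:50:07 [7052] [14] DEBUG: DLR[internal]: Looking for DLR smsc=test, ts=1158667543, dst=447872123456, type=2
LOG

p sample.scan(FAILED_WITH_DST)      # => [["023", "447872123456"]]
p sample.scan(FAILED_WITH_DST_NO_M) # => [] because "." stops at the newline
```

One consequence worth knowing: because `.*?` may cross any number of lines, a FAILED block with no `dst=` line of its own would be paired with the next block's `dst=`.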
bltxd
A: 

If the lines always have the same number of fields, you could just do:

grep -A5 "FAILED" log.txt | awk '$24~/err/ {print $24} $12~/dst/ {print $12}'

err:023
dst=447872123456,
err:024
dst=447872987654,

Depending on how the rest of the file looks, you might be able to skip the grep altogether.

The "$24~/err/ {print $24}" part tells awk to print field 24 if it matches the regular expression err. In general, $N ~ /XXX/ is true when field N matches the regular expression XXX.
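The same field tests can be sketched in Ruby, which may make the numbering clearer: awk counts fields from 1 while Ruby arrays count from 0, so `$24` is `fields[23]` and `$12` is `fields[11]`. This sketch (the method name `awk_style_fields` is hypothetical) skips the grep step and, on the sample lines from the question, produces the same four values shown above, trailing commas included. Run over a whole log, though, any other DLR line with `dst=` in field 12 would be printed too, which is why dropping the grep depends on the rest of the file.

```ruby
#!/usr/bin/env ruby
# Ruby version of: awk '$24~/err/ {print $24} $12~/dst/ {print $12}'
# Assumption: lines keep the fixed field layout shown in the question.

def awk_style_fields(lines)
  lines.each_with_object([]) do |line, out|
    fields = line.split # like awk's default: split on runs of whitespace
    # awk's $24 is fields[23]; to_s turns a missing field (nil) into "", which never matches
    out << fields[23] if fields[23].to_s.match?(/err/)
    # awk's $12 is fields[11]
    out << fields[11] if fields[11].to_s.match?(/dst/)
  end
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  puts awk_style_fields(File.foreach(ARGV.first))
end
```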