ansaurus

Question

Shell script/regex: extraction across multiple lines

Answer 1

+2 A:

grep -A 5 FAILED log.txt |                # Get FAILED and dst and other lines
    egrep '(FAILED|dst=)' |               # Just the FAILED/dst lines
    egrep -o "err:[0-9]*|dst=[0-9]*" |    # Just the err: and dst= phrases
    cut -d':' -f 2 |                      # Strip "err:" from err: lines
    cut -d '=' -f 2 |                     # Strip "dst=" from dst= lines
    xargs -n 2                            # Combine pairs of numbers

023 447872123456
024 447872987654

As with all shell "one"-liners, there is almost certainly a more elegant way to do this. However, I find the iterative approach very successful for getting what I want: start with too much information (your grep), then narrow down the lines I want (with grep), then snip out the parts of each line that I want (with cut).

While using the Linux toolbox takes more lines, you only have to know the basics of a few commands to do just about anything you want. An alternative is to use awk, Python, or another scripting language, which requires more specialized programming knowledge but takes less screen space.
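For comparison, here is a minimal sketch of the awk alternative as a single pass over the log. The sample log layout and the sample_log and extract_failed function names are assumptions made up for illustration; only the err:NNN and dst=NNN tokens come from the pipeline above.

```shell
#!/bin/sh
# Hypothetical sample log: the layout is an assumption, not the real log.txt.
sample_log() {
    cat <<'EOF'
12:00:01 msg=1 status:FAILED err:023 retry=0
  src=100
  route=A, dst=447872123456, len=12
12:00:02 msg=2 status:OK
  src=101
  route=B, dst=447872000000, len=9
12:00:03 msg=3 status:FAILED err:024 retry=1
  src=102
  route=A, dst=447872987654, len=7
EOF
}

# Remember the err code from each FAILED line, then print it next to the
# first dst number that follows. Reads the log on stdin.
extract_failed() {
    awk '
        /FAILED/ && match($0, /err:[0-9]+/) {
            err = substr($0, RSTART + 4, RLENGTH - 4)   # digits after "err:"
            next
        }
        err != "" && match($0, /dst=[0-9]+/) {
            print err, substr($0, RSTART + 4, RLENGTH - 4)   # digits after "dst="
            err = ""    # ignore dst lines until the next FAILED
        }
    '
}

sample_log | extract_failed
```

On the sample above this prints 023 447872123456 and 024 447872987654; the dst line that belongs to the OK message is skipped because no err code is pending at that point.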

Michael Gundlach 2008-10-30 16:07:52

Be aware that the grep command used is not standard: '-A 5' is an extension that POSIX does not specify (GNU grep has it); likewise, the egrep -o option is not standard. That's a warning, not a huge problem (unless you don't use GNU grep/egrep).

Jonathan Leffler 2008-10-30 18:40:52
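If portability matters, the same extraction can probably be done with sed and paste alone. This is a sketch, not a drop-in replacement: the sample log layout and the function names are hypothetical, and it assumes every FAILED line carries its err: code and is followed by a dst= line before the next FAILED line.

```shell
#!/bin/sh
# Hypothetical sample log: the layout is an assumption, not the real log.txt.
sample_log() {
    cat <<'EOF'
12:00:01 msg=1 status:FAILED err:023 retry=0
  src=100
  route=A, dst=447872123456, len=12
12:00:02 msg=2 status:OK
  src=101
  route=B, dst=447872000000, len=9
12:00:03 msg=3 status:FAILED err:024 retry=1
  src=102
  route=A, dst=447872987654, len=7
EOF
}

# Within each FAILED..dst= range, print only the digits after err: and dst=,
# then let paste join every two output lines into one. Reads the log on stdin.
extract_failed_posix() {
    sed -n '/FAILED/,/dst=/{
        s/.*err:\([0-9]*\).*/\1/p
        s/.*dst=\([0-9]*\).*/\1/p
    }' | paste -d ' ' - -
}

sample_log | extract_failed_posix
```

The sed range plays the role of grep -A 5 without a fixed line count, the two substitutions replace egrep -o plus cut, and paste stands in for xargs -n 2.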

Answer 2

A:

A simple solution in Ruby, here is filter.rb:

#! /usr/bin/env ruby
File.read(ARGV.first).scan(/:FAILED\s+err:(\d+).*?, dst=(\d+),/m).each do |err, dst|
  puts "#{err} #{dst}"
end

Run it with:

ruby filter.rb my_log_file.txt

And you get:

023 447872123456
024 447872987654

bltxd 2008-10-30 16:11:20

Answer 3

A:

If there is always the same number of fields you could just

grep -A5 "FAILED" log.txt | awk '$24~/err/ {print $24} $12~/dst/{print $12}'

err:023
dst=447872123456,
err:024
dst=447872987654,

And depending on how the rest of the file looks you might be able to skip the grep altogether.

The "$24~/err/ {print $24}" part tells awk to print field number 24 if it matches err. In general, $N~/XXX/ tests field N against XXX, where XXX is a regular expression.
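A tiny demonstration of the field match on made-up input. Fields 3 and 2 stand in for the 24 and 12 of the real log, and the two input lines and the function name are invented for the example.

```shell
#!/bin/sh
# Print field 3 when it matches /err/ and field 2 when it matches /dst/.
# Fields 3 and 2 are stand-ins for 24 and 12 (the real layout is not shown here).
print_matching_fields() {
    awk '$3 ~ /err/ { print $3 }  $2 ~ /dst/ { print $2 }'
}

printf '%s\n' 'x y err:023' 'route=A, dst=447872123456, len=12' |
    print_matching_fields
```

This prints err:023 and then dst=447872123456, with the trailing comma kept, because awk splits fields on whitespace only.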

2008-11-05 18:31:40
