I have a huge file on my Unix server from which I need to extract certain parts.

Each line has the format

aNumber timestamp commandInformation

I use the command

grep LATENCY file.log | grep CMDTYPE=NEW

to filter for the lines that I want. I don't want the complete line returned, only the timestamp and the last 9 characters of the line. How can I do that?

+2  A: 

Use awk(1):

awk ' { print $2" "substr($0,length($0)-8) }'
Aaron Digulla
+1 for being faster than me :)
Paolo Tedesco
A: 

You can use awk as follows:

grep LATENCY file.log | grep CMDTYPE=NEW | awk '{print $2,substr($0,length($0)-8,9)}'
Paolo Tedesco
Wouldn't using $0 ensure you display the end of the line instead of just the end of the 3rd word?
Yannick M.
@Yannick: yes, I assumed for no reason that 'commandInformation' would be just one word. Fixed the post with your suggestion, thanks!
Paolo Tedesco
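
As a quick check of the start index (a sketch on a made-up sample line, not taken from the original log): awk's substr counts from 1, so the last 9 characters start at length($0)-8. Starting at length($0)-9 with a length of 9 begins one character too early and drops the final character.

```shell
# Hypothetical sample line in the "aNumber timestamp commandInformation" format.
line='234432 12:44:22.432095 LATENCY blah CMDTYPE=NEW foo bar 123456789'

# Start at length-8: exactly the last 9 characters.
last9=$(printf '%s\n' "$line" | awk '{ print substr($0, length($0) - 8) }')

# Start at length-9 with a length of 9: shifted one character to the left.
shifted=$(printf '%s\n' "$line" | awk '{ print substr($0, length($0) - 9, 9) }')

printf '[%s]\n[%s]\n' "$last9" "$shifted"
# prints:
# [123456789]
# [ 12345678]
```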
A: 

No need to use grep, awk can do that as well:

awk '/LATENCY/ && /CMDTYPE=NEW/ {print $2 " " substr($0, length($0)-8)}' file
Hai Vu
A: 

I'm going to argue perl is a better choice than awk here:

perl -ne 'print "$1 $2\n" if /LATENCY/ && /CMDTYPE=NEW/ && /^\d+\s+(\S+)\s.*(.{9})$/' file.log

The regex is more robust, allowing you to omit lines that don't match the stricter pattern. The awk scripts above do no validation: given broken input like partial lines from the end of a log, they will still print something, and on lines shorter than 9 characters the substr start index goes to zero or negative (I honestly don't know what awk does with those).

Andy Ross
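
For what it is worth, a small experiment (a sketch with made-up inputs) suggests the substr call does not overflow. In the awk implementations I am aware of (gawk, mawk, the one true awk), a start index below 1 is clamped, so a short line simply comes back whole. Note also that any line containing CMDTYPE=NEW is at least 11 characters long, so the start index stays positive once the filter is applied. A field-count check, shown here as one hypothetical guard, covers truncated lines that still match both patterns:

```shell
# Made-up truncated input: length($0) - 8 is negative here.
short=$(printf 'abc\n' | awk '{ print substr($0, length($0) - 8) }')

# Hypothetical guard: require at least three fields before printing.
# The first input line matches both patterns but has only two fields.
guarded=$(printf 'CMDTYPE=NEW LATENCY\n1 12:00:00 LATENCY CMDTYPE=NEW 123456789\n' |
  awk 'NF >= 3 && /LATENCY/ && /CMDTYPE=NEW/ { print $2, substr($0, length($0) - 8) }')

printf '%s\n%s\n' "$short" "$guarded"
```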
A: 

You can do everything with sed alone (GNU sed, since \+ and \s are GNU extensions):

$ echo "234432 12:44:22.432095 LATENCY blah CMDTYPE=NEW foo bar 123456789" | \
sed -n '/LATENCY/!b;/CMDTYPE=NEW/!b;s/^[0-9]\+\s\+\([0-9:.]\+\)\s.\+\(.........\)$/\1 \2/; p'
12:44:22.432095 123456789
Idelic
A: 

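cut will do the job for the timestamp, though on its own it cannot return the last 9 characters of a variable-length line: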

grep something somewhere | grep againsomething | cut -f2 -d' '
Balakrishnan
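
Since cut counts from the start of the line, cut -f2 only yields the timestamp. One common workaround for the tail is to reverse each line first. A minimal sketch on a made-up sample line, assuming the rev utility (util-linux or BSD, not part of POSIX) is installed and the fields are separated by single spaces:

```shell
# Hypothetical sample line in the "aNumber timestamp commandInformation" format.
line='234432 12:44:22.432095 LATENCY blah CMDTYPE=NEW foo bar 123456789'

# Second space-separated field: the timestamp.
ts=$(printf '%s\n' "$line" | cut -d' ' -f2)

# Reverse the line, keep the first 9 characters, reverse back: the last 9 characters.
tail9=$(printf '%s\n' "$line" | rev | cut -c1-9 | rev)

printf '%s %s\n' "$ts" "$tail9"
```

With more than one space between fields, cut -d' ' sees empty fields and -f2 no longer points at the timestamp, which is one reason the awk answers are easier to live with.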