Hi,

I would like to parse log files. Each line begins with one or more IPs separated by commas, and I need to keep only the last one:

This is what the lines look like:

80.250.5.1 - - [26/Oct/2010:13:10:14 +0200] ...
80.250.5.1, 80.250.5.2 somethingA - [26/Oct/2010:13:10:14 +0200] ...
80.250.5.1, 80.250.5.2, 80.250.5.3 - somethingB [26/Oct/2010:13:10:14 +0200] ...

I need to get:

80.250.5.1 - - [26/Oct/2010:13:10:14 +0200] ...
80.250.5.2 somethingA - [26/Oct/2010:13:10:14 +0200] ...
80.250.5.3 - somethingB [26/Oct/2010:13:10:14 +0200] ...

Note: There is never a comma in the somethingA and somethingB columns, which may help. There may be more commas in the columns after the [date].

I have tried testing the first few columns and deleting them if they contain a comma, but the problem is that sometimes there are more than 10 IPs.

This works for 2 IPs:

awk '{if ($1 ~ /,/) {$1=""}; if ($2 ~ /,/) {$2=""}  }1'

My idea is to do something like "if there is a comma before [, delete everything up to and including the last such comma, otherwise keep the line unchanged". Unfortunately, my sed/awk skills are not good enough to do this.

Thanks a lot for any help.

+1  A: 
sed -r 's/^(([0-9]+\.){3}[0-9]+, )*(.*)$/\3/'

([0-9]+\.){3}[0-9]+ matches an IP address.

(([0-9]+\.){3}[0-9]+, )* repeats the match as long as there are addresses followed by a comma and a space, which means that the rest of the line is exactly what we need (please note that the last (or only) address is not followed by a comma).

The last step is to instruct sed to replace the whole input line with what it has captured in the third group of parentheses (hence \3 at the end of the expression), which gives us the desired result.
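To see the explanation above in action, here is a small sketch that runs the same expression on two lines taken from this thread. The function name strip_leading_ips is made up for illustration, and -E is used as the portable spelling of GNU sed's -r; both are assumptions, not part of the original answer.

```shell
# Hypothetical wrapper around the expression from this answer.
# -E enables extended regular expressions (same as -r in GNU sed).
strip_leading_ips() {
    sed -E 's/^(([0-9]+\.){3}[0-9]+, )*(.*)$/\3/'
}

printf '%s\n' \
    '80.250.5.1 - - [26/Oct/2010:13:10:14 +0200] ...' \
    '1.2.3.4, 1.2.3.5 - somethingB [26/Oct/2010:13:10:14 +0200] bla, bla, [xxx]' |
    strip_leading_ips
# Prints:
# 80.250.5.1 - - [26/Oct/2010:13:10:14 +0200] ...
# 1.2.3.5 - somethingB [26/Oct/2010:13:10:14 +0200] bla, bla, [xxx]
```

Because the repeated group only matches an address followed by ", ", the commas later in the line are left alone.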

Igor Korkhov
Thank you Igor, this works; however, I need to restrict it to replace commas only up to the first occurrence of [, because the rest of the line can contain commas that I need to keep unchanged.
Martin
Now it must work properly, I hope
Igor Korkhov
Sorry, it doesn't replace anything now.
Martin
Hm, I don't think there should be an asterisk before {3}...
Igor Korkhov
You are right, that asterisk made it work, but it also changed the rest of the line.
Martin
I have modified your first reply sed -r "s/^(.*,)? (.*)$/\2/" to sed -r "s/^(.*,)? (.*\[)/\2/" and this seems to work. If you edit your post, I will approve it as accepted answer.
Martin
It should be sed -r "s/^(.*,)? (.*\[)/\2/"
Martin
Unfortunately it'll fail on the following line: "1.2.3.4, 1.2.3.5 - somethingB [26/Oct/2010:13:10:14 +0200] bla, bla, [xxx]", that is, if there is a comma followed by a square bracket somewhere at the end of the line, while my last expression should work well.
Igor Korkhov
Thank you for the working code and great explanation of it!
Martin
a bit simpler: `sed -r 's/^([0-9.]+, )+//'`
glenn jackman
I like this compromise between Igor's and Glenn's: `sed -r 's/^(([0-9]+\.){3}[0-9]+, )+//'`
Dennis Williamson
I myself prefer Glenn's solution because it's very simple and clean, and it does its job in this particular scenario. My solution is overcomplicated for this task. Flagged Glenn's expression :)
Igor Korkhov
A: 

Are there any other commas in the line? If not, you can do:

awk -F, '{ print $NF }'

This will leave leading whitespace that you can trim away if desired, using either of these (the second one needs gawk, since gensub is a gawk extension):

awk -F, '{ print $NF }' | sed 's/^ *//'
awk -F, '{ print gensub(/^ */, "", "G", $NF) }'

In awk, the built-in variable NF holds the number of fields on the input line, so printing $NF prints the last field, which with -F, is everything after the last comma. Thus if there are more commas later on the input line, you won't get the output you want.

Note that the use of single quotes is critical: don't use double quotes, otherwise $NF is expanded by the shell rather than passed through to awk.
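If later columns can contain commas, one possible workaround is to split each line at the first [ and strip only the part before it. The following is a minimal sketch, not a tested production script: the function name keep_last_ip is made up for illustration, and it assumes (as the question states) that the columns between the IPs and the [date] never contain a comma.

```shell
# Hypothetical helper: keep only the last leading IP, touching
# nothing from the first "[" onward.
keep_last_ip() {
    awk '{
        b = index($0, "[")                       # position of the first "[", 0 if absent
        head = (b ? substr($0, 1, b - 1) : $0)   # text before the first "["
        tail = (b ? substr($0, b) : "")          # the first "[" and everything after it
        sub(/^.*, /, "", head)                   # greedy: drop through the last ", " in head
        print head tail
    }'
}

printf '%s\n' \
    '80.250.5.1, 80.250.5.2, 80.250.5.3 - somethingB [26/Oct/2010:13:10:14 +0200] bla, bla, [xxx]' |
    keep_last_ip
# Prints:
# 80.250.5.3 - somethingB [26/Oct/2010:13:10:14 +0200] bla, bla, [xxx]
```

Lines without a comma before the [ are printed unchanged, because sub() finds nothing to replace in head.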

Chris J
Thank you, but there can be more commas in the rest of the line after the [date] column. I have updated this in the question, sorry.
Martin