Hi, all. I have a huge log file that has a structure like this:

ip=X.X.X.X
userAgent=Firefox
-----
Referer=hxxp://www.bla.org

I want to create custom output in the form ip:userAgent

For example:

X.X.X.X:Firefox

The pattern should ignore lines that don't start with ip= or userAgent=. (These two must form a pair, as I mentioned above.)

I am a newbie administrator and our client needs a sorted file immediately. Any help would be wonderful. Thanks.

A: 

You can use:

^ip=((?:[0-9]{1,3}\.){3}[0-9]{1,3})$

and

^userAgent=(.*)$

Take group 1 from each match: the first pattern gives you the IP address and the second gives you the user agent.
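One way to apply the two patterns line by line, sketched in Python. The function name `pair_lines` and the sample data are made up for illustration, and the pairing rule (a userAgent= line only counts when it directly follows an ip= line) is my reading of the question, not something the patterns enforce on their own:

```python
import re

# The two patterns from this answer, unchanged.
IP_RE = re.compile(r"^ip=((?:[0-9]{1,3}\.){3}[0-9]{1,3})$")
UA_RE = re.compile(r"^userAgent=(.*)$")

def pair_lines(lines):
    """Yield 'ip:userAgent' for each ip= line directly followed by a userAgent= line.

    `pair_lines` is a hypothetical helper, not part of any library.
    """
    pending_ip = None  # group 1 of the previous line, if that line was an ip= line
    for raw in lines:
        line = raw.rstrip("\r\n")
        ua = UA_RE.match(line)
        if ua and pending_ip is not None:
            yield f"{pending_ip}:{ua.group(1)}"
        ip = IP_RE.match(line)
        pending_ip = ip.group(1) if ip else None

# Invented sample input mirroring the structure in the question.
sample = [
    "ip=10.0.0.1\n",
    "userAgent=Firefox\n",
    "-----\n",
    "Referer=hxxp://www.bla.org\n",
]
print(list(pair_lines(sample)))  # ['10.0.0.1:Firefox']
```

Because the function consumes its input lazily, you should be able to pass it an open file object instead of a list, so a huge log never has to be held in memory all at once.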

José Leal
+3  A: 
^ip=(\d+(?:\.\d+){3})[\r\n]+userAgent=(.+)$

Apply it in global + multiline mode, so that ^ and $ match at every line boundary and every pair in the file is found.

Group 1 will contain the IP, group 2 will contain the user agent string.
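As a rough illustration of "global + multiline", here is how it could look in Python, where `re.MULTILINE` makes ^ and $ match at line boundaries and `re.finditer` plays the role of the global flag. The sample log text is invented, and it assumes plain \n line endings. With \r\n endings, Python's `.` would also capture the trailing carriage return.

```python
import re

# Pattern from this answer, compiled in multiline mode.
PAIR_RE = re.compile(r"^ip=(\d+(?:\.\d+){3})[\r\n]+userAgent=(.+)$", re.MULTILINE)

# Invented sample text with two ip/userAgent pairs and some noise lines.
log_text = (
    "ip=10.0.0.1\n"
    "userAgent=Firefox\n"
    "-----\n"
    "Referer=hxxp://www.bla.org\n"
    "ip=10.0.0.2\n"
    "userAgent=Opera\n"
)

# finditer walks over every match in the text (the "global" part).
pairs = [f"{m.group(1)}:{m.group(2)}" for m in PAIR_RE.finditer(log_text)]
print(pairs)  # ['10.0.0.1:Firefox', '10.0.0.2:Opera']
```

Note that this sketch holds the whole text in memory, which may not be practical for a very large log. In that case a line-by-line approach is probably the safer choice.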

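Edit: The above expression can be simplified a bit. We can drop the IP address format checking, assuming that there will be nothing but real IP addresses in the log file. The repeated part sits inside a non-capturing group so that group 1 still holds the whole address rather than only its last octet:

^ip=((?:\d+\.?)+)[\r\n]+userAgent=(.+)$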
Tomalak
A: 

Give this a try. It is in no way robust if your log file contains lines that differ from the example snippet above:

sed -n -e '/^ip=/ {s///
N
s/\nuserAgent=/:/p
}' HugeFile > customoutput
Zac Thompson