Hello everyone,

I'm trying to write a filter script that transforms log lines like this:

Before:

123.125.66.126 - - [05/Apr/2010:09:18:12 -0300] "GET / HTTP/1.1" 302 290
66.249.71.167 - - [05/Apr/2010:09:18:13 -0300] "GET /robots.txt HTTP/1.1" 404 290
66.249.71.167 - - [05/Apr/2010:09:18:13 -0300] "GET /~leonardo_campos/IFBA/Web_Design_Aula_17.pdf HTTP/1.1" 404 324

After:

[05/Apr/2010:09:18:12 -0300] / 302 290
[05/Apr/2010:09:18:13 -0300] /robots.txt 404 290
[05/Apr/2010:09:18:13 -0300] /~leonardo_campos/IFBA/Web_Design_Aula_17.pdf 404 324

If someone could help, it would be great.

Thanks in advance!

+1  A: 

This seems like a perfect job for "sed".

You can easily construct a pair of "s" replacement patterns to remove the unwanted pieces of lines.
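As a rough sketch of that idea (the function name filter_log is only illustrative, and it assumes every line has the same layout as the samples in the question), the first "s" command drops everything before the opening bracket of the timestamp, and the second replaces the quoted request with just its path:

```shell
# Hypothetical helper: reads log lines on stdin, writes filtered lines to stdout.
filter_log() {
  # 1st expression: delete everything before the first "[" (IP address and the two "-" fields).
  # 2nd expression: replace "METHOD /path PROTOCOL" with just /path.
  sed -e 's/^[^[]*//' \
      -e 's/"[A-Z]* \([^ ]*\) [^"]*"/\1/'
}
```

It could then be used as `filter_log < access.log`, where access.log stands in for whatever your log file is called.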

andcoz
+1  A: 

sed is your friend here, with regexps.

sed 's/^.*\(\[.*\]\) "GET \(.*\) .*" \(.*\)$/\1 \2 \3/'
Didier Trosset
+1  A: 

Supporting all HTTP methods:

sed 's#.*\(\[[^]]*\]\).*"[A-Z]* \(.*\) HTTP/[0-9.]*" \(.*\)#\1 \2 \3#'
phihag
Cheers mate... It works fine!
Alucard
+1  A: 

If your file structure is always like that, you can just use fields. No need for a complex regex.

$ awk '{print $4,$5,$7,$9,$10}' file
[05/Apr/2010:09:18:12 -0300] / 302 290
[05/Apr/2010:09:18:13 -0300] /robots.txt 404 290
[05/Apr/2010:09:18:13 -0300] /~leonardo_campos/IFBA/Web_Design_Aula_17.pdf 404 324
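If you are not sure every line has that structure, one possible safeguard is to print only lines that have at least the ten whitespace-separated fields being indexed. This is just a sketch; the function name filter_fields is made up for illustration:

```shell
# Hypothetical helper: takes file names as arguments, or reads stdin if none are given.
filter_fields() {
  # NF is the number of fields on the current line; shorter lines are skipped.
  awk 'NF >= 10 { print $4, $5, $7, $9, $10 }' "$@"
}
```

Lines that are skipped are silently dropped here, so this fits best when malformed lines can safely be ignored.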
ghostdog74