I would like to replace the first character 'x' with the number '7' on every line of a log file using a shell script. Example of the log file:

216.129.119.x [01/Mar/2010:00:25:20 +0100] "GET /etc/....
74.131.77.x [01/Mar/2010:00:25:37 +0100] "GET /etc/....
222.168.17.x [01/Mar/2010:00:27:10 +0100] "GET /etc/....

My humble beginnings...

#!/bin/bash
echo Starting script...
cd /Users/me/logs/
gzip -d /Users/me/logs/access.log.gz
echo Files unzipped...
echo "I'm totally lost here to process the log file and save it back to hd..."

exit 0

Why is the log file IP malformed like this? My web provider (1and1) has decided not to store full IP addresses, so they have replaced the last number with the character 'x'. They told me it was a new requirement by 'law'. I personally think that is bs, but that would take us off topic.

I want to process these log files with AWstats, so I need an IP address that is not malformed. I want to replace the x with a 7, like so:

216.129.119.7 [01/Mar/2010:00:25:20 +0100] "GET /etc/....
74.131.77.7 [01/Mar/2010:00:25:37 +0100] "GET /etc/....
222.168.17.7 [01/Mar/2010:00:27:10 +0100] "GET /etc/....

Not perfect, I know, but at least I can process the files and still gain a lot of useful information like country, number of visitors, etc. The log files are 200MB each, so I thought a shell script was the way to go, because I can run it quickly and locally on my MacBook Pro. Unfortunately, I know very little about shell scripting, and my JavaScript skills are not going to cut it this time. I appreciate your help.

+1  A: 

The following perl one-liner should do the trick:

perl -p -i -e 's/\.x/\.7/' foo.log

It'll substitute the first instance of '.x' with '.7' on each line of the log file.
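For comparison, here is a minimal Python sketch of the same substitution, assuming the masked address always starts the line. The helper name `fix_line` and the anchored pattern are illustrative choices, not part of the one-liner above; anchoring just makes sure a later ".x" in the request path is never touched.

```python
import re

# Hypothetical helper: match three dotted octets at the start of the line
# followed by the masked ".x", keep the octets and replace the "x".
MASKED_IP = re.compile(r'^(\d{1,3}\.\d{1,3}\.\d{1,3})\.x')

def fix_line(line):
    """Return the line with the masked last octet replaced by 7."""
    return MASKED_IP.sub(r'\g<1>.7', line, count=1)

if __name__ == '__main__':
    sample = '216.129.119.x [01/Mar/2010:00:25:20 +0100] "GET /etc/....'
    print(fix_line(sample))
```

Lines that do not start with a masked address are returned unchanged.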

seejay
Shouldn't you escape the dot?
João Portela
I should have, and did, but I needed to double-escape it for it to appear correctly in the post. Thanks for the pointer.
seejay
This worked a treat. I did try all the solutions, but this one was fast and fitted directly into my script. It processed a 240MB log in 25 seconds.
skymook
A: 

You can use this little python script (which could probably be written in fewer lines than this):

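import sys

# Read log lines from stdin and write the fixed lines to stdout.
for line in sys.stdin:
    # The IP address is everything before the first space.
    ip_number, rest = line.split(' ', 1)
    ip_parts = ip_number.split('.')
    ip_parts[3] = '7'  # replace the masked last octet
    ip_number = '.'.join(ip_parts)
    # 'rest' still ends with the newline, so write it without adding one.
    sys.stdout.write(ip_number + ' ' + rest)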

Save it as fixip.py and execute it as:

cat access.log | python fixip.py > output.txt
mojbro
I ran "cat access.log | python fixip.py", but the output went to my terminal rather than to a file, which is what I need.
skymook
In Unix, where "everything is a file", you can redirect the output directly into a file with the ">" operator. I'll update the command above.
mojbro
Thanks for the update.
skymook
+2  A: 

I don't see the point of putting "7" in every IP, since that makes the addresses inaccurate. Nevertheless, here's an awk one-liner:

$ awk '{sub(/x$/,7,$1)}1' file
216.129.119.7 [01/Mar/2010:00:25:20 +0100] "GET /etc/....
74.131.77.7 [01/Mar/2010:00:25:37 +0100] "GET /etc/....
222.168.17.7 [01/Mar/2010:00:27:10 +0100] "GET /etc/....
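The same field-restricted idea can be sketched in Python, assuming the address is the first space-separated field. The function name `fix_first_field` is made up for illustration. One difference worth noting: awk rebuilds the line with single spaces between fields once `$1` is modified, whereas this sketch leaves everything after the first field untouched.

```python
def fix_first_field(line):
    """Replace a trailing 'x' in the first space-separated field with '7'."""
    # partition() splits at the first space only and keeps the separator,
    # so the remainder of the line is preserved exactly.
    first, sep, rest = line.partition(' ')
    if first.endswith('x'):
        first = first[:-1] + '7'
    return first + sep + rest

if __name__ == '__main__':
    print(fix_first_field('222.168.17.x [01/Mar/2010:00:27:10 +0100] "GET /etc/....'))
```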
ghostdog74
The reason for putting a 7 at the end of every IP is that my ISP has removed the last part of the IP and put an x in its place. I think most ISPs would have blocks of 256 addresses for their customers, so the chance of two addresses in the same block being in different countries is slim. 216.129.119.7 does come from the same country as 216.129.119.38, in this case the USA. I like the number 7; it's as good as any other number I could use. ;)
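The "blocks of 256" reasoning can be checked with a small sketch using Python's standard `ipaddress` module. The helper name `same_slash24` is hypothetical, and the code only shows that two addresses share the same /24 block; it says nothing about where that block is actually located.

```python
import ipaddress

def same_slash24(a, b):
    """True if two IPv4 addresses share their first three octets (same /24)."""
    # strict=False lets us build the /24 network from any host address in it.
    net = ipaddress.ip_network(a + '/24', strict=False)
    return ipaddress.ip_address(b) in net

if __name__ == '__main__':
    print(same_slash24('216.129.119.7', '216.129.119.38'))
```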
skymook
@skymook I would have chosen `255` ;-)
Sinan Ünür
A: 

Python (run with file to process as the first argument):

import sys
import gzip

# Open in binary mode so the same code handles bytes on Python 2 and 3.
fin = gzip.GzipFile(sys.argv[1], 'rb')
fout = gzip.GzipFile(sys.argv[1] + '.new', 'wb', 9)

for line in fin:
  address, rest = line.split(b' ', 1)
  # Drop the masked last octet; only the prefix is needed.
  prefix = address.rsplit(b'.', 1)[0]
  fout.write(prefix + b'.7 ' + rest)

fin.close()
fout.close()
Ignacio Vazquez-Abrams
+2  A: 

Since everyone is posting alternative solutions, I'm going to post one that I think is very simple:

sed 's/\.x/.7/' input_file > output_file

This replaces the first ".x" on each line with ".7". The quotes keep the shell from stripping the backslash before sed sees it.
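To see why the escaped dot matters, here is a small Python sketch comparing the two patterns. The function names and the test string `fox.example.x` are made up for illustration. On the sample log lines both patterns happen to hit the same spot, because the first "x" follows a dot; the difference only shows on other input.

```python
import re

def replace_unescaped(text):
    # "." matches any character, so the first "<any char>x" is replaced.
    return re.sub(r'.x', '.7', text, count=1)

def replace_escaped(text):
    # "\." matches only a literal dot, so the first literal ".x" is replaced.
    return re.sub(r'\.x', '.7', text, count=1)

if __name__ == '__main__':
    text = 'fox.example.x'
    print(replace_unescaped(text))  # f.7.example.x
    print(replace_escaped(text))    # fox.example.7
```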

hope it helps! :)

João Portela