I'm writing a Python script to extract data from our 2GB Apache access log. Here's one line from the log:
81.52.143.15 - - [01/Apr/2008:00:07:20 -0600] "GET /robots.txt HTTP/1.1" 200 29 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (http://www.voila.com/)"
I'm trying to get the date portion of that line, but my regex isn't matching and I'm not sure why. Here's my Python code:
import re
# The log line contains double quotes, so it is wrapped in single quotes.
l = '81.52.143.15 - - [01/Apr/2008:00:07:20 -0600] "GET /robots.txt HTTP/1.1" 200 29 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (http://www.voila.com/)"'
re.match(r"\d{2}/\w{3}/\d{4}", l)
That returns None. So do the following:
re.match(r"\d{2}/", l)
re.match(r"\w{3}", l)
So does everything else I can think of, even patterns that should match only part of the date. What am I misunderstanding?
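For comparison, here is a minimal sketch of the same pattern run through both re.match and re.search. re.match only attempts a match at the very beginning of the string, and this line begins with an IP address, which may be why every attempt above comes back empty. The variable names (line, anchored, found, date) are only illustrative and are not part of my script.

```python
import re

# The sample log line from above, wrapped in single quotes.
line = '81.52.143.15 - - [01/Apr/2008:00:07:20 -0600] "GET /robots.txt HTTP/1.1" 200 29 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (http://www.voila.com/)"'

date_pattern = r"\d{2}/\w{3}/\d{4}"

# re.match is anchored to the start of the string, where the IP address sits,
# so the date pattern cannot match there.
anchored = re.match(date_pattern, line)

# re.search scans the whole string and returns the first match it finds.
found = re.search(date_pattern, line)
date = found.group(0) if found else None

print(anchored)  # None
print(date)      # 01/Apr/2008
```

With re.search the pattern itself is unchanged; only the place where the regex engine is allowed to start matching differs.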