Hi, I'm trying to split this string in PHP.

11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"

How can I split this into IP, date, HTTP method, domain name and browser?

+2  A: 

HTTP server logs are like space-delimited CSV, so you could use a CSV parsing function:

$parts = str_getcsv($log_line, " ", '"', '\\');

From the docs:

array str_getcsv(
  string $input
  [, string $delimiter = ','  
  [, string $enclosure = '"'  
  [, string $escape = '\\'
  ]]]
)

(I'm not sure if the $escape part really is a backslash. Use the proper escape sequence.)

Tomalak
This is not true: the date field contains spaces.
KARASZI István
But these fields are wrapped in double quotes. This is why the CSV function accepts an `$enclosure` parameter.
Tomalak
No, the date is in `[]` and not quoted.
KARASZI István
Hm, you are right. I did not notice the blank before the `+`. You could live with it - the date would come out as field #4 (`'[25/Jan/2000:14:00:01'`) and the timezone as field #5 (`'+0100]'`). Not 100%, but close enough (and less complex than a regex).
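
A minimal sketch of the `str_getcsv` approach on the sample line from the question, gluing the two date pieces back together. The variable names and the `trim`/`explode` post-processing are illustrative assumptions, not part of the original answer:

```php
<?php
// Sample line copied from the question.
$log_line = '11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"';

// Split on spaces; double-quoted fields stay in one piece.
$parts = str_getcsv($log_line, ' ', '"', '\\');

$ip = $parts[0];
// The date is bracketed, not quoted, so it arrives as two fields
// (zero-based indexes 3 and 4). Rejoin them and strip the brackets.
$date = trim($parts[3] . ' ' . $parts[4], '[]');
// The request line is one quoted field: method, path, protocol.
[$method, $path, $protocol] = explode(' ', $parts[5], 3);
$referer = $parts[8];
$browser = $parts[9];
```

Note that array indexes are zero-based, so "field #4" above is `$parts[3]`.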
Tomalak
+1  A: 

You should check out a regular expression tutorial. But here is the answer:

if (preg_match('/^(\S+) \S+ \S+ \[(.*?)\] "(\S+).*?" \d+ \d+ "(.*?)" "(.*?)"/', $line, $m)) {
  $ip = $m[1];
  $date = $m[2];
  $method = $m[3];
  $referer = $m[4];
  $browser = $m[5];
}

Take care, it's not the domain name in the log but the HTTP referer.
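
If the host name of the referring page is what is wanted, one possible sketch is to run the pattern above and pass the referer through `parse_url`. The `$host` variable and the use of `parse_url` are additions for illustration and assume the referer is a full URL (it can also be `-` in real logs):

```php
<?php
// Sample line copied from the question.
$line = '11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"';

$host = null;
if (preg_match('/^(\S+) \S+ \S+ \[(.*?)\] "(\S+).*?" \d+ \d+ "(.*?)" "(.*?)"/', $line, $m)) {
  $referer = $m[4];
  // parse_url() returns the host component, or null when there is none
  // (for example when the referer field is just "-").
  $host = parse_url($referer, PHP_URL_HOST);
}
```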

KARASZI István
+2  A: 

@OP, you can create your own parser, but you might want to know there are already utilities for that. See here

ghostdog74
+3  A: 

This log format seems to be Apache's combined log format. Try this regular expression:

/^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+)[^"]*" \d+ \d+ "[^"]*" "([^"]*)"$/m

The matching groups are as follows:

  1. remote IP address
  2. request date
  3. request HTTP method
  4. User-Agent value

But the domain is not listed there. The second quoted string is the Referer value.
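
As a rough sketch, the expression can be applied to the sample line from the question with `preg_match`. The date group is written here with its closing parenthesis and an escaped `]`, i.e. `\[([^\]]+)\]`; the variable names are only illustrative:

```php
<?php
// Sample line copied from the question.
$line = '11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"';

$pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+)[^"]*" \d+ \d+ "[^"]*" "([^"]*)"$/m';

$ip = $date = $method = $userAgent = null;
if (preg_match($pattern, $line, $m)) {
  $ip = $m[1];        // remote IP address
  $date = $m[2];      // request date, without the brackets
  $method = $m[3];    // request HTTP method
  $userAgent = $m[4]; // User-Agent value
}
```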

Gumbo
I also need to split them
streetparade
@streetparade: Use `preg_match_all` and you get all matches: `preg_match_all('…', $str, $matches)`
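
A hedged sketch of the `preg_match_all` suggestion on a multi-line string. The second log line is made up for illustration, and `PREG_SET_ORDER` is an optional flag chosen here so that each element of `$matches` holds one complete log entry:

```php
<?php
// First line is the sample from the question; the second is a
// hypothetical entry added only to show multiple matches.
$str = '11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"' . "\n"
     . '22.22.22.22 - - [25/Jan/2000:14:00:02 +0100] "POST /form.php HTTP/1.1" 302 15 "-" "curl/7.19.7"';

// The /m modifier lets ^ and $ match at every line boundary.
$pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+)[^"]*" \d+ \d+ "[^"]*" "([^"]*)"$/m';

$count = preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

foreach ($matches as $entry) {
  // $entry[1] = IP, $entry[2] = date, $entry[3] = method, $entry[4] = User-Agent
  echo $entry[1], ' ', $entry[3], ' ', $entry[4], "\n";
}
```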
Gumbo