I have a custom log, roughly 29 MB of user data, including user agents. I want to parse through it (essentially just search) and count how many occurrences of, say, "Firefox" or "MSIE" appear in it, like a mini log parser.

This is where I am stumped. My idea was to explode() the contents on newlines, iterate through the array, and use:

if (stripos($line, 'Firefox') !== false) $ff++;

or something equally crude, but I realize that would take up a lot of memory and a lot of function calls.

What would be a good way to count the number of occurrences?

+5  A: 

You'll want to read the file line by line to avoid using up memory with lots of data.

$count = array('Firefox' => 0, 'MSIE' => 0, 'Others' => 0);
$handle = fopen("yourfile", "r");

if ($handle) {
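    // fgets() returns false at end of file, which ends the loop cleanly
    while (($buffer = fgets($handle)) !== false) {

        // actual counting here; stripos() returns 0 for a match at the very
        // start of the line, so compare against false explicitly
        if (stripos($buffer, 'Firefox') !== false) {
            $count['Firefox']++;
        } else if (stripos($buffer, 'MSIE') !== false) {
            $count['MSIE']++;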

        // this might be irrelevant if not all your lines contain user-agent
        // strings, but is here to show the idea
        } else {
            $count['Others']++; 
        }
    }
    fclose($handle);
}

print_r($count);

Also, depending on the format of your file (which wasn't supplied), you might want to use a regex or a more refined method to count occurrences, e.g.:

$count = array('Firefox' => 0, 'MSIE' => 0, 'Others' => 0);
$handle = fopen("yourfile", "r");

if ($handle) {
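    // fgets() returns false at end of file, which ends the loop cleanly
    while (($buffer = fgets($handle)) !== false) {
        $ua = get_user_agent($buffer);
        // anything we aren't tracking is counted under 'Others'
        if ($ua === null || !isset($count[$ua])) {
            $ua = 'Others';
        }
        $count[$ua]++;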
    }
    fclose($handle);
}

print_r($count);

/* @param $line
 * @return string representing the user-agent
 *
 * strpos() works for the most part, but you can use something more 
 * accurate if you want
 */
function get_user_agent($line) {
    // implementation left as an exercise to the reader
}
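
For illustration only, here is one way the helper could be filled in. This is a minimal sketch, not the answerer's implementation: the name guess_user_agent(), the needle list, and the "first match wins" order are all assumptions made for the example, and it reuses the same stripos() check as the first snippet.

```php
<?php
// Hypothetical helper: maps a log line to one of the tracked browser names.
// The needle list and its order are assumptions, not taken from the answer.
function guess_user_agent($line) {
    // needle => label; checked in order, first match wins
    $needles = array(
        'Firefox' => 'Firefox',
        'MSIE'    => 'MSIE',
    );

    foreach ($needles as $needle => $label) {
        // stripos() returns 0 for a match at offset 0, so compare with false
        if (stripos($line, $needle) !== false) {
            return $label;
        }
    }

    // nothing matched: bots, unknown browsers, lines without a user agent
    return 'Others';
}
```

Adding a browser then only means adding an entry to $needles and a matching key to $count.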
NullUserException
The format is simply "[ID] [IP] [UA] [DATE]", more or less, so I'll enter as many UA types as I can; "Others" will simply be useless bots. Thanks!
John
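
Given a layout like the one described in the comment above, the UA field could be isolated before matching, so that text in the other fields is never counted by accident. The following is only a sketch: it assumes the square brackets appear literally in the log and that no field contains a "]", neither of which the thread confirms, and extract_ua_field() is a hypothetical name.

```php
<?php
// Hypothetical helper: pulls the third bracketed field out of a line shaped
// like "[ID] [IP] [UA] [DATE]". The literal brackets are an assumption.
function extract_ua_field($line) {
    // four "[...]" groups separated by whitespace; none may contain "]"
    $pattern = '/^\[([^\]]*)\]\s+\[([^\]]*)\]\s+\[([^\]]*)\]\s+\[([^\]]*)\]/';

    if (preg_match($pattern, $line, $m)) {
        return $m[3]; // the UA field
    }

    return null; // line did not match the assumed layout
}
```

The returned string can then be passed to whatever matching function does the counting, and a null result can be counted separately as a malformed line.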
We should begin using "X left as an exercise to the reader" in our answers more :D
BoltClock
@NullUserException: hence my comment. *cackle*
BoltClock
You can use http://www.php.net/manual/en/function.sscanf.php to parse each line of your log.
Felipe Cardoso Martins
Perfect, this is actually a really useful exercise, I really appreciate the answer. Accepted.
John