Hi!

I've started a little pet project to parse log files for Team Fortress 2. The log files have an event on each line, such as the following:

L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" (customkill "headshot") (attacker_position "1848 813 94") (victim_position "1483 358 221")

Notice that some parts of the log syntax recur. Names, for example, consist of four parts: the name, an ID, a Steam ID, and the player's team at the time. Rather than rewriting this regular expression for every event, I was hoping to abstract it out slightly.

For example:

my $name = qr/(.*)<(\d+)><(.*)><(Red|Blue)>/;
my $kill = qr/"$name" killed "$name"/;

This works nicely, but the regular expression now returns results that depend on the format of $name (breaking the abstraction I'm trying to achieve). The example above would match as:

my ($name_1, $id_1, $steam_1, $team_1, $name_2, $id_2, $steam_2, $team_2)

But I'm really looking for something like:

my ($player1, $player2)

Where $player1 and $player2 would be tuples of the previous data. I figure the "killed" event doesn't need to know exactly about the player, as long as it has information to create the player, which is what these tuples provide.

Sorry if this is a bit of a ramble, but hopefully you can provide some advice!

+4  A: 

I think I understand what you are asking. What you need to do is reverse your logic. First use a regex to split the line into the two player strings, then extract a tuple from each one. That way the "killed" regex doesn't need to know anything about the name format, and you only need one generic player-parsing regex, applied twice. Here is a short example:

#!/usr/bin/perl

use strict;
use Data::Dumper;

my $log = 'L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" (customkill "headshot") (attacker_position "1848 813 94") (victim_position "1483 358 221")';

# Capture only the text between the quotes, so the name does not pick up a leading '"'.
my ($player1_string, $player2_string) = $log =~ m/"(.*)" killed "(.*?)"/;
my @player1 = $player1_string =~ m/(.*)<(\d+)><(.*)><(Red|Blue)>/;
my @player2 = $player2_string =~ m/(.*)<(\d+)><(.*)><(Red|Blue)>/;

print STDERR Dumper(\@player1, \@player2);

Hope this is what you were looking for.
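If it helps, the two steps can be wrapped so the calling code only ever sees two player records. This is just a rough sketch: parse_player and parse_kill are hypothetical helper names, the hash keys are my own choice, and the trailing ' with ' in the kill pattern assumes every kill line names a weapon the way the sample line does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn 'name<id><steamid><team>' into a hash ref.
# Returns undef (in scalar context) if the string does not look like a player.
sub parse_player {
    my ($string) = @_;
    my ($name, $id, $steam, $team) =
        $string =~ m/^(.*)<(\d+)><(.*)><(Red|Blue)>$/
        or return;
    return { name => $name, id => $id, steam_id => $steam, team => $team };
}

# Hypothetical helper: split a "killed" line into attacker and victim records.
# Returns an empty list if the line is not a kill event.
sub parse_kill {
    my ($line) = @_;
    my ($attacker, $victim) = $line =~ m/"(.*)" killed "(.*?)" with /
        or return;
    return (scalar parse_player($attacker), scalar parse_player($victim));
}

my $line = 'L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" (customkill "headshot") (attacker_position "1848 813 94") (victim_position "1483 358 221")';

my ($player1, $player2) = parse_kill($line);
print "$player1->{name} ($player1->{team}) killed $player2->{name} ($player2->{team})\n";
# prints: Mmm... Cycles! (Red) killed monkey (Blue)
```

The kill pattern no longer mentions IDs or teams at all, so a change to the player format should only touch parse_player.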

dwp
Yeah, I considered this but wasn't sure if there was a way to avoid multiple steps. I can live with this though! I may have a look at some other solutions (like BNF) to describe the log entries, but this will certainly let me carry on. Thanks, dwp!
aCiD2
+1  A: 

Another way to do it, using the same strategy as dwp's answer:

my @players = 
    map { [ /(.*)<(\d+)><(.*)><(Red|Blue)>/ ] }
    $log_text =~ /"([^\"]+)" killed "([^\"]+)"/
;

Your log data contains several items of balanced text (quoted and parenthesized), so you might consider Text::Balanced for parts of this job, or perhaps a parsing approach rather than a direct attack with regex. The latter might be fragile if the player names can contain arbitrary input, for example.
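To make the fragility concrete, here is a small sketch. The log line is invented, and I am only assuming that a player name could contain double quotes; I have not checked what the game actually allows. It compares the quote-delimited split with a stricter one that requires each player string to end in <id><steamid><team> before the closing quote.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up log line: the attacker's name itself contains double quotes.
my $tricky = '"say "hi"<5><STEAM_0:0:1><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle"';

# Quote-delimited split: still matches, but silently drops the attacker's name.
my @naive = $tricky =~ /"([^"]+)" killed "([^"]+)"/;

# Stricter split: a player is any text followed by <id><steamid><team>.
# Assumes the Steam ID never contains '<' or '>'.
my $player = qr/.*?<\d+><[^<>]+><(?:Red|Blue)>/;
my @strict = $tricky =~ /"($player)" killed "($player)"/;

print "naive:  $naive[0]\n";    # naive:  <5><STEAM_0:0:1><Red>
print "strict: $strict[0]\n";   # strict: say "hi"<5><STEAM_0:0:1><Red>
```

Even the stricter pattern can be fooled by a name that imitates the whole <id><steamid><team>" killed " sequence, which is part of why a real parser may be the safer route.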

FM
+1  A: 

Consider writing a Regexp::Log subclass.

ysth