Below is a piece of text from a popular online game called EVE Online; it basically gets mailed to you when you kill a person in-game. I'm building a tool to parse these mails using PHP and extract all relevant information. I will need every piece of information shown, and I'm writing classes to break it nicely into relevant encapsulated data.

2008.06.19 20:53:00

Victim: Massi
Corp: Cygnus Alpha Syndicate
Alliance: NONE
Faction: NONE
Destroyed: Raven
System: Jan
Security: 0.4
Damage Taken: 48436

Involved parties:

Name: Kale Kold
Security: -10.0
Corp: Vicious Little Killers
Alliance: NONE
Faction: NONE
Ship: Drake
Weapon: Hobgoblin II
Damage Done: 22093

Name: Harulth (laid the final blow)
Security: -10.0
Corp: Vicious Little Killers
Alliance: NONE
Faction: NONE
Ship: Drake
Weapon: Caldari Navy Scourge Heavy Missile
Damage Done: 16687

Name: Gistatis Tribuni / Angel Cartel
Damage Done: 9656

Destroyed items:

Capacitor Power Relay II, Qty: 2
Paradise Cruise Missile, Qty: 23
Cataclysm Cruise Missile, Qty: 12
Small Tractor Beam I
Alloyed Tritanium Bar, Qty: 2 (Cargo)
Paradise Cruise Missile, Qty: 1874 (Cargo)
Contaminated Nanite Compound (Cargo)
Capacitor Control Circuit I, Qty: 3
Ballistic Deflection Field I
'Malkuth' Cruise Launcher I, Qty: 3
Angel Electrum Tag, Qty: 2 (Cargo)

Dropped items:

Ballistic Control System I
Shield Boost Amplifier I, Qty: 2
Charred Micro Circuit, Qty: 4 (Cargo)
Capacitor Power Relay II, Qty: 2
Paradise Cruise Missile, Qty: 10
Cataclysm Cruise Missile, Qty: 21
X-Large Shield Booster II
Cataclysm Cruise Missile, Qty: 3220 (Cargo)
Fried Interface Circuit (Cargo)
F-S15 Braced Deflection Shield Matrix, Qty: 2
Salvager I
'Arbalest' Cruise Launcher I
'Malkuth' Cruise Launcher I, Qty: 2

I'm thinking about using regular expressions to parse the data but how would you approach this? Would you collapse the mail into a one line string or parse each line from an array? The trouble is there are a few anomalies to account for.

First, the 'Involved parties:' section is dynamic and can contain lots of people, all with a structure similar to the entries above, but if a computer-controlled enemy takes a shot at the victim too, its entry is shortened to only the 'Name' and 'Damage Done' fields, as shown above (Gistatis Tribuni / Angel Cartel).

Second, the 'Destroyed' and 'Dropped' item lists are dynamic and will be different lengths in each mail, and I will also need to get the quantity of each item and whether or not it is in cargo.

Ideas for an approach are welcome.

+11  A: 
Paul Dixon
I can't compete with a picture, +1 for breaking the div too :)
Owen
Definitely the most professional answer. But I am not sure that it will help him a lot. It's for EVE Online, not for an ADA output parser...
e-satis
(anyway I voted +1 for the wonderful pic :-))
e-satis
+1  A: 

You might be interested in http://pear.php.net/package/PHP_LexerGenerator

(Yes, it's alpha. Yes, I haven't used it myself. Yes, you need to know/learn the lexer syntax. Why do I suggest it? Just curious what your experience with it would be ;-))

VolkerK
+1. I wrote a code example for the naive state machine, but I do believe that a lexer generator would do the job much more efficiently if the format becomes more complex.
e-satis
I write "naive" parsers again and again and afterwards think "hm, could have done this in half the time if only I'd _like_ lex, flex, et al. just a little bit more".
VolkerK
+3  A: 

If you want something flexible, use the state machine approach.

If you want something quick and dirty, use regexp.
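
As a rough idea of the quick and dirty route, here is a minimal sketch for the item lines only. The function name `parse_item_line` and the returned array keys are made up for illustration, and the pattern assumes every item line looks like the ones in the sample mail: a name, an optional `, Qty: N`, and an optional ` (Cargo)` suffix.

```php
<?php
// Hypothetical helper, not part of any library: parses one line from the
// "Destroyed items:" / "Dropped items:" sections of the sample mail.
function parse_item_line($line)
{
    // name, then optional ", Qty: N", then optional " (Cargo)"
    $pattern = '/^(?P<name>.+?)(?:, Qty: (?P<qty>\d+))?(?P<cargo> \(Cargo\))?$/';

    if (!preg_match($pattern, trim($line), $m)) {
        return null; // blank line or something unexpected
    }

    return array(
        'name'  => $m['name'],
        // a missing quantity is assumed to mean a single item
        'qty'   => (isset($m['qty']) && $m['qty'] !== '') ? (int) $m['qty'] : 1,
        'cargo' => !empty($m['cargo']),
    );
}

print_r(parse_item_line("Paradise Cruise Missile, Qty: 1874 (Cargo)"));
print_r(parse_item_line("Contaminated Nanite Compound (Cargo)"));
print_r(parse_item_line("Small Tractor Beam I"));
```

Returning one array per line, rather than keying by item name, keeps both entries when the same item shows up fitted and in cargo.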

For the state machine approach, you can use libraries that specialize in parsing, since it's not a trivial task. But because this is a fairly simple format, you can hack together a naive parser, for example:

<?php

class Parser 
{
    /* Enclosing the parser in a class is not mandatory, but it's clean */

    /* data holders */
    public $date = '';
    public $parties = array();
    public $victim = array();
    public $items = array('Destroyed' => array(), 'Dropped' => array());

    /* parser state */
    public $states;
    public $state;
    public $item_parsing_state;
    public $partie_parsing_state;
    public $parse_tools;
    public $file;

    /* __construct() replaces the PHP 4 style Parser() constructor, which PHP 8 no longer calls */
    function __construct()
    {
        /* Map keywords to states. Sub-states can be necessary (and sub-parsers too :-) */
        $this->states = array('Victim' => 'victim_parsing',
                              'Involved' => 'parties_parsing',
                              'items:' => 'item_parsing');

        $this->state = 'start';
        $this->item_parsing_state = 'Destroyed';
        $this->partie_parsing_state = '';

        /* Map each state to the method that handles its lines */
        $this->parse_tools = array('start' => 'start_parsing',
                                   'parties_parsing' => 'parties_parsing',
                                   'item_parsing' => 'item_parsing',
                                   'victim_parsing' => 'victim_parsing');
    }

    /* the magic job is done here */

    function checkLine($line) 
    {
        foreach ($this->states as $keyword => $state) 
            if (strpos($line, $keyword) !== False)
                    $this->state = $this->states[$keyword];

        return trim($line);
    }

    function parse($file)
    {
        $this->file = new SplFileObject($file);
        foreach ($this->file as $line) 
            if ($line = $this->checkLine($line))
                 $this->{$this->parse_tools[$this->state]}($line);
    }


    /* then here you can define as many parsing rules as you want */

    function victim_parsing($line) 
    {
        /* limit of 2 keeps any ': ' inside the value intact */
        $victim_caract = explode(': ', $line, 2);
        if (count($victim_caract) == 2)
            $this->victim[$victim_caract[0]] = $victim_caract[1];
    }

    function start_parsing($line)
    {
        $this->date = $line;
    }

    function item_parsing($line) 
    {
        if (strpos($line, 'items:') !== False)
        {
            /* section header: "Destroyed items:" or "Dropped items:" */
            $item_state = explode(' ', $line);
            $this->item_parsing_state = $item_state[0];
        }
        else
        {
            /* "(Cargo)" can follow either the quantity or the bare item name */
            $cargo_marker = ' (Cargo)';
            $cargo = (substr($line, -strlen($cargo_marker)) === $cargo_marker);
            if ($cargo)
                $line = substr($line, 0, -strlen($cargo_marker));

            /* no ", Qty: N" part means a single item */
            $item_caract = explode(', Qty: ', $line);
            $qty = isset($item_caract[1]) ? (int) $item_caract[1] : 1;

            /* append instead of keying by name: the same item can be listed
               twice (fitted and in cargo) and would otherwise be overwritten */
            $this->items[$this->item_parsing_state][] = array('name' => $item_caract[0],
                                                              'qty' => $qty,
                                                              'cargo' => $cargo);
        }
    }

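    function parties_parsing($line) 
    {
        $partie_caract = explode(': ', $line, 2);

        /* skip the "Involved parties:" header, which is not a "key: value" pair */
        if (count($partie_caract) < 2)
            return;

        if ($partie_caract[0] == "Name")
        {
            /* a "Name" line starts a new party; NPC entries only add "Damage Done" */
            $this->partie_parsing_state = $partie_caract[1];
            $this->parties[ $this->partie_parsing_state ] = array();
        }
        else
            $this->parties[ $this->partie_parsing_state ][$partie_caract[0]] = $partie_caract[1];
    }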

}

/* a little test */

$parser = new Parser();
$parser->parse('test.txt');

echo "======== Fight report - ".$parser->date." ==========\n\n";
echo "Victim :\n\n";
print_r($parser->victim);
echo "Parties :\n\n";
print_r($parser->parties);
echo "Items: \n\n";
print_r($parser->items);

?>

We can do that because here, reliability and perf are not an issue :-)
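
For the shortened computer-controlled entries in 'Involved parties:', one possible alternative is to split that section on blank lines and treat each block as one party, so a block with only 'Name' and 'Damage Done' simply produces a smaller array. This is only a sketch: `parse_parties` and the `final_blow` key are hypothetical names, and it assumes parties are always separated by a blank line, as in the sample mail.

```php
<?php
// Hypothetical helper: turns the text of the "Involved parties:" section
// into one array per party. Assumes blocks are separated by blank lines.
function parse_parties($section)
{
    $marker  = ' (laid the final blow)';
    $parties = array();

    // split on one or more blank lines (\R matches \n, \r\n or \r)
    $blocks = preg_split('/\R(?:[ \t]*\R)+/', trim($section));

    foreach ($blocks as $block) {
        $party = array('final_blow' => false);

        foreach (preg_split('/\R/', $block) as $line) {
            $pair = explode(': ', trim($line), 2);
            if (count($pair) < 2) {
                continue; // not a "Key: value" line, e.g. the section header
            }
            list($key, $value) = $pair;

            // move the final-blow marker out of the name into its own flag
            if ($key === 'Name' && substr($value, -strlen($marker)) === $marker) {
                $party['final_blow'] = true;
                $value = substr($value, 0, -strlen($marker));
            }
            $party[$key] = $value;
        }

        if (isset($party['Name'])) {
            $parties[] = $party;
        }
    }
    return $parties;
}

$sample = "Name: Harulth (laid the final blow)\nSecurity: -10.0\nShip: Drake\n\n"
        . "Name: Gistatis Tribuni / Angel Cartel\nDamage Done: 9656\n";
print_r(parse_parties($sample));
```

Whether a party is computer controlled could then be guessed from the missing fields (no 'Ship', no 'Security'), though that rule is an assumption drawn from the single sample above.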

Happy gaming!

e-satis