Hi,

I need help figuring out some regular expressions. I'm running the dig command and need to parse its output into a neatly arranged array using PHP.

dig outputs something like this:

m0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183342" "Some text here1"
m0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183341" "Some text here2"

I want to get this:

Array
(
    [0] => Array
        (
            [0] => .tkw
            [1] => 1
            [2] => 20090624-183342
            [3] => Some text here1
        )
    [1] => Array
...
)

I just need the contents inside the double quotes. I can parse the dig output line by line, but I think it would be faster if I just run the regex pattern matching on all of it...

Thoughts?

+2  A: 

I'm not sure about PHP regular expressions, but in Perl the RE would be simple:

my $c = 0;
print <<EOF;
Array
(
EOF
foreach (<STDIN>) {
    if (/[^"]*"([^"]*)"\s+"([^"]*)"\s+"([^"]*)"\s+"([^"]*)"/) {
        print <<EOF;
    [$c] => Array
        (
            [0] => $1
            [1] => $2
            [2] => $3
            [3] => $4
        )
EOF
        $c++;
    }
}

print <<EOF;
)
EOF

This has some limitations, namely:

  • It does not work if the quoted text can contain escaped quotes (e.g. \").
  • It is hard-coded to support exactly four quoted values.
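
For what it's worth, both limitations can probably be worked around in PHP by matching one quoted value at a time instead of four at once. The following is only a rough sketch: the function name quoted_fields_per_line is made up for illustration, and it assumes PHP 7 or later and that a backslash is the only escape character that matters in the dig output.

```php
<?php
// Hypothetical helper (not from the thread): collect every double-quoted
// value on each line, allowing backslash-escaped characters such as \".
function quoted_fields_per_line(string $text): array
{
    $rows = [];
    // \R matches any newline sequence (\n, \r\n or \r)
    foreach (preg_split('/\R/', $text) as $line) {
        // Regex: "((?:[^"\\]|\\.)*)"
        // An opening quote, then any run of characters that are neither a
        // quote nor a backslash, or a backslash plus one character, then
        // the closing quote.
        if (preg_match_all('/"((?:[^"\\\\]|\\\\.)*)"/', $line, $m) > 0) {
            $rows[] = $m[1]; // group 1 holds the contents without the quotes
        }
    }
    return $rows;
}

$sample = 'm0.ttw.mydomain.tel. 60 IN TXT ".tkw" "1" "Some \"quoted\" text"' . "\n"
        . 'm0.ttw.mydomain.tel. 60 IN TXT ".tkw" "2"';
print_r(quoted_fields_per_line($sample));
```

Lines without any quoted text are skipped, and the captured values still contain the backslashes; running stripslashes() over them afterwards may be enough if they need to go.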
Jamie Love
A: 

Code:

<?php
    $str = 'm0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183342" "Some text here1"
m0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183341" "Some text here2"';

    header('Content-Type: text/plain');
    $matches = array();
    preg_match_all('/(".*").*(".*").*(".*").*(".*")/U', $str, $matches, PREG_SET_ORDER);
    print_r($matches);
?>

Output:

Array
(
    [0] => Array
        (
            [0] => ".tkw" "1" "20090624-183342" "Some text here1"
            [1] => ".tkw"
            [2] => "1"
            [3] => "20090624-183342"
            [4] => "Some text here1"
        )

    [1] => Array
        (
            [0] => ".tkw" "1" "20090624-183341" "Some text here2"
            [1] => ".tkw"
            [2] => "1"
            [3] => "20090624-183341"
            [4] => "Some text here2"
        )

)
John Kugelman
Hi John! I based my final solution upon yours and Peter's (below). A combination of both worked out for me. Appreciate your quick turnaround and attention to detail!!
Steve
A: 

This gets close with a single line:

preg_match_all( '/"([^"]+)"\s*"([^"]+)"\s*"([^"]+)"\s*"([^"]+)"/', $text, $matches, PREG_SET_ORDER );

print_r( $matches );

However, because of how the preg_match* functions work, the full match is included at index 0 of each match set. You can strip it if you really want to:

// create_function() was removed in PHP 8; a closure (PHP 5.3+) does the same job
array_walk( $matches, function ( &$array ) { array_shift( $array ); } );
Peter Bailey
Thanks Peter - your solution got rid of the double quotes. This is what I needed!
Steve
A: 

Totally not what you asked for, but it does work, handles lines with any number of quoted strings, and is more readable than your average regular expression (at the expense of way more code):

class GetQuotedText { 
 const STATE_OUTSIDE = 'STATE_OUTSIDE';
 const STATE_INSIDE  = 'STATE_INSIDE';

 static private $input;
 static private $counter;
 static private $state;
 static private $results;

 static private $current;
 static private $full;
 static private $all;

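 static private function setInput($string) {
  //static methods have no $this; use the static property instead
  self::$input = $string;
 }

 static private function init($string) {
  self::$current  = array();
  self::$full   = array();  
  self::$all    = array(); //reset so repeated calls don't accumulate results
  self::$input  = $string;
  self::$state  = self::STATE_OUTSIDE;
 }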


 static public function getStrings($string) {
  self::init($string);
  for(self::$counter=0;self::$counter<strlen(self::$input);self::$counter++){
   self::parse(self::$input[self::$counter]);
  }
  self::saveLine();
  return self::$all;
 }

 static private function parse($char) {
  switch($char){
   case '"':
    self::encounteredToken($char);
    break;  
   case "\n": //deliberate fall through for "\n" and "\r"
   case "\r":
    self::encounteredToken($char);
    break;
   default:
    if(self::$state == self::STATE_INSIDE) {
     self::action($char);
    }
  }
 }

 static private function encounteredToken($token) {
  switch($token) {
   case '"':
    self::swapState();
    break;
   case "\n": //deliberate fall through for "\n" and "\r"
   case "\r":
    self::saveArray();
    self::saveLine();
    break;
  }
  return;
 }

 static private function swapState() {
  if(self::$state == self::STATE_OUTSIDE) {
   self::$state = self::STATE_INSIDE;
  }
  else {
   self::$state = self::STATE_OUTSIDE;    
   self::saveArray();
  }    
 }
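 static public function saveLine() {
  //skip lines without quoted text (blank lines, or the "\n" of a "\r\n" pair)
  if(count(self::$full) > 0) {
   self::$all[] = self::$full;
  }
  self::$full = array();
  //reset state when line ends
  self::$state = self::STATE_OUTSIDE;
 }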

 static private function saveArray() {
  if(count(self::$current) > 0) {
   self::$full[]  = implode ('',self::$current);
   self::$current  = array();
  }
 }

 static private function action($char) {
  self::$current[] = $char;
 }
}

$input = 'm0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183342" "Some text here1"' . "\n" .
   'm0.ttw.mydomain.tel.    60      IN      TXT     ".tkw" "1" "20090624-183341" "Some text here2"';
$strings = GetQuotedText::getStrings($input);
print_r($strings);
Alan Storm