I'm parsing an HTML table and building an array from the row values. My problem is that the associative keys that come back have some trailing whitespace, giving me results like this:

Array ( [Count  ] => 6   [Class  ] => 30c   [Description] => Conformation Model (Combined 30,57) )

So a line like this:

echo $myArray['Count'];

or

echo $myArray['Count '];

Gives me a blank result.

For now I've got a pretty hacky workaround going:

foreach($myArray as $row){

    $count = 0;
    foreach($row as $info){
        if($count == 0){
            echo 'Count:' . $info;
            echo '<br>';
        }
        if($count == 1){
            echo ' Class:' . $info;
            echo '<br>';
        }
        if($count == 2){
            echo ' Description:' . $info;
            echo '<br>';
        }
        $count++;
    }

}

The function I'm using to parse the table I found here:

function parseTable($html)
{
  // Find the table
  preg_match("/<table.*?>.*?<\/[\s]*table>/s", $html, $table_html);

  // Get title for each row
  preg_match_all("/<th.*?>(.*?)<\/[\s]*th>/", $table_html[0], $matches);
  $row_headers = $matches[1];

  // Iterate each row
  preg_match_all("/<tr.*?>(.*?)<\/[\s]*tr>/s", $table_html[0], $matches);

  $table = array();

  foreach($matches[1] as $row_html)
  {
    preg_match_all("/<td.*?>(.*?)<\/[\s]*td>/", $row_html, $td_matches);
    $row = array();
    for($i=0; $i<count($td_matches[1]); $i++)
    {
      $td = strip_tags(html_entity_decode($td_matches[1][$i]));
      $row[$row_headers[$i]] = $td;
    }

    if(count($row) > 0)
      $table[] = $row;
  }
  return $table;
}

I'm assuming I can eliminate the whitespace by fixing the regular expression, but of course I avoid regex like the plague. Any ideas? Thanks in advance. -J

+4  A: 

You can use trim to remove leading and trailing whitespace characters:

$row[trim($row_headers[$i])] = $td;

But don’t use regular expressions to parse the HTML document; use a proper HTML parser such as the Simple HTML DOM Parser or PHP's built-in DOMDocument instead.
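
For illustration, here is a rough sketch of what a DOMDocument version might look like. The function name parseTableDom() and the sample markup are made up for this example, and it assumes the page has a single, non-nested table whose header cells are th elements, so treat it as a starting point rather than a drop-in replacement for parseTable():

```php
<?php
// Hypothetical helper (not the question's parseTable()): returns the same
// shape, a list of rows keyed by header text, but uses DOMDocument.
function parseTableDom($html)
{
    if (trim($html) === '') {
        return array();                  // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);    // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Assumption: the first table in the document is the one we want.
    $table = $doc->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return array();
    }

    // Header cells become the keys; trim() drops the stray padding.
    $headers = array();
    foreach ($table->getElementsByTagName('th') as $th) {
        $headers[] = trim($th->textContent);
    }

    $rows = array();
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $row = array();
        $i = 0;
        foreach ($tr->getElementsByTagName('td') as $td) {
            // Fall back to the column index if there is no matching header.
            $key = isset($headers[$i]) ? $headers[$i] : $i;
            $row[$key] = trim($td->textContent);
            $i++;
        }
        if (count($row) > 0) {           // the header row has no td cells, so it is skipped
            $rows[] = $row;
        }
    }
    return $rows;
}

// Made-up sample markup with padded header cells.
$html = '<table>
  <tr><th>Count  </th><th>Class  </th><th>Description</th></tr>
  <tr><td>6</td><td>30c</td><td>Conformation Model (Combined 30,57)</td></tr>
</table>';

$rows = parseTableDom($html);
echo $rows[0]['Count'], "\n";            // prints 6
```

One caveat: trim() only strips ordinary whitespace by default. If the padding in the real page turns out to be non-breaking spaces, it would need to be removed separately.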

Gumbo
+1  A: 

An easy solution would be to change

$row[$row_headers[$i]] = $td;

to:

$row[trim($row_headers[$i])] = $td;
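
If editing parseTable() isn't convenient, the same trim() idea can also be applied to the array after it has been built. A minimal sketch, assuming the data looks like the output in the question; the sample array and the helper name trimKeys() are invented for the example:

```php
<?php
// Hypothetical sample shaped like the question's output: a list of rows
// whose keys carry trailing spaces.
$myArray = array(
    array(
        'Count  '     => '6',
        'Class  '     => '30c',
        'Description' => 'Conformation Model (Combined 30,57)',
    ),
);

// Hypothetical helper: rebuild one row with whitespace-trimmed keys.
function trimKeys(array $row)
{
    return array_combine(array_map('trim', array_keys($row)), array_values($row));
}

// Apply the helper to every row of the table.
$clean = array_map('trimKeys', $myArray);

echo $clean[0]['Count'], "\n";        // prints 6
echo $clean[0]['Description'], "\n";  // prints Conformation Model (Combined 30,57)
```

Note that parseTable() returns a list of rows, so even with clean keys a value has to be read through a row index, e.g. $myArray[0]['Count'], or inside the foreach as $row['Count'].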
Daniel Egeberg