Are there built-in functions in the latest versions of PHP specifically designed to aid in this task?

A: 

You could use the explode() function to turn the table columns and rows into arrays.

See: PHP explode
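
A minimal sketch of that idea, assuming clean markup where `<tr>` and `<td>` carry no attributes and are written in lowercase. The sample `$tablehtml` string is hypothetical, and the approach stops working as soon as the markup varies:

```php
<?php
// Hypothetical input: clean markup with no attributes on <tr> or <td>.
$tablehtml = '<table><tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr></table>';

$rows = [];
// array_slice() drops the chunk that precedes the first <tr>.
foreach (array_slice(explode('<tr>', $tablehtml), 1) as $rowChunk) {
    $cells = [];
    // Likewise drop whatever precedes the first <td> in this row.
    foreach (array_slice(explode('<td>', $rowChunk), 1) as $cellChunk) {
        // Keep only the text in front of the closing </td>.
        $cells[] = trim(explode('</td>', $cellChunk)[0]);
    }
    $rows[] = $cells;
}

print_r($rows);
```

Each element of `$rows` is one table row, holding the cell texts in order.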

Oliver
explode() won't help to split a HTML table structure, will it?
Pekka
You would have to take care of the closing tags, but splitting the table like $rows = explode("<tr>", $tablehtml); could be a possibility. But I agree with you and @amora: traversing the DOM seems to be a better way.
Oliver
A: 

I don't know if this is the fastest way, but you can check this class (it uses preg_replace):

http://wonshik.com/snippet/Convert-HTML-Table-into-a-PHP-Array

Haim Evgi
A: 

If you want to convert the HTML description of a table, here's how I would do it:

You have to work out the details on your own, since I do not know if you want to handle different lines as subarrays or you want to merge all lines into one big array or something else.

phimuemue
+5  A: 

Use a DOM parser like SimpleXML to split the HTML code into nodes, and walk through the nodes to build the array.

For broken/invalid HTML, SimpleHTMLDOM is more lenient (but it's not built in).
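
As a rough sketch of the node-walking approach, here is a version that uses PHP's bundled DOM extension (`DOMDocument`) rather than SimpleXML, since `loadHTML()` tolerates HTML that is not well-formed XML. The sample `$html` string is hypothetical, and the code assumes a single, non-nested table:

```php
<?php
// Hypothetical sample markup; in practice $html would hold the page source.
$html = '<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Ann</td><td>31</td></tr>
  <tr><td>Bob</td><td>27</td></tr>
</table>';

// Collect parser warnings about sloppy HTML instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$table = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $row = [];
    foreach ($tr->childNodes as $cell) {
        // Skip whitespace text nodes; keep only header and data cells.
        if ($cell instanceof DOMElement && in_array($cell->nodeName, ['td', 'th'], true)) {
            $row[] = trim($cell->textContent);
        }
    }
    $table[] = $row;
}

print_r($table);
```

Note that `getElementsByTagName('tr')` also returns rows of nested tables, so a page with several tables would need the lookup narrowed to one table element first.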

Pekka
Obligatory suggested third party alternatives to SimpleHtmlDom that actually use [DOM](http://php.net/manual/en/book.dom.php) instead of String Parsing: [phpQuery](http://code.google.com/p/phpquery/), [Zend_Dom](http://framework.zend.com/manual/en/zend.dom.html), [QueryPath](http://querypath.org/) and [FluentDom](http://www.fluentdom.org).
Gordon
+1  A: 

String replace and explode would work if the HTML code is clean and always the same, but as soon as new attributes appear it will break. So the only dependable solution would be using regular expressions or an XML/HTML parser. Check http://php.net/manual/en/book.dom.php

aromawebdesign.com
[Regexes are not dependable for parsing HTML, because HTML is not regular.](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html)
Gordon
@Gordon, preg_match_all('/<tr>\s*<td[^>]*>((?:<td.+?<\/td|.)*?)<\/td>/si', $html, $matches); where is the problem?
aromawebdesign.com
@aromawebdesign how about: it's not parsing?
Gordon
@Gordon, I understand where you're coming from, though HTML tables are quite regular, unlike the rest of HTML. Since the question was about tables, I am quite confident in my solution.
aromawebdesign.com
A: 

An alternative to using a native DOM parser could be using YQL. This way you don't have to do the actual parsing yourself. The YQL Web Service enables applications to query, filter, and combine data from different sources across the Internet.

For instance, to grab the HTML table with the class example given at

http://www.w3schools.com/html/html_tables.asp

you can do

$yql = 'http://tinyurl.com/yql-table-grab';
$yql = json_decode(file_get_contents($yql));
print_r( $yql->query->results );

I've deliberately shortened the URL so it does not mess up the answer. $yql actually links to the YQL API, adds some options and contains the query:

select * from html 
    where xpath="//table[@class='example']" 
    and url="http://www.w3schools.com/html/html_tables.asp"

YQL can return JSON and XML. I've made it return JSON and then decoded it, which results in a nested structure of stdClass objects and arrays (so it's not all arrays). You have to see whether that fits your needs.

You can try out the interactive YQL console to see how it works.
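
If plain arrays are preferred, passing `true` as the second argument to `json_decode()` returns associative arrays instead of stdClass objects. A small sketch, using a hypothetical JSON string that is only loosely shaped like a query result and is not real YQL output:

```php
<?php
// Hypothetical JSON for illustration only; not an actual YQL response.
$json = '{"query":{"results":{"table":{"tr":[{"td":["a","b"]},{"td":["c","d"]}]}}}}';

// Default: JSON objects become stdClass instances, JSON lists become arrays.
$asObjects = json_decode($json);

// With true as the second argument: associative arrays all the way down.
$asArrays = json_decode($json, true);

var_dump($asObjects->query instanceof stdClass);
print_r($asArrays['query']['results']['table']['tr']);
```

The key names (`query`, `results`, `table`, `tr`, `td`) are assumptions made for the example; the real structure depends on what the service returns.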

Gordon