ansaurus

Question

Can't separate cells properly with simplehtmldom

Answer 1

A:

You'll get the first td like this:

$firstTD = $row->first_child();

After that you can get the subsequent ones with:

$firstTD->next_sibling()

Wouter van Nifterick 2009-07-26 01:50:00

Fatal error: Call to undefined method simple_html_dom_node::child_nodes() in /var/www/php.php on line 37

2009-07-26 03:00:12

Fatal error: Call to undefined method simple_html_dom_node::domnode_next_sibling() in /var/www/php.php on line 37

2009-07-26 03:22:10

Sorry, it's `$firstTD->next_sibling();`

Wouter van Nifterick 2009-07-26 03:59:38

I'm still getting the same problem with that code. It just mashes all the siblings together into one field. It is not separating the `<td>` tags.

2009-07-26 04:07:14

Answer 2

+2 A:

You will not like my answer.

Unfortunately, it seems that mismatched closing tags in the HTML you are parsing are confusing Simple_HTML_DOM. Take a look at this snippet:

<td align=center><a href="odds?mting=BR02000"><b><font color=black>2</b></font></a></td>

If you follow the order of tags of this snippet:

<td> is opened
<a> is opened
<b> is opened
<font> is opened

Technically, tags should be closed in the opposite order, but this is how they are closed:

</b> is closed
</font> is closed
</a> is closed
</td> is closed

The HTML you are trying to scrape is full of those mistakes, as well as closing tags for tags which are never opened. Simple_HTML_DOM doesn't parse those files properly.
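To make the nesting problem concrete, here is a rough sketch of a stack-based check. The function `firstNestingError` is a hypothetical helper written for this illustration, not part of Simple_HTML_DOM, and it deliberately ignores void elements such as `<br>` and tags that are never closed.

```php
<?php
// Hypothetical helper: returns the first closing tag that does not match
// the most recently opened tag, or null if every close is in order.
// Limitation: void elements (<br>, <img>) and unclosed tags are not handled.
function firstNestingError(string $html): ?string
{
    $stack = [];
    // Group 1 is "/" for a closing tag, group 2 is the tag name.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>#', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            $stack[] = $name;                 // opening tag: push onto the stack
        } elseif (array_pop($stack) !== $name) {
            return "</$name>";                // closing tag is out of order
        }
    }
    return null;
}

$broken = '<td align=center><a href="odds?mting=BR02000"><b><font color=black>2</b></font></a></td>';
$fixed  = '<td align=center><a href="odds?mting=BR02000"><b><font color=black>2</font></b></a></td>';

var_dump(firstNestingError($broken)); // string(4) "</b>"
var_dump(firstNestingError($fixed));  // NULL
```

For the snippet above, the check stops at `</b>` because `<font>` is still open at that point.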

I'm afraid that if you can't modify the HTML at its source, you'll have to parse the file manually, correcting any errors as you go.

As a note, I've tested your code against the following corrected HTML, and Simple_HTML_DOM parsed it successfully, and your code worked just fine.

<tr valign=top>
<td colspan=16 bgcolor=#999999><b>THOROUGHBRED MEETINGS</b></td>

</tr>
<tr valign=top bgcolor="#ffffff">
<td><b>BR</b> <a href="meeting?mtg=br&day=today&curtype=0">SUNSHINE COAST</a></td>
<td><b>FINE/DEAD</b></td>
<td><font color=#cc0000><b>R1</b></font>@<b>12:30pm</b></td>
<td align=center bgcolor=#cc0000><a href="odds?mting=BR01000"><b><font color=#ffffff>1</font></b></a></td>
<td align=center><a href="odds?mting=BR02000"><b><font color=black>2</font></b></a></td>
<td align=center><a href="odds?mting=BR03000"><b><font color=black>3</font></b></a></td>

<td align=center><a href="odds?mting=BR04000"><b><font color=black>4</font></b></a></td>
<td align=center><a href="odds?mting=BR05000"><b><font color=black>5</font></b></a></td>
<td align=center><a href="odds?mting=BR06000"><b><font color=black>6</font></b></a></td>
<td align=center><a href="odds?mting=BR07000"><b><font color=black>7</font></b></a></td>
<td align=center><a href="odds?mting=BR08000"><b><font color=black>8</font></b></a></td>
<td bgcolor="#ffffff" colspan=4> </td>
</tr>

Edit: As an alternative, you might want to check whether DOMDocument::loadHTML gives better results. It is available in PHP 5 without external libraries. Check the official documentation.
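A minimal sketch of that alternative, assuming the DOM extension is enabled. The first cell is the broken snippet quoted earlier; the second cell is an assumed variation with the same mistake, added only so there is more than one cell to separate.

```php
<?php
// Two cells with the mismatched </b></font> order (second cell is illustrative).
$html = '<table><tr>'
      . '<td align=center><a href="odds?mting=BR02000"><b><font color=black>2</b></font></a></td>'
      . '<td align=center><a href="odds?mting=BR03000"><b><font color=black>3</b></font></a></td>'
      . '</tr></table>';

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Each <td> ends up as its own node, so the cells can be read one by one.
$cells = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    $cells[] = trim($td->textContent);
}

print_r($cells);
```

Whether this holds up on the full page is untested here; the rest of the markup may contain other errors that the parser recovers from differently.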

Andrew Moore 2009-07-26 04:18:36

How do I parse the file manually?

2009-07-26 04:26:17

Proper HTML parsing is a rather complicated subject. I'm afraid I can't help you with that.

Andrew Moore 2009-07-26 04:29:16

I added another alternative.

Andrew Moore 2009-07-26 04:36:37

+1 for spotting the invalid HTML. I didn't notice that. Glen, I think you should either accept that invalid syntax just cannot be parsed properly or, if you really need to parse this page, hardcode something. If you first remove all `<b>` and `<font>` tags, you should be able to parse the remainder.
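Stripping the `<b>` and `<font>` tags before parsing could look roughly like this. The regular expression is an assumption that fits the snippet quoted in the answer, not something tested against the whole page.

```php
<?php
// Cell with the mismatched </b></font> closing order from the answer above.
$html = '<td align=center><a href="odds?mting=BR02000"><b><font color=black>2</b></font></a></td>';

// Remove every opening and closing <b> and <font> tag, keeping the text inside.
// The \b keeps the pattern from matching tags such as <br> or <body>.
$clean = preg_replace('#</?(?:b|font)\b[^>]*>#i', '', $html);

echo $clean, "\n";
// <td align=center><a href="odds?mting=BR02000">2</a></td>
```

The cleaned string can then be passed to the parser in place of the original markup.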

Wouter van Nifterick 2009-07-26 05:29:39

**@Wouter van Nifterick:** Should... We don't know the rest of the page and how it might affect parsing. But for this snippet, it is a viable solution.

Andrew Moore 2009-07-26 05:38:35

Answer 3

A:

I got it to work by first loading the page into a DOMDocument, which corrects the malformed HTML, and then passing the result to Simple HTML DOM.

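// Requires the Simple HTML DOM library for str_get_html().
$url = "http://www.acttab.com.au/interbet/venues?day=today";

// Let DOMDocument load the page; its parser repairs the malformed markup.
libxml_use_internal_errors(true); // optional: keeps parse warnings out of the output
$doc = new DOMDocument();
$doc->loadHTMLFile($url);

// Serialize the repaired document and hand it to Simple HTML DOM.
$html = str_get_html($doc->saveHTML());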

2009-07-26 10:49:10
