I have seen many examples on the web of extracting the URLs from HTML with PHP 5's DOM functions, but I need the link text as well as the link. If I use the code below to extract the link "http://X.com" from the "href" attribute of the anchor tag <a href="http://X.com">YYYYY</a>, how do I get the corresponding "YYYYY" associated with it?

// Collect libxml parse errors internally instead of emitting warnings
$oldSetting = libxml_use_internal_errors( true );
libxml_clear_errors();

$html = new DOMDocument();
$html->loadHTMLFile( $location );

$xpath = new DOMXPath( $html );
$links = $xpath->query( '//a' );

// Append each href to the list ($i was never initialized before)
$url_list = array();
foreach ( $links as $link ) {
    $url_list[] = $link->getAttribute( 'href' );
}

// Discard the collected errors and restore the previous setting
libxml_clear_errors();
libxml_use_internal_errors( $oldSetting );

A: 

You're trying to get the character data (text content) of an XML element. Here's a similar question: http://stackoverflow.com/questions/120016/php-simplexml-problem

camomileCase
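Applied to the question's DOM code, that means reading each anchor's text alongside its href. A minimal sketch, assuming the HTML is available as a string (the question loads it from $location with loadHTMLFile(), which behaves the same from that point on); extract_links() is a hypothetical helper name used only for illustration. The text comes from the DOMNode textContent property, which joins all descendant text nodes; for element nodes $link->nodeValue returns the same string.

```php
<?php
// Sketch: collect each anchor's href together with its visible link text.
// Assumption: the HTML arrives as a string; extract_links() is a
// hypothetical helper, not part of any library.
function extract_links(string $htmlSource): array
{
    // Keep libxml warnings about sloppy real-world HTML internal,
    // remembering the previous setting so it can be restored.
    $oldSetting = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($htmlSource);

    // Discard collected parse errors and restore the old setting.
    libxml_clear_errors();
    libxml_use_internal_errors($oldSetting);

    $xpath = new DOMXPath($doc);
    $links = [];
    // Only anchors that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            // textContent concatenates all descendant text nodes,
            // so <a><b>Bold</b> link</a> yields "Bold link".
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

$links = extract_links('<p><a href="http://X.com">YYYYY</a> and <a href="/b"><b>Bold</b> link</a></p>');
foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
```

For the sample string this prints "http://X.com => YYYYY" followed by "/b => Bold link".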
A: 

DOMDocument() is slow as hell. Try preg_match() or xml_parse_into_struct() instead.

Havenard
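A rough sketch of the preg_match() route, using preg_match_all() to collect every anchor in one call (xml_parse_into_struct() is not shown; as an XML parser it also expects well-formed markup, which most real-world HTML is not). The pattern is an assumption written for this example: it only handles anchors whose href is quoted, and it can misfire on unquoted attributes, anchors inside comments or scripts, or attributes such as data-href, so it trades robustness for speed.

```php
<?php
// Regex alternative in the spirit of the preg_match() suggestion.
// Caveat: handles only simple anchors with a quoted href; this is
// not a general HTML parser.
$html = '<p><a href="http://X.com">YYYYY</a> <a class="x" href=\'/b\'>Second</a></p>';

// Group 1: opening quote, group 2: href value, group 3: inner HTML.
$pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';

$links = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $links[] = [
            'href' => $m[2],
            // Strip nested tags and decode entities to get plain text.
            'text' => trim(html_entity_decode(strip_tags($m[3]))),
        ];
    }
}

foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
```

For the sample string this yields "http://X.com => YYYYY" and "/b => Second".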