Hi there,

I'd like to get the title tag and RSS feed address (if there is one) from a given URL, but the method(s) I've used so far just aren't working at all. I've managed to get the title tag by using preg_match and a regular expression, but I can't seem to get anywhere with getting the RSS feed address.

($webContent holds the HTML of the website)

I've copied my code below for reference...

` // Get the title tag
preg_match('@<title>(.*)</title>@i',$webContent,$titleTagArray);

// If the title tag has been found, assign it to a variable
if($titleTagArray && $titleTagArray[1])
 $webTitle = $titleTagArray[1];

// Get the RSS or Atom feed address
preg_match('@<link(.*)rel="alternate"(.*)href="(.*)"(.*)type="application/rss+xml"\s/>@i',$webContent,$feedAddrArray);

// If the feed address has been found, assign it to a variable
if($feedAddrArray && $feedAddrArray[2])
 $webFeedAddr = $feedAddrArray[2];`

I've been reading on here that using a regular expression isn't the best way to do this? Hopefully someone can give me a hand with this :-)

Thanks.

A: 

A regular expression is far from the best solution ;) Use a feed reader instead, for example the Zend_Feed class of the Zend Framework.

Tobias P.
Good pick if he was parsing an RSS Feed. He's parsing an HTML page though.
Gordon
+4  A: 

One approach:

$dom = new DOMDocument;            // init new DOMDocument
$dom->loadHTML($html);             // load HTML into it
$xpath = new DOMXPath($dom);       // create a new XPath

$nodes = $xpath->query('//title'); // Find all title elements in document
foreach($nodes as $node) {         // Iterate over found elements
    echo $node->nodeValue;         // output title text
}

To get the href attribute of all link tags with a type of "application/rss+xml" you would use this XPath:

$xpath->query('//link[@type="application/rss+xml"]/@href');
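Putting the two queries together, a minimal sketch might look like the following. The function name `extractTitleAndFeed`, the sample markup, and the `example.com` URL are hypothetical and only there for illustration; the scalar type hints assume PHP 7 or later. Real-world HTML is often invalid, so the sketch also silences libxml's parse warnings while loading.

```php
<?php
// Hypothetical helper: returns the page title and the first RSS feed href,
// with null for whichever of the two is not present in the markup.
function extractTitleAndFeed(string $html): array
{
    $dom = new DOMDocument;

    // Collect libxml parse warnings internally instead of printing them,
    // then restore the previous setting once the document is loaded.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // item(0) returns null when the query matched nothing.
    $titleNode = $xpath->query('//title')->item(0);
    $feedNode  = $xpath->query('//link[@type="application/rss+xml"]/@href')->item(0);

    return [
        'title' => $titleNode ? trim($titleNode->nodeValue) : null,
        'feed'  => $feedNode ? $feedNode->nodeValue : null,
    ];
}

// Sample input, made up for this example.
$html = '<html><head><title>Example</title>'
      . '<link rel="alternate" type="application/rss+xml" href="http://example.com/feed.xml" />'
      . '</head><body></body></html>';

$result = extractTitleAndFeed($html);
```

In the question's code, `$webContent` would be passed in place of `$html`.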
Gordon
For a wider range of feed types, you could use something like: `/html/head/link[@rel="alternate" and @href and (@type="application/atom+xml" or @type="application/rss+xml" or @type="application/rdf+xml")]/@href` (a regex match would be nice, but `or` will suffice)
salathe
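As a rough illustration of that wider query, the sketch below collects every matching feed address into an array. The function name `findFeedUrls` and the sample markup are assumptions made for this example, not part of the thread; the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: collects the href of every RSS, Atom, or RDF
// feed link found in the document head.
function findFeedUrls(string $html): array
{
    $dom = new DOMDocument;

    // Keep libxml quiet about invalid markup while parsing.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $query = '/html/head/link[@rel="alternate" and @href and ('
           . '@type="application/atom+xml" or '
           . '@type="application/rss+xml" or '
           . '@type="application/rdf+xml")]/@href';

    // Each result is an attribute node; its nodeValue is the href string.
    $urls = [];
    foreach ($xpath->query($query) as $attr) {
        $urls[] = $attr->nodeValue;
    }
    return $urls;
}

// Sample input, made up for this example: two feeds and one stylesheet.
$sample = '<html><head><title>Example</title>'
        . '<link rel="alternate" type="application/atom+xml" href="/atom.xml" />'
        . '<link rel="stylesheet" type="text/css" href="/style.css" />'
        . '<link rel="alternate" type="application/rss+xml" href="/rss.xml" />'
        . '</head><body></body></html>';

$feeds = findFeedUrls($sample);
```

The returned addresses are whatever the page put in `href`, so they may be relative and would still need resolving against the page's own URL before fetching.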