I'm using PHP libcurl to load a page. Now I need to get the content of this page's <title> tag, plus some other information. I tried parsing it with SimpleXML, but had no luck because the page isn't valid XML. Can you suggest another easy way to get the contents of the <title> tag? Thank you.

+3  A: 

You can use DOMDocument::loadHTML.

This will echo "The title":

<?php

$doc = <<<HTML
<html>
<head>
<title>The title</title>
<body>
hhhhhh
HTML;

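// Collect libxml parse errors internally instead of emitting a warning for each one.
libxml_use_internal_errors(true);
$d = new DOMDocument;
// loadHTML() tolerates malformed markup such as the unclosed tags above.
$d->loadHTML($doc);
$ts = $d->getElementsByTagName("title");
if ($ts->length > 0) {
    echo $ts->item(0)->textContent;
}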
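
Since the page in the question comes from libcurl, the same approach can be wrapped in two small functions. This is only a sketch: the names fetchPage and extractTitle are hypothetical, and it assumes the curl and dom extensions are enabled.

```php
<?php

// Hypothetical helper: download a page body with curl, or return null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}

// Hypothetical helper: return the trimmed text of the first <title>, or null if there is none.
function extractTitle(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    // Keep parse errors out of the output, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');

    return $titles->length > 0 ? trim($titles->item(0)->textContent) : null;
}
```

Usage would look something like $html = fetchPage($url); followed by echo extractTitle($html ?? '');. Other elements can be read from the same DOMDocument in the same way, for example with getElementsByTagName("meta").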
Artefacto
It works, but I get a lot of errors, like this: http://www.peeep.us/31a386c6. Can you help me avoid getting so many errors?
popoffka
Oops, sorry, wrong link! Here's the right one: http://clip2net.com/clip/m24988/1277753812-clip-102kb.png
popoffka
@pop See http://www.php.net/manual/en/function.libxml-use-internal-errors.php. I added a call to that function in the example.
Artefacto
Thank you! It works OK now.
popoffka
+1  A: 

Or you can use Simple HTML DOM.

Sarfraz
A: 

You can use this script to get the title of a page.

# Script Title.txt
var str page, content
cat $page > $content
stex -r -c "^<title&</title&\>^" $content

Save this script as C:/Scripts/Title.txt. It is written in biterscripting. Start biterscripting and enter this command:

script "C:/Scripts/Title.txt" page("http://stackoverflow.com/questions/3135488/how-can-i-get-pages-title-tags-content-if-it-cant-be-parsed-as-xml")

It will get the title of this page (the one you are viewing). Use any other URL or local file path as the value of page(), enclosed in double quotes. When I executed this command, I got:

How can I get page's <title> tag's content if it can't be parsed as XML? - Stack Overflow

You can call this script from any executable or batch file.
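
For comparison, the same "grab the text between the title tags" idea can be sketched in PHP with a regular expression. This is not the biterscripting script above but a rough equivalent; the function name titleByRegex is hypothetical, and a regex is more fragile than a real HTML parser, so treat it as a quick fallback only.

```php
<?php

// Hypothetical helper: pull the text between <title> and </title> with a regex.
function titleByRegex(string $html): ?string
{
    // i: case-insensitive tag names; s: let "." match newlines inside the title.
    // [^>]* allows attributes on the opening tag; (.*?) stops at the first </title>.
    if (preg_match('/<title\b[^>]*>(.*?)<\/title>/si', $html, $m) === 1) {
        // Decode entities such as &amp; and strip surrounding whitespace.
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }

    return null;
}
```

It takes the page body as a string, so it can be fed directly with whatever curl_exec() returned.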

P M
A: 

Try using Yahoo's YQL console. You can query almost any URL and ask for the results back as XML. You can even add an XPath expression to narrow them down.

http://developer.yahoo.com/yql/console/

Maybe you can call this service using curl. It's pretty handy.

misterte