I'm using PHP's libcurl bindings to load a page. Now I need to get the content of the page's <title> tag, along with some other information. I've tried to parse it with SimpleXML, but with no luck, because the page isn't valid XML. Can you suggest another way to easily get the contents of the <title> tag? Thank you.
You can use DOMDocument::loadHTML.
This will echo "The title":
<?php
// Sample page: deliberately invalid HTML (unclosed <html>, <head>,
// and <body>) to show that loadHTML() copes with real-world markup.
$doc = <<<HTML
<html>
<head>
<title>The title</title>
<body>
hhhhhh
HTML;

// Suppress the warnings libxml would otherwise emit for malformed HTML.
libxml_use_internal_errors(true);

$d = new DOMDocument;
$d->loadHTML($doc);

$ts = $d->getElementsByTagName("title");
if ($ts->length > 0) {
    echo $ts->item(0)->textContent;
}
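Since you also need "some other information", a DOMXPath query over the same parsed document is often handier than getElementsByTagName. A minimal sketch, assuming some made-up markup with a description meta tag for illustration:

```php
<?php
// Hypothetical sample markup; in practice this would be the HTML
// you fetched with curl.
$html = <<<HTML
<html>
<head>
<title>The title</title>
<meta name="description" content="A short summary">
<body>
hhhhhh
HTML;

libxml_use_internal_errors(true);
$d = new DOMDocument;
$d->loadHTML($html);

$xp = new DOMXPath($d);

// The <title> text.
$title = $xp->evaluate('string(//title)');

// Any other piece of the page, e.g. the description meta tag.
$desc = $xp->evaluate('string(//meta[@name="description"]/@content)');

echo $title, "\n", $desc, "\n";
```

evaluate() with an XPath string() expression returns a plain PHP string, so there's no NodeList bookkeeping for simple lookups.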
You can use this script to get the title of a page.
# Script Title.txt
var str page, content
cat $page > $content
stex -r -c "^<title&</title&\>^" $content
Save this code in the file C:/Scripts/Title.txt. The code is in biterscripting. Start biterscripting and enter this command.
script "C:/Scripts/Title.txt" page("http://stackoverflow.com/questions/3135488/how-can-i-get-pages-title-tags-content-if-it-cant-be-parsed-as-xml")
It will get the title of this page (the one you are viewing). Use any other URL or local file path as the value of page(), in double quotes. When I executed this command, I got:
How can I get page's <title> tag's content if it can't be parsed as XML? - Stack Overflow
You can call this script from any executable or batch file.
Try using Yahoo's YQL console. You can query almost any URL and get the results back as XML. You can even add an XPath expression to narrow it down.
http://developer.yahoo.com/yql/console/
You could call this service with curl. It's pretty handy.
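A sketch of what such a call might look like from PHP. The query and endpoint below follow Yahoo's documented REST pattern, but treat them as illustrative assumptions rather than a tested integration:

```php
<?php
// Hypothetical YQL query selecting the page's <title> via XPath.
$yql = 'select * from html where url="http://stackoverflow.com" and xpath="//title"';

// Assumed form of Yahoo's public YQL REST endpoint.
$url = 'http://query.yahooapis.com/v1/public/yql?q=' . urlencode($yql) . '&format=xml';

// Fetching it would then look just like the curl code the question
// already uses for the page itself.
function fetchYql($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $xml = curl_exec($ch);
    curl_close($ch);
    return $xml; // YQL wraps results in valid XML, so SimpleXML can parse it.
}
```

The appeal here is that YQL's envelope is well-formed XML, so the SimpleXML approach that failed on the raw page works on the response.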