I'm building a bookmarking system and looking for the fastest (easiest) way to retrieve a page's title with PHP.
It would be nice to have something like $title = page_title($url)
Thanks in advance! =)
Regex?
Use cURL to get the $htmlSource variable's contents.
preg_match('/<title>(.*)<\/title>/iU', $htmlSource, $titleMatches);
print_r($titleMatches);
See what you have in that array.
Most people say, though, that for traversing HTML you should use a parser, as regexes can be unreliable.
The other answers provide more detail :)
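A minimal sketch of how the two steps above might fit together, assuming the cURL extension is installed. The helper names fetch_html and extract_title are hypothetical and not part of this answer. The pattern also adds the s modifier and allows attributes on the opening tag, both of which are assumptions about what real pages may contain.

```php
<?php
// Hypothetical helper: download a page with cURL and return its HTML, or null on failure.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds
    ]);
    $html = curl_exec($ch);
    return is_string($html) ? $html : null;
}

// Hypothetical helper: apply the regex to HTML that has already been downloaded.
function extract_title(string $htmlSource): ?string
{
    // i: case-insensitive, U: ungreedy, s: let "." match newlines inside the title
    if (preg_match('/<title[^>]*>(.*)<\/title>/iUs', $htmlSource, $titleMatches)) {
        return trim($titleMatches[1]);
    }
    return null;
}
```

With these in place, something like $title = extract_title(fetch_html($url) ?? ''); would give the title, or null when the download fails or no title tag is found.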
<?php
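function page_title($url) {
    // Fetch the page; file_get_contents() returns false on failure
    $fp = file_get_contents($url);
    if (!$fp)
        return null;

    // Non-greedy, case-insensitive, and allowed to span newlines
    $res = preg_match("/<title[^>]*>(.*?)<\/title>/is", $fp, $title_matches);
    if (!$res)
        return null;

    $title = $title_matches[1];
    return $title;
}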
?>
Gave 'er a whirl on the following input:
print page_title("http://www.google.com/");
Output: Google
Hopefully general enough for your usage. If you need something more powerful, it might not hurt to invest a bit of time into researching HTML parsers.
EDIT: Added a bit of error checking. Kind of rushed the first version out, sorry.
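For anyone who does want to go the parser route, here is a rough sketch using PHP's bundled DOM extension instead of a regex. The helper name title_from_html is hypothetical, and the sketch assumes the HTML has already been downloaded (for example with file_get_contents as above).

```php
<?php
// Hypothetical helper: pull the <title> text out of HTML using PHP's DOM parser.
function title_from_html(string $html): ?string
{
    if ($html === '') {
        return null; // nothing to parse
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally; real pages are often malformed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Take the first <title> element in document order, if there is one.
    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}
```

One difference from the regex version is that the parser decodes entities, so a title written as Hello &amp; welcome comes back as Hello & welcome.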
Or, to make this simple function slightly more bulletproof:
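function page_title($url) {
    // @ suppresses the warning raised when the URL cannot be fetched
    $page = @file_get_contents($url);
    if (!$page) return null;

    $matches = array();
    // Non-greedy match; i and s cover uppercase tags and multi-line titles
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $page, $matches)) {
        return $matches[1];
    }
    else {
        return null;
    }
}

echo page_title('http://google.com');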
I like using SimpleXML with regexes. This is from a solution I use to grab multiple link headers from a page in an OpenID library I've created; I've adapted it to work with the title (even though there is usually only one).
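function getTitle($sFile)
{
    $sData = file_get_contents($sFile);
    if ($sData !== false && preg_match('/<head\b[^>]*>.*<\/head>/is', $sData, $aHead))
    {
        // Lowercase each tag so that ->head->title matches regardless of the page's casing
        $sDataHtml = preg_replace_callback('/<[^>]*>/', function ($aTag) {
            return strtolower($aTag[0]);
        }, $aHead[0]);
        // loadHTML() is an instance method; @ hides warnings about malformed markup
        $oDom = new DOMDocument();
        if (@$oDom->loadHTML($sDataHtml))
        {
            $xTitle = simplexml_import_dom($oDom);
            return (string)$xTitle->head->title;
        }
    }
    return null;
}

echo getTitle('http://stackoverflow.com/questions/399332/fastest-way-to-retrieve-a-title-in-php');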
Ironically, this page has a "title tag" in the title tag, which is what sometimes causes problems with the pure regex solutions.
This solution is not perfect, as it lowercases the tags, which could cause a problem for nested tags if formatting/case is important (such as in XML), but there are ways around that problem that are a bit more involved.