Hi,
I am looking for a regex to find the contents of the first <h3>
tag. What can I use there?
Well, a simple solution would be the following:
preg_match( '#<h3[^>]*>(.*?)</h3>#i', $text, $match );
echo $match[1];
For anything more complex, you should consider using an HTML document parser, though.
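// Lazy (.*?) stops at the first </h3>; a greedy (.*) would run on to the last </h3> on the same line.
preg_match("/<h3>(.*?)<\/h3>/", $search_in_this_string, $put_matches_in_this_var);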
First of all: regular expressions aren't a proper tool for parsing HTML code. However, in this case they should be good enough, because h3 tags cannot be nested.
preg_match_all('/<h3[^>]*>(.*?)<\/h3>/si', $source, $matches);
$matches[1] should then contain the contents of all h3 tags; the first one is $matches[1][0].
PHP has the ability to parse HTML DOMs natively - you almost certainly want to use that instead of regex.
See this page for details: http://php.net/manual/en/book.dom.php
And check the related questions down the right hand side for people asking very similar questions.
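As a minimal sketch of what that looks like with the built-in DOM extension: the function name first_h3_text and the sample markup are made up for illustration, and collecting the libxml warnings instead of printing them is an assumption about how you want to treat sloppy real-world HTML.

```php
<?php
// Hypothetical helper: returns the text of the first <h3>, or null if there is none.
function first_h3_text(string $html): ?string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = ($html !== '') && $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // getElementsByTagName() returns matches in document order.
    $first = $doc->getElementsByTagName('h3')->item(0);

    return $first === null ? null : trim($first->textContent);
}

// Illustrative input, not taken from the question.
$sample = '<html><body><h2>Intro</h2><h3 class="x">First <em>heading</em></h3><h3>Second</h3></body></html>';
echo first_h3_text($sample), "\n";
```

Unlike the regex variants, this copes with attributes on the tag and with markup nested inside the heading; textContent returns the heading text with the inner tags stripped.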
Here's an explanation of why parsing HTML with regular expressions is evil. Anyway, this is a way to do it with the DOM instead:
$doc = new DOMDocument();
$doc->loadHTML($text);
$headings = $doc->getElementsByTagName('h3');
$heading = $headings->item(0);
$heading_value = (isset($heading->nodeValue)) ? $heading->nodeValue : 'Header not found';
You should use PHP's DOM parser instead of regular expressions. You're looking for something like this (untested code warning):
$domd = new DOMDocument();
libxml_use_internal_errors(true);
$domd->loadHTML($html_content);
libxml_use_internal_errors(false);
$domx = new DOMXPath($domd);
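// (//h3)[1] selects the first h3 in the whole document; //h3[position() = 1]
// would select every h3 that is the first h3 within its own parent.
$items = $domx->query("(//h3)[1]");
if ($items->length > 0) {
    echo $items->item(0)->textContent;
}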
The DOM approach:
<?php
$html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title></title>
</head>
<body>
<h1>Lorem ipsum</h1>
<h2>Dolor sit amet</h2>
<h3>Duis quis velit est</h3>
<p>Cras non tempor est.</p>
<p>Maecenas nec libero leo.</p>
<h3>Nulla eu ligula est</h3>
<p>Suspendisse potenti.</p>
</body>
</html>
';
$doc = new DOMDocument;
$doc->loadHTML($html);
$titles = $doc->getElementsByTagName('h3');
if( !is_null($titles->item(0)) ){
echo $titles->item(0)->nodeValue;
}
?>