ansaurus

Question

How can I find the contents of the first h3 tag?

Answer 1

+1 A:

Well, a simple solution would be the following:

if (preg_match('#<h3[^>]*>(.*?)</h3>#is', $text, $match)) {
    echo $match[1];
}

For anything more complex, you should consider using an HTML document parser, though.

poke 2010-10-04 14:13:14

Answer 2

A:

preg_match("/<h3>(.*)<\/h3>/", $search_in_this_string, $put_matches_in_this_var);

Ashwini Dhekane 2010-10-04 14:13:38

The expression here is incorrect (and using regex for this is a bad idea in general).

Peter Boughton 2010-10-04 14:16:50
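A small sketch of two things that can go wrong with a pattern like the one in Answer 2: a greedy `(.*)` runs to the last `</h3>` on a line, and without the `s` modifier `.` does not match newlines. The sample strings are made up for illustration.

```php
<?php
// Sample input (made up): two headings on a single line.
$oneLine = '<h3>First</h3><p>text</p><h3>Second</h3>';

// Greedy `(.*)` backtracks only as far as the LAST </h3>.
preg_match('/<h3>(.*)<\/h3>/', $oneLine, $greedy);
echo $greedy[1], "\n"; // First</h3><p>text</p><h3>Second

// Lazy `(.*?)` stops at the FIRST </h3>.
preg_match('/<h3>(.*?)<\/h3>/', $oneLine, $lazy);
echo $lazy[1], "\n"; // First

// Without the `s` modifier, `.` cannot cross "\n", so a heading whose
// text spans two lines is not matched at all.
$multiLine = "<h3>Multi\nline</h3>";
$withoutS = preg_match('/<h3>(.*?)<\/h3>/', $multiLine, $m1);  // 0 matches
$withS    = preg_match('/<h3>(.*?)<\/h3>/s', $multiLine, $m2); // 1 match
```

This is why the other regex answers use a lazy `(.*?)` together with the `s` flag, although neither change makes regex a real HTML parser.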

Answer 3

A:

First of all: regular expressions aren't a proper tool for parsing HTML. However, in this case they should be good enough, because h3 tags cannot be nested.

preg_match_all('/<h3[^>]*>(.*?)<\/h3>/si', $source, $matches);

The `$matches` variable should then contain the contents of the h3 tags; `$matches[1][0]` holds the first one.

Crozin 2010-10-04 14:14:35

But they can be commented out, or contain code such as `<h3 title="Wibble>Wobble">Wibble > Wobble</h3>`, or similar.

Peter Boughton 2010-10-04 14:16:00
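To make the comment above concrete, here is a rough comparison of Crozin's pattern and `DOMDocument` on the two cases Peter mentions. It assumes the DOM extension is enabled; the input strings are illustrative only.

```php
<?php
// Crozin's pattern, unchanged.
$pattern = '/<h3[^>]*>(.*?)<\/h3>/si';

// Illustrative inputs for the two failure cases from the comment.
$inputs = [
    'attribute' => '<h3 title="Wibble>Wobble">Wibble > Wobble</h3>',
    'comment'   => '<!-- <h3>Old heading</h3> --><h3>Real heading</h3>',
];

$results = [];
foreach ($inputs as $label => $html) {
    // Regex: [^>]* stops at the ">" inside the attribute, and the
    // pattern happily matches the <h3> inside the HTML comment.
    preg_match($pattern, $html, $m);

    // DOM: the parser understands quoted attributes and comments.
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);
    $h3 = $doc->getElementsByTagName('h3')->item(0);

    $results[$label] = [
        'regex' => $m[1],
        'dom'   => $h3 !== null ? $h3->textContent : null,
    ];
}

print_r($results);
```

For the attribute case the regex captures `Wobble">Wibble > Wobble`, and for the comment case it returns `Old heading`; the DOM version returns `Wibble > Wobble` and `Real heading`.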

Answer 4

A:

PHP can parse HTML into a DOM natively; you almost certainly want to use that instead of regex.

See this page for details: http://php.net/manual/en/book.dom.php

And check the related questions down the right-hand side for people asking very similar questions.

Peter Boughton 2010-10-04 14:14:37

Answer 5

+1 A:

Here's an explanation of why parsing HTML with regular expressions is evil. Anyway, this is one way to do it:

$doc = new DOMDocument();
$doc->loadHTML($text);
$headings = $doc->getElementsByTagName('h3');
$heading = $headings->item(0);
$heading_value = (isset($heading->nodeValue)) ? $heading->nodeValue : 'Header not found';

Roberto Aloi 2010-10-04 14:17:00

Answer 6

+3 A:

You should use PHP's DOM parser instead of regular expressions. You're looking for something like this (untested code warning):

$domd = new DOMDocument();
libxml_use_internal_errors(true);
$domd->loadHTML($html_content);
libxml_use_internal_errors(false);

$domx = new DOMXPath($domd);
$items = $domx->query("//h3[position() = 1]");

if ($items->length > 0) {
    echo $items->item(0)->textContent;
}

Maerlyn 2010-10-04 14:17:31

Nicest way IMHO.

Álvaro G. Vicario 2010-10-04 14:20:02

Surely more elegant than my attempt :)

Roberto Aloi 2010-10-04 14:22:58
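One subtlety in the XPath above, shown as a small sketch with made-up markup: `//h3[position() = 1]` means "every h3 that is the first h3 child of its parent", not "the first h3 in the document". Taking `item(0)` still yields the first heading in document order, but the node list can hold more than one element; `(//h3)[1]` applies the position to the whole node set instead.

```php
<?php
// Illustrative input: two sections, each starting with its own <h3>.
$html = '<html><body>'
      . '<div><h3>Section A</h3><p>a</p></div>'
      . '<div><h3>Section B</h3><p>b</p></div>'
      . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);
$xpath = new DOMXPath($doc);

// Predicate applies per parent: both headings qualify.
$perParent = $xpath->query('//h3[position() = 1]');

// Predicate applies to the document-order set: only the first heading.
$firstInDoc = $xpath->query('(//h3)[1]');

echo $perParent->length, "\n";                // 2
echo $firstInDoc->length, "\n";               // 1
echo $firstInDoc->item(0)->textContent, "\n"; // Section A
```

If you only ever read `item(0)`, both expressions give the same heading; the difference matters once you iterate over the result.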

Answer 7

+1 A:

The DOM approach:

<?php

$html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title></title>
</head>
<body>

<h1>Lorem ipsum</h1>
<h2>Dolor sit amet</h2>
<h3>Duis quis velit est</h3>
<p>Cras non tempor est.</p>
<p>Maecenas nec libero leo.</p>
<h3>Nulla eu ligula est</h3>
<p>Suspendisse potenti.</p>

</body>
</html>
';

$doc = new DOMDocument;
$doc->loadHTML($html);

$titles = $doc->getElementsByTagName('h3');
if( !is_null($titles->item(0)) ){
    echo $titles->item(0)->nodeValue;
}

?>

Álvaro G. Vicario 2010-10-04 14:18:27
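The same DOM approach can be wrapped in a small reusable function. This is only a sketch: `firstH3Text` is a hypothetical name, and trimming the result and returning `null` for "no heading" are design choices, not something the answers above prescribe.

```php
<?php
/**
 * Hypothetical helper: returns the trimmed text of the first <h3>
 * in $html, or null when the input contains no h3 element.
 */
function firstH3Text(string $html): ?string
{
    // DOMDocument::loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Collect malformed-markup warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $h3 = $doc->getElementsByTagName('h3')->item(0);
    return $h3 === null ? null : trim($h3->textContent);
}

echo firstH3Text('<p>Intro</p><h3> Duis quis velit est </h3><h3>Nulla</h3>'), "\n"; // Duis quis velit est
var_dump(firstH3Text('<p>No headings here</p>')); // NULL
```

Restoring the previous `libxml_use_internal_errors()` setting keeps the helper from changing error handling for the rest of the script.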

Answer 8

+1 A:

Use an XPath expression like

"/html/body/h3[1]"

This selects the whole first h3 node that is a direct child of body. (XPath positions start at 1, so h3[0] would match nothing.)

Note that this will not work on ill-formed HTML.

codymanix 2010-10-04 14:20:48

With DOM's loadHTML(), this will work fine with real-world (read: broken) HTML.

Gordon 2010-10-04 14:40:05
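A quick sketch of Gordon's point, using deliberately sloppy, made-up markup: `loadHTML()` repairs the document, after which XPath queries run normally. It also shows that XPath positions are 1-based, so an index of `[0]` never matches.

```php
<?php
// Made-up sample with an unclosed <p> and a stray unclosed <div>.
$html = '<html><body><h3>Title</h3><p>Unclosed paragraph<div>stray</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);  // keep parser warnings quiet
$doc->loadHTML($html);             // libxml repairs the broken markup
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath = new DOMXPath($doc);

// Positions start at 1, so [0] selects nothing.
$zero = $xpath->query('/html/body/h3[0]');
// [1] selects the first h3 that is a direct child of <body>.
$one = $xpath->query('/html/body/h3[1]');
// (//h3)[1] finds the first h3 anywhere, even nested inside a <div>.
$any = $xpath->query('(//h3)[1]');

echo $zero->length, "\n";              // 0
echo $one->item(0)->textContent, "\n"; // Title
echo $any->item(0)->textContent, "\n"; // Title
```

`/html/body/h3[1]` only looks at direct children of body; if the heading might be wrapped in other elements, `(//h3)[1]` is the safer choice.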
