
Hey, I can't seem to find any regular expression online to remove

<h1></h1>

tags (and their content).

Can anyone lend a hand with this?

A: 
$htmlsource = preg_replace('@<h1[^>]*?>.*?<\/h1>@si', '', $htmlsource);
turbod
This solution worked as I wanted it to. Thanks, turbod.
dotty
It works for your one case, but not, I suspect, in all instances. Use with caution.
Sam Holder
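For reference, here is turbod's one-liner in a self-contained script, as a minimal sketch. The `@` characters are the pattern delimiters, `[^>]*?` lets the opening tag carry attributes, the lazy `.*?` stops at the first `</h1>`, the `s` modifier lets `.` match newlines, and `i` makes the match case-insensitive. Note that `preg_replace()` returns a new string rather than modifying its argument. The helper name `remove_h1_regex` is made up for this example.

```php
<?php
// Hypothetical helper wrapping turbod's pattern; returns a new string
function remove_h1_regex(string $html): string
{
    // @...@ are delimiters; s = dot also matches newlines, i = case-insensitive
    return preg_replace('@<h1[^>]*?>.*?<\/h1>@si', '', $html);
}

// Uppercase tag, an attribute and a line break inside the heading are all handled
$html = "<H1 class=\"title\">Multi\nline</H1><p>Body</p><h1>Again</h1>";
echo remove_h1_regex($html), "\n"; // prints "<p>Body</p>"
```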
+2  A: 

You cannot find one, because there is none.

Regular expressions are not a good fit for this task, since the <h1> tags may be nested arbitrarily deep. (Edit: Tomalak pointed out that they are not allowed to be, but reality is evil.) Try an HTML parser instead.

Turbod's expression will work if you can be sure that your document nowhere contains a construct like <h1>Foo <h1> Bar</h1></h1>.

Edit: Depending on your scenario, a CSS rule like h1 { display: none !important; } might do the trick (it only hides the headings in the browser; they remain in the HTML source).

Jens
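To see the nesting caveat in action, the sketch below runs the same pattern over the nested construct mentioned in this answer. Because `.*?` stops at the first `</h1>`, the match ends at the inner closing tag and the outer one is left behind as a stray `</h1>`.

```php
<?php
$nested = '<h1>Foo <h1> Bar</h1></h1><p>Body</p>';

// The lazy .*? ends the match at the FIRST </h1>, i.e. the inner one
$result = preg_replace('@<h1[^>]*?>.*?<\/h1>@si', '', $nested);

echo $result, "\n"; // prints "</h1><p>Body</p>"
```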
`<h1>` tags may be nested arbitrarily deep? Not when I last looked at the spec. ;-) The real problem is that the HTML might be horribly broken, and a regex has no way of recognizing that.
Tomalak
@Tomalak: *cough* I never looked at the spec. =) Unfortunately, I am not alone there, and the tags will get nested anyway. The spec does allow comments that contain `<h1>` tags, though.
Jens
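The point about comments can be illustrated with a small comparison, as a sketch assuming the sample input below: the regex also rewrites an `<h1>` that only appears inside an HTML comment, while a DOM parser treats the comment as opaque text and leaves it intact.

```php
<?php
$html = '<div><!-- <h1>old</h1> --><h1>Title</h1><p>Body</p></div>';

// Regex: blind to comments, so it also strips the heading inside <!-- -->
$viaRegex = preg_replace('@<h1[^>]*?>.*?<\/h1>@si', '', $html);
echo $viaRegex, "\n"; // prints "<div><!--  --><p>Body</p></div>"

// DOM: getElementsByTagName() only returns element nodes, never comment text
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
foreach (iterator_to_array($doc->getElementsByTagName('h1')) as $h1) {
    $h1->parentNode->removeChild($h1);
}
$viaDom = $doc->saveHTML();
echo $viaDom; // the comment "<!-- <h1>old</h1> -->" is still there
```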
+1  A: 

Why not use strip_tags?

Sres
*(and their content)*.
KennyTM
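KennyTM's objection is easy to confirm with a quick sketch: `strip_tags()` removes the markup but keeps the text between the tags, so the heading text survives. Its second parameter (the allowed tags) only controls which tags are kept, not whether any content is dropped.

```php
<?php
$html = '<h1>Title</h1><p>Body</p>';

// strip_tags() drops the tags themselves but keeps their text content
echo strip_tags($html), "\n";        // prints "TitleBody"

// Allowing <p> keeps paragraph tags; the <h1> text "Title" still remains
echo strip_tags($html, '<p>'), "\n"; // prints "Title<p>Body</p>"
```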
+5  A: 

Don't use a regex; use a tool like PHP Simple HTML DOM.

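// PHP Simple HTML DOM is a third-party library; include it first
require_once 'simple_html_dom.php';

// Construct dom from string
$dom = str_get_html($html);

// ...or construct dom from file/url
// $dom = file_get_html($path);

// strip h1 tags (and their content)
foreach ($dom->find('h1') as $node) {
    $node->outertext = '';
}

// serialize the modified document back to a string
$html = $dom->save();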
nikc
Why use an external class when PHP provides the DOMDocument class?
AntonioCS
I said "a tool *like* PHP Simple HTML DOM", not "You absolutely should use PHP Simple HTML DOM". I chose to demo it because it's a tool I'm personally familiar with, like using for its simplicity, and am able to provide an example for.
nikc
I find DOMDocument a bit complicated.
Ben Shelock
+2  A: 

You can also use PHP's DOM extension module:

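// Suppress libxml warnings triggered by real-world, non-validating HTML
libxml_use_internal_errors(true);

$domDocument = new DOMDocument;
$domDocument->loadHTMLFile('http://example.com');

// getElementsByTagName() returns a live DOMNodeList, so removing nodes while
// iterating it would skip elements; copy the matches into an array first
$domElemsToRemove = array();
foreach ($domDocument->getElementsByTagName('h1') as $domElement) {
    $domElemsToRemove[] = $domElement;
}

// Detach each <h1> (and everything inside it) from its parent
foreach ($domElemsToRemove as $domElement) {
    $domElement->parentNode->removeChild($domElement);
}

var_dump($domDocument->saveHTML());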
karim79
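If the HTML is already in a string rather than at a URL, the same DOM approach can be wrapped in a small function. This is a sketch: the name `strip_h1()` is hypothetical, and `iterator_to_array()` is used here as a shorter way to snapshot the live node list before removing anything.

```php
<?php
// Hypothetical helper: remove every <h1> element (and its content) from an HTML string
function strip_h1(string $html): string
{
    // Keep libxml from emitting warnings for imperfect real-world markup
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Snapshot the live DOMNodeList so removals don't skip any nodes
    foreach (iterator_to_array($doc->getElementsByTagName('h1')) as $h1) {
        $h1->parentNode->removeChild($h1);
    }

    // Restore the caller's libxml error-handling setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // saveHTML() returns a full document, including doctype, <html> and <body>
    return $doc->saveHTML();
}

echo strip_h1('<div><h1>Title</h1><p>Body</p><h1 class="x">Second</h1></div>');
```

Because `loadHTML()` wraps a fragment in a complete document, the output is a whole page rather than just the original snippet; that is usually fine when the input was a full page to begin with.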