Hey, I can't seem to find any regular expression online that removes
<h1></h1>
tags (and their content).
Can anyone lend a hand with this?
You cannot find one because there is none.
Regular expressions are not a good fit for this task, since <h1> tags may be nested arbitrarily deep. (Edit: Tomalak pointed out that nesting them is not allowed, but reality is evil.) Try an HTML parser instead.
Turbod's expression will work if you can be sure that a construct like <h1>Foo <h1> Bar</h1></h1> appears nowhere in your document.
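To make the nesting problem concrete, here is a minimal sketch. Turbod's expression is not quoted in this thread, so the pattern below is an assumption: a typical non-greedy match from an opening <h1> to the nearest closing </h1>. The variable names are made up for illustration.

```php
<?php
// Assumed pattern (not taken from the thread): a non-greedy match from an
// opening <h1 ...> to the nearest closing </h1>, case-insensitive, with
// "." also matching newlines.
$pattern = '#<h1\b[^>]*>.*?</h1>#is';

$flat   = '<p>a</p><h1>Title</h1><p>b</p>';
$nested = '<h1>Foo <h1> Bar</h1></h1>';

$flatResult   = preg_replace($pattern, '', $flat);
$nestedResult = preg_replace($pattern, '', $nested);

echo $flatResult, PHP_EOL;   // <p>a</p><p>b</p>
echo $nestedResult, PHP_EOL; // </h1>  (a stray closing tag is left behind)
```

The flat case comes out clean, but in the nested case the match stops at the first </h1>, so the outer closing tag survives. A greedy pattern would not help either: it would swallow everything between the first <h1> and the last </h1> in the document.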
Edit:
Depending on your scenario, a CSS rule like h1 { display: none !important; }
might do the trick, if hiding the headings rather than removing them from the markup is enough.
Don't use a regex; use a tool like PHP Simple HTML DOM.
// Load the library (assumes simple_html_dom.php is on the include path)
require_once 'simple_html_dom.php';
// Construct dom from string
$dom = str_get_html($html);
// ...or construct dom from file/url
$dom = file_get_html($path);
// strip h1 tags (and their content)
foreach ($dom->find('h1') as $node) {
    $node->outertext = '';
}
You can also use PHP's DOM extension module:
$domDocument = new DOMDocument;
$domDocument->loadHTMLFile('http://example.com');
$domNodeList = $domDocument->getElementsByTagName('h1');
// The node list is live, so collect the elements first
// and remove them in a second pass.
$domElemsToRemove = array();
foreach ($domNodeList as $domElement) {
    $domElemsToRemove[] = $domElement;
}
foreach ($domElemsToRemove as $domElement) {
    $domElement->parentNode->removeChild($domElement);
}
var_dump($domDocument->saveHTML());
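If the HTML is already in a string rather than at a URL, the same DOM approach can be wrapped in a small function. This is only a sketch: removeH1 is a hypothetical helper name, and it assumes you want libxml's parser warnings about sloppy markup silenced instead of printed.

```php
<?php
// Hypothetical helper (not part of PHP or of the answers above):
// returns the given HTML with every <h1> element removed.
function removeH1(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect libxml warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it to an
    // array before removing anything.
    $headings = iterator_to_array($doc->getElementsByTagName('h1'));
    foreach ($headings as $heading) {
        if ($heading->parentNode !== null) {
            $heading->parentNode->removeChild($heading);
        }
    }

    return $doc->saveHTML();
}

$clean = removeH1('<html><body><h1>Title</h1><p>Keep me</p></body></html>');
echo $clean;
```

Note that saveHTML() serializes the whole document, so the result may include a doctype and html/body wrappers that a fragment passed in did not have.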