Hi everyone, I was just wondering if anyone knew a function to remove ALL classes from a string in PHP. Basically I only want
<p>
tags rather than
<p class="...">
If that makes sense :)
A fairly naive regex will probably work for you:
$html=preg_replace('/class=".*?"/', '', $html);
I say naive because it would also strip class="something" if that text happened to appear in your body text for some reason. It could be made a little more robust by only looking for class="..." inside angle-bracketed tags if need be.
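For what it's worth, here is a rough sketch of that "only inside tags" idea. The function name remove_class_attributes is made up for illustration, and the pattern assumes attribute values never contain a ">" character or the text " class=", so treat it as a starting point rather than a complete solution.

```php
<?php
// Hypothetical helper (not a built-in): strips class attributes,
// but only when they appear inside an opening tag.
function remove_class_attributes(string $html): string
{
    $result = preg_replace_callback(
        // Match one opening tag at a time, e.g. <p class="x" id="y">.
        '/<[a-z][a-z0-9]*\b[^>]*>/i',
        function (array $match): string {
            // Inside that tag only, drop class="...", class='...' or class=value.
            return preg_replace(
                '/\s+class\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i',
                '',
                $match[0]
            );
        },
        $html
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

echo remove_class_attributes('<p class="intro">Use class="foo" here</p>');
```

Because the outer pattern only matches opening tags, the class="foo" in the body text is left alone while the attribute on the p tag is removed.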
I would do something like this with jQuery (note that this runs in the browser, not in PHP). Place this in your page header:
$(document).ready(function(){
$("p").each(function(){
$(this).removeAttr("class");
//or $(this).removeClass("className");
});
});
Maybe it's a bit overkill for your needs, but for parsing, validating and cleaning HTML data, the best tool I know is HTML Purifier.
It lets you define which tags and which attributes are allowed, and/or which ones are not, and it gives valid, clean (X)HTML as output.
(Using regexes to "parse" HTML seems OK at the beginning... and then, when you want to handle more specific cases, it generally becomes hell to understand and maintain.)
You can load the HTML into a DOMDocument object and import that into SimpleXML. Then run an XPath query for all p elements and loop through them. On each iteration, set the class attribute to a placeholder value such as "killmeplease".
When that's done, output the SimpleXML object as XML again (which, by the way, may change the HTML, but usually only for the better), and you will have an HTML string where each p has a class of "killmeplease". Use str_replace to actually remove them.
Example:
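$html_file = "somehtmlfile.html";
// Parse the file with the (forgiving) HTML parser
$dom = new DOMDocument();
$dom->loadHTMLFile($html_file);
// Hand the parsed document to SimpleXML so XPath is easy to use
$xml = simplexml_import_dom($dom);
$paragraphs = $xml->xpath("//p");
foreach($paragraphs as $paragraph) {
    // Overwrite (or add) the class with a known placeholder value
    $paragraph['class'] = "killmeplease";
}
$new_html = $xml->asXML();
// Strip the placeholder, including the space in front of it
$better_html = str_replace(' class="killmeplease"', "", $new_html);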
Or, if you want simpler code and don't mind tangling with preg_replace, you could go with:
$html_file = "somehtmlfile.html";
$html_string = file_get_contents($html_file);
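// Match a <p ...> tag and capture everything around its class attribute
$bad_p_class = '/(<p\b[^>]*?)\s+class\s*=\s*(?:"[^"]*"|\'[^\']*\')([^>]*>)/i';
$better_html = preg_replace($bad_p_class, '$1$2', $html_string);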
The tricky part with regular expressions is that quantifiers are greedy by default, and a pattern built on "." stops matching if your p tag has a line break in it (unless you set the s modifier). But give either of those a shot.
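As a variation on the DOM approach, the attribute can also be removed directly, which avoids the placeholder and the str_replace step. This is only a sketch: the function name strip_class_with_dom is invented here, it works on an HTML string rather than a file, and it assumes the input is a body fragment. Note that loadHTML may mishandle non-ASCII text unless the markup declares its encoding.

```php
<?php
// Hypothetical helper (not part of PHP): removes class attributes via the DOM.
function strip_class_with_dom(string $html): string
{
    $dom = new DOMDocument();

    // Real-world markup often triggers parser warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so there is an explicit <body> to read back from.
    $dom->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every element that carries a class attribute and drop it.
    // Use '//p[@class]' instead to touch p elements only.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[@class]') as $element) {
        $element->removeAttribute('class');
    }

    // Serialize only the children of <body>, so the wrapper is not returned.
    $body = $dom->getElementsByTagName('body')->item(0);
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

echo strip_class_with_dom('<p class="intro">Hello</p><p>World</p>');
```

Since the parser works on the document structure rather than on text, a class="..." that appears in the body text is not affected.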
HTML can be very tricky to handle with regex because the same markup can be written or formatted in so many different ways.
HTML Purifier is a mature open source library for cleaning up HTML. I would advise using it in this case.
In HTML Purifier's configuration documentation, you can see how to specify which classes and attributes should be allowed, and what the purifier should do when it finds them.