How can I replace <p><span class="headline"> with <p class="headline"><span> most easily in PHP?

$data = file_get_contents("http://www.ihr-apotheker.de/cs1.html");
$clean1 = strstr($data, '<p>');
$str = preg_replace('#(<a.*>).*?(</a>)#', '$1$2', $clean1);
$ausgabe = strip_tags($str, '<p>');
echo $ausgabe;

Before I alter the HTML from the site, I want to move the class declaration from the span to the <p> tag.

Thanks for the help!

+3  A: 

Don't parse HTML with regex! This class should provide what you need: http://simplehtmldom.sourceforge.net/

Christian Smorra
Suggested third party alternatives that actually use DOM instead of String Parsing: [phpQuery](http://code.google.com/p/phpquery/), [Zend_Dom](http://framework.zend.com/manual/en/zend.dom.html) and [FluentDom](http://www.fluentdom.org).
Gordon
That looks very nice, I will investigate it further.
Arwed
A: 

Have you tried using str_replace? If the placement of the p and span tags is consistent, you can simply replace one with the other using str_replace("part to replace", "replacement", $string).

Dylan West
Thanks for your very fast help, and to all the others too! This worked fine for me.
Arwed
you're welcome. I'm glad it was an easy fix
Dylan West
+1  A: 

The reason not to parse HTML with regex is that you usually can't guarantee the format. If you already know the exact format of the string, you don't need a complete parser.

In your case, if you know that's the format, you can use str_replace:

// str_replace returns the modified string, so assign the result
$data = str_replace('<p><span class="headline">', '<p class="headline"><span>', $data);
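If you want to confirm that the fixed format really was present, str_replace accepts an optional fourth argument that receives the number of replacements made. A minimal sketch, where the sample markup in $data is made up for illustration and may not match the real page:

```php
<?php
// Hypothetical sample markup; the real page content may differ.
$data = '<p><span class="headline">News</span> Some text</p>';

$search  = '<p><span class="headline">';
$replace = '<p class="headline"><span>';

// Argument order is search, replace, subject; $count receives the number of hits.
$result = str_replace($search, $replace, $data, $count);

if ($count === 0) {
    // The literal pattern was not found, so the markup is not in the expected format.
    echo "No match found\n";
} else {
    echo $result, "\n";
}
```

A count of zero is a cheap signal that the site changed its markup (extra whitespace, another attribute, different quoting), which is the case where a DOM-based approach becomes the safer choice.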

Zurahn
The HTML in my case always has the same format. But thanks for your answer!
Arwed
A: 

Well, an answer was accepted already, but anyway, here is how to do it with native DOM:

$dom = new DOMDocument;
$dom->loadHTMLFile("http://www.ihr-apotheker.de/cs1.html");

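// remove links but keep link text
// (copy the live node list first; replacing nodes while iterating it can skip elements)
$links = iterator_to_array($dom->getElementsByTagName('a'));
foreach ($links as $link) {
    $link->parentNode->replaceChild(
        $dom->createTextNode($link->nodeValue), $link);
}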

// switch classes
$xPath = new DOMXPath($dom);
foreach($xPath->query('//p/span[@class="headline"]') as $node) {
    $node->removeAttribute('class');
    $node->parentNode->setAttribute('class', 'headline');
}
echo $dom->saveHTML();
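
To try the class switch without fetching the remote page, the same idea can be run against an inline string. This is only a sketch: the sample markup and variable names are assumptions, and the libxml_use_internal_errors(true) call is my addition to keep parser warnings about sloppy real-world HTML out of the output.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<p><span class="headline">Title</span> Body text</p>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);

$xPath = new DOMXPath($dom);
foreach ($xPath->query('//p/span[@class="headline"]') as $span) {
    // Move the class from the span to its parent paragraph.
    $span->removeAttribute('class');
    $span->parentNode->setAttribute('class', 'headline');
}

// Passing a node to saveHTML() serializes only that node, not the whole document.
$paragraph = $dom->getElementsByTagName('p')->item(0);
$output = $dom->saveHTML($paragraph);
echo $output, "\n";
```

Serializing just the paragraph avoids the doctype and html/body wrapper that loadHTML adds around a fragment.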

On a side note, HTML has elements for headings, so why not use an <h*> element instead of the semantically superfluous "headline" class?

Gordon
Because I prefer editing CSS files to changing the HTML. Additionally, I want them to look different from the rest of my headlines.
Arwed
If I also want to erase the text of the link, I just leave out the line with `$dom->createTextNode($link->nodeValue), $link);`, I guess?
Arwed
@Arwed You have to distinguish between semantic structure and presentation. The `<h*>` elements structure your document, but they don't say how the headings will look. The fact that browsers usually render them big and bold is a browser thing. A text browser wouldn't do this. Also, you can still style the same `<h*>` differently by applying different classes.
Gordon
@Arwed If you want to erase the text of the link as well, you are effectively erasing the entire link. In this case use `$link->parentNode->removeChild($link)`. When working with XML you have to understand the node concept. `<a href="#">foo</a>` is three nodes: an element node for `a`, an attribute node for `a`'s `href`, and a child text node for `foo`.
Gordon
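
The node concept from the last comment can be checked directly. A rough sketch, again with made-up sample markup: it inspects the three nodes of a link and then removes every link entirely. Copying the live node list with iterator_to_array first is my own precaution, so that removing nodes does not disturb the iteration.

```php
<?php
// Hypothetical sample markup.
$html = '<p>See <a href="#">foo</a> and <a href="#">bar</a> here</p>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$first = $dom->getElementsByTagName('a')->item(0);
// The element node, its href attribute node, and its child text node.
$types = [
    $first->nodeType,                           // XML_ELEMENT_NODE
    $first->getAttributeNode('href')->nodeType, // XML_ATTRIBUTE_NODE
    $first->firstChild->nodeType,               // XML_TEXT_NODE
];

// Snapshot the live list before removing nodes from the document.
foreach (iterator_to_array($dom->getElementsByTagName('a')) as $link) {
    // Removing the element also removes its attribute and text child.
    $link->parentNode->removeChild($link);
}

$output = $dom->saveHTML($dom->getElementsByTagName('p')->item(0));
echo $output, "\n";
```

Because the text node is a child of the element, removing the `a` element takes "foo" with it, which is why no separate step is needed to erase the link text.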