Hi.

I store the content of a website in a string $html.

I want to count all html links that link to a file in the .otf format, add a list of these links to the end of $html and remove the original links.

An example:

<?php
$html_input = '
<p>
    Lorem <a href="font-1.otf">ipsum</a> dolor sit amet, 
    consectetur <a href="http://www.cnn.com">adipiscing</a> elit.
    Quisque <a href="font-2.otf">ultricies</a> placerat massa 
    vel dictum.
</p>';

// some magic here    

$html_output = '
<p>
    Lorem ipsum dolor sit amet, 
    consectetur <a href="http://www.cnn.com">adipiscing</a> elit.
    Quisque ultricies placerat massa 
    vel dictum.
</p>
<p>.otf-links: 2</p>
<ul>
    <li><a href="font-1.otf">ipsum</a></li>
    <li><a href="font-2.otf">ultricies</a></li>
</ul>';
?>

How do I do that? Should I use regular expressions, or is there another way?

+2  A: 

Use a DOM Parser

Example:

$h = str_get_html($html);

$linkCount = count($h->find('a'));

foreach ( $h->find('a') as $a ){
    // print every link ending in .otf
    if ( ends_with(strtolower($a->href), '.otf') ){ // ends_with() is not a built-in function, but it is trivial to write

        echo '<li><a href="'.$a->href.'">'.$a->innertext.'</a></li>';
    }
}
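
The `ends_with()` helper used above is not defined in the answer. A minimal sketch of one way to write it follows; the name and signature are simply taken from the call above, and the body is an assumption rather than the answerer's own code.

```php
<?php
// Hypothetical helper: the answer only says such a function is trivial to write.
function ends_with($haystack, $needle)
{
    $length = strlen($needle);
    if ($length === 0) {
        return true; // every string ends with the empty string
    }
    // A negative start makes substr() return the last $length characters.
    return substr($haystack, -$length) === $needle;
}

var_dump(ends_with(strtolower('Font-1.OTF'), '.otf')); // bool(true)
var_dump(ends_with('http://www.cnn.com', '.otf'));     // bool(false)
```

Lower-casing the href before the check, as the answer does, means `.OTF` links are matched too.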
Yacoby
+1 for recommending a dom parser
marcgg
I love Simple HTML DOM! You beat me to it, but you left out the part about removing the anchor tags from the original input.
Justin Johnson
A: 
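$pattern = '~<a href="[^"]+\.otf">(.*?)</a>~s';
// preg_match_all() collects every match; preg_match() stops at the first one
preg_match_all($pattern, $html_input, $matches);
$linksCount = count($matches[0]);
// keep the link text ($1) and assign the result, since preg_replace() returns a new string
$html_input = preg_replace($pattern, '$1', $html_input);
$html_input .= '<ul><li>'.implode('</li><li>', $matches[0]).'</li></ul>';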
Sam Dark
We all know what will happen if you parse HTML using regexp... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
marcgg
I even posted a warning comment on the OP.
Justin Johnson
+5  A: 
require_once("simple_html_dom.php");

$doc = new simple_html_dom();
$doc->load($html_input); // $html_input is the string from the question

$fonts = array();
$links = $doc->find("a");

foreach ( $links as $l ) {
    if ( substr($l->href, -4) == ".otf" ) {
        $fonts[]      = $l->outertext;
        $l->outertext = $l->innertext;
    }
}

$output = $doc->save() . "\n<p>.otf-links: " . count($fonts) ."</p>\n" .
    "<ul>\n\t<li>" . implode("</li>\n\t<li>", $fonts) . "</li>\n</ul>";

Documentation for Simple HTML DOM: http://simplehtmldom.sourceforge.net/
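
For comparison, the same idea can be sketched with PHP's bundled DOM extension, so no third-party file is required. This is only a rough sketch under a few assumptions: `extract_otf_links()` is a hypothetical helper name, the input is a fragment of body markup (not a full page), the text is ASCII or Latin-1, and PHP is at least 5.3.6 so that `saveHTML()` accepts a node argument.

```php
<?php
// Sketch only: uses PHP's bundled DOM extension, not Simple HTML DOM.
// extract_otf_links() is a hypothetical helper name.
function extract_otf_links($html)
{
    $doc = new DOMDocument();
    // Wrap the fragment in a known container so it can be serialized again;
    // @ hides the warnings libxml emits for imperfect markup.
    @$doc->loadHTML('<div>' . $html . '</div>');
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    $fonts = array();
    // Copy the live node list before changing the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        if (strtolower(substr($a->getAttribute('href'), -4)) !== '.otf') {
            continue;
        }
        $fonts[] = $doc->saveHTML($a);       // keep the full anchor markup
        while ($a->firstChild) {             // unwrap: move the children out
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        $a->parentNode->removeChild($a);     // then drop the empty anchor
    }

    // Serialize everything inside the wrapper, without the wrapper itself.
    $body = '';
    foreach ($wrapper->childNodes as $child) {
        $body .= $doc->saveHTML($child);
    }

    return $body . "\n<p>.otf-links: " . count($fonts) . "</p>\n"
        . "<ul>\n\t<li>" . implode("</li>\n\t<li>", $fonts) . "</li>\n</ul>";
}
```

Calling `extract_otf_links($html_input)` with the string from the question should give markup along the lines of the requested `$html_output`, although whitespace and attribute quoting may differ slightly after the round trip through the parser.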

Justin Johnson
+1 for the example. Less thrown together than mine. Fixed an issue that may cause the script to fail if the length of the href is less than 4.
Yacoby
Thanks for your effort. This does pretty much what I wanted, except it removes the anchor tags in the list as well. Swapping _$l->outertext = $l->innertext;_ and _$fonts[] = $l;_ doesn't help, so how do I fix this?
snorpey
@Yacoby Thanks mate; however, `substr` will happily continue without error even if the string length is 0, so the check isn't necessary. @snorpey I fixed the issue. Remember that objects in PHP are assigned by reference unless you explicitly clone them. The fix is to assign the actual string representation of the anchor object to `$fonts[]` before altering it.
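
The point about object handles can be illustrated with a small stand-alone sketch. `FakeAnchor` is a hypothetical stand-in class, not part of Simple HTML DOM; it only shows why storing the string (or a clone) protects the saved value from later changes.

```php
<?php
// Hypothetical stand-in for a parsed anchor element; not Simple HTML DOM's class.
class FakeAnchor
{
    public $text;

    public function __construct($text)
    {
        $this->text = $text;
    }

    public function __toString()
    {
        return '<a>' . $this->text . '</a>';
    }
}

$a = new FakeAnchor('ipsum');

$byHandle = array($a);           // stores a handle to the same object
$byString = array((string) $a);  // stores a snapshot of the markup
$byClone  = array(clone $a);     // stores an independent copy

$a->text = 'changed';            // later modification of the original

echo $byHandle[0], "\n"; // <a>changed</a>
echo $byString[0], "\n"; // <a>ipsum</a>
echo $byClone[0], "\n";  // <a>ipsum</a>
```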
Justin Johnson