Note: The input HTML is trusted; it is not user defined!

I'll highlight what I need with an example.

Given the following HTML:

<p>
  Welcome to <a href="http://google.com/" class="crap">Google.com</a>!<br>
  Please, <a href="enjoy.html">enjoy</a> your stay!
</p>

I'd like to convert it to:

Welcome to Google.com[1]
Please, enjoy[2] your stay!

[1] http://google.com/
[2] %request-uri%/enjoy.html    <- note, request uri is something I define
                                   for relative paths

I'd like to be able to customize it.


Edit: On a further note, I'd better explain myself and my reasons.

We have an automated templating system (with stylesheets!) for emails, and as part of that system I'd like to generate multipart emails, i.e., ones that contain both HTML and plain text. The system currently produces only HTML.

I need to convert this HTML to text meaningfully, e.g., I'd like to somehow retain any links and images, perhaps in the format I specified above.

A: 

You could use the DOM to do the following:

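$doc = new DOMDocument();
$doc->loadHTML('…');

// Maps each distinct href to its footnote number.
$anchors = array();
// Copy the live node list first; replacing nodes while iterating it can skip elements.
foreach (iterator_to_array($doc->getElementsByTagName('a')) as $anchor) {
    if ($anchor->hasAttribute('href')) {
        $href = $anchor->getAttribute('href');
        // Repeated URLs share one footnote number.
        if (!isset($anchors[$href])) {
            $anchors[$href] = count($anchors) + 1;
        }
        $index = $anchors[$href];
        // A text node is escaped on output, so link text containing "&" or "<" is safe.
        $anchor->parentNode->replaceChild($doc->createTextNode($anchor->nodeValue." [$index]"), $anchor);
    }
}
// Drop the remaining tags, then turn entities such as &amp; back into characters.
$text = html_entity_decode(strip_tags($doc->saveHTML()), ENT_QUOTES, 'UTF-8');
// Trim leading and trailing whitespace on every line.
$text = preg_replace('/^[\t ]+|[\t ]+$/m', '', $text);
// Append the footnote list.
foreach ($anchors as $href => $index) {
    $text .= "\n[$index] $href";
}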
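
The loop above lists each href exactly as written, so a relative link such as enjoy.html is not prefixed with your request URI. One way to handle that might be a small helper applied to $href before it is stored. The function name resolve_href and the example.com base URI below are hypothetical, and this is only a simplified sketch assuming PHP 7 or later: it covers absolute, protocol-relative, root-relative and plain relative links, and does not normalise "../" segments.

```php
<?php
// Hypothetical helper: resolve an href against a base (request) URI.
// Simplified sketch; "../" and "./" segments are not normalised.
function resolve_href(string $href, string $baseUri): string
{
    // Already absolute: starts with a scheme such as "http:" or "mailto:".
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
        return $href;
    }

    $parts  = parse_url($baseUri);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative ("//host/path"): reuse the base scheme only.
    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href;
    }

    // Root-relative ("/path"): replaces the whole base path.
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }

    // Plain relative: append to the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $href;
}
```

With that in place, the footnote loop could call something like resolve_href($href, 'http://example.com/mail/') when it prints each entry, leaving absolute links such as http://google.com/ untouched.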
Gumbo