ansaurus

Question

How to encode text outside <pre></pre> tags with htmlentities()? (PHP)

Answer 1

A:

Personally I would accomplish this with a simple state machine:

$text = <<<END
<b>Hello, world!</b>
<pre>Hello there<br/></pre>
END;

$segments = preg_split('/(<\/?pre>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// $state = 0 if outside of a pre
// $state = 1 if inside of a pre
$state = 0;
foreach ($segments as &$segment) {
    if ($state == 0) {
        if ($segment == '<pre>')
            $state = 1;
        else
            $segment = htmlentities($segment);
    } else if ($state == 1) {
        if ($segment == '</pre>')
            $state = 0;
    }
}
unset($segment); // break the reference left behind by the foreach

$entityText = implode($segments);

print $entityText;

Output:

&lt;b&gt;Hello, world!&lt;/b&gt;
<pre>Hello there<br/></pre>

Note that the above code does not handle nested pre tags. To handle them, track the nesting depth instead of a two-state flag:

$segments = preg_split('/(<\/?pre>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// $depth = how many nested pres we're inside of.
$depth = 0;
foreach ($segments as &$segment) {
    if ($depth == 0 && $segment != '<pre>')
        $segment = htmlentities($segment);
    else if ($segment == '<pre>')
        $depth++;
    else if ($depth > 0 && $segment == '</pre>')
        $depth--;
}
unset($segment); // break the reference left behind by the foreach

$entityText = implode($segments);
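
Both versions above only recognize the exact strings <pre> and </pre>. As a rough sketch of how the same depth-counting idea could be wrapped in a function that also tolerates attributes and mixed case, something like the following might work. The name encodeOutsidePre() and the widened pattern are assumptions for illustration, not part of the code above:

```php
<?php
// Sketch: encode everything outside <pre>...</pre>, tolerating attributes,
// mixed case, and nesting. encodeOutsidePre() is a hypothetical helper name.
function encodeOutsidePre(string $text): string
{
    // Split on opening/closing pre tags, keeping the tags as segments.
    $segments = preg_split('/(<\/?pre\b[^>]*>)/i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $depth = 0; // how many nested pres we are inside of
    foreach ($segments as $i => $segment) {
        $isOpen  = preg_match('/^<pre\b[^>]*>$/i', $segment) === 1;
        $isClose = preg_match('/^<\/pre\b[^>]*>$/i', $segment) === 1;

        if ($isOpen) {
            $depth++;
        } elseif ($isClose && $depth > 0) {
            $depth--;
        } elseif ($depth === 0) {
            // Text outside any pre (or a stray closing tag): encode it.
            $segments[$i] = htmlentities($segment);
        }
    }

    return implode('', $segments);
}

echo encodeOutsidePre("<b>Hi</b>\n<PRE class=\"code\">a<br/></PRE>");
```

Indexing into $segments instead of iterating by reference avoids leaving a dangling reference after the loop. As in the versions above, a closing tag with no matching opener is treated as ordinary text and encoded.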

Sebastian P. 2009-08-14 15:34:37

Answer 2

+1 A:

If this is for practice, fine. But if you just need the feature, don't reinvent the wheel. Parsing is not an easy task, and there are plenty of mature parsers out there. I would look at the PEAR packages first. Try HTML_BBCodeParser.

If you really want to do it yourself, you have two options:

regexp
state machines

Usually a mix of both is handy. But because tags can be nested and malformed, this is really hard to code. At the very least, use generic parser code and define your own lexical fields; written from scratch, it will take as long as coding the rest of the web site.

Btw: using a BBCode parser does not free you from sanitizing the user input.
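
To illustrate the sanitizing point, here is a minimal sketch that escapes the raw input first and only then turns one BBCode tag into markup. renderBold() is a made-up helper for illustration and has nothing to do with the HTML_BBCodeParser API:

```php
<?php
// Sketch: escape raw HTML first, then expand a single BBCode tag.
// renderBold() is a hypothetical helper, not part of HTML_BBCodeParser.
function renderBold(string $input): string
{
    // Neutralize any HTML the user typed before generating our own markup.
    $safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // [b]...[/b] becomes <b>...</b>; everything else stays escaped text.
    return preg_replace('/\[b\](.*?)\[\/b\]/s', '<b>$1</b>', $safe);
}

echo renderBold('[b]hi[/b] <script>alert(1)</script>');
// prints: <b>hi</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

Escaping before the conversion means the only tags in the output are the ones the converter itself emitted.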

EDIT: I'm in a good mood today, so here is a snippet showing how to use HTML_BBCodeParser:

// if you don't know how to use PEAR, you'd better learn it quickly
// add the PEAR directory to the include path (PATH_SEPARATOR keeps this portable)
ini_set("include_path", ini_get("include_path") . PATH_SEPARATOR . "/usr/share/pear");
// include PEAR and the parser
require_once("PEAR.php");
require_once("HTML/BBCodeParser.php");

// you can tweak settings from an ini file
$config = parse_ini_file("BBCodeParser.ini", true);
$options = &PEAR::getStaticProperty("HTML_BBCodeParser", "_options");
$options = $config["HTML_BBCodeParser"];

// the parsing starts here
$parser = new HTML_BBCodeParser();
$parser->setText($the_mighty_BBCode);
$parser->parse();
$parsed = $parser->getParsed();

// don't forget to clean that
echo htmlspecialchars(strip_tags($parsed));

e-satis 2009-08-14 15:44:59

Answer 3

A:

You could encode everything first and then convert the encoded &lt;pre&gt; … &lt;/pre&gt; back to <pre> … </pre>:

// convert anything
$str = htmlspecialchars($str);
// convert <pre> back
$str = preg_replace('/&lt;pre&gt;((?:[^&]+|&(?!lt;\\/pre&gt;))*)&lt;\\/pre&gt;/s', '<pre>$1</pre>', $str);
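
Note that this restores only the pre tags themselves; whatever was inside them stays encoded. If the contents should come back verbatim too, one possible variation is to decode the captured body in a callback. This is only a sketch: restorePre() is a hypothetical name, and the simpler lazy pattern used here does not handle nested pre tags:

```php
<?php
// Sketch: encode everything, then restore each <pre> block verbatim.
// restorePre() is a hypothetical helper name, not from the answer above.
function restorePre(string $str): string
{
    // Encode the whole string, quotes included.
    $encoded = htmlspecialchars($str, ENT_QUOTES);

    return preg_replace_callback(
        '/&lt;pre&gt;(.*?)&lt;\/pre&gt;/s',
        function (array $m): string {
            // $m[1] is the encoded body of the pre; decode it back.
            return '<pre>' . htmlspecialchars_decode($m[1], ENT_QUOTES) . '</pre>';
        },
        $encoded
    );
}

echo restorePre("<b>x</b><pre>a<br/></pre>");
// prints: &lt;b&gt;x&lt;/b&gt;<pre>a<br/></pre>
```

Passing the same ENT_QUOTES flag to both calls keeps the encode/decode round trip symmetric regardless of the PHP version's defaults.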

Gumbo 2009-08-14 16:08:56
