I have a problem where I have some HTML like this:

<p>There is the unfinished business of Taiwan, eventual “reunification”...a communiqué committing</p>

In that text string I would not want to change the < and > to &lt; and &gt;.

However, I would want to convert the quotes around “reunification” and the é in communiqué to their entities.

+1  A: 

You will likely have to write your own htmlentities() replacement function. The easiest way is probably to apply htmlentities() and then replace &lt; (PHP emits the named entity, not the numeric one) with <, and likewise for whatever other characters you want to keep.
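A minimal sketch of that approach, assuming UTF-8 input and that only < and > need to be restored. The function name htmlentities_keep_tags() is made up for illustration. ENT_NOQUOTES is passed so that quotes inside tag attributes are left alone:

```php
<?php
// Hypothetical helper: encode everything with htmlentities(), then
// turn the encoded angle brackets back into literal ones.
function htmlentities_keep_tags(string $html): string
{
    // ENT_NOQUOTES leaves " and ' untouched so attribute values survive.
    $encoded = htmlentities($html, ENT_NOQUOTES, 'UTF-8');

    // Undo the encoding of the markup delimiters only.
    return str_replace(['&lt;', '&gt;'], ['<', '>'], $encoded);
}

echo htmlentities_keep_tags('<p>eventual “reunification”...a communiqué</p>');
// prints: <p>eventual &ldquo;reunification&rdquo;...a communiqu&eacute;</p>
```

Note that a literal & in the input still becomes &amp;, and entities already present in the text get encoded a second time unless the fourth argument of htmlentities() (double_encode) is set to false.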

You might also be interested in Markdown; it is similar to what you are trying to accomplish and might fit your needs.

http://daringfireball.net/projects/markdown/
http://michelf.com/projects/php-markdown/

Jeffrey Aylesworth
I basically did a variation on this: I removed < and > and a few other chars from the array returned by get_html_translation_table() and then ran strtr() with that modified array.
Stewart Robinson
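The variation described in the comment above might look roughly like this. The comment does not say which "other chars" were removed, so dropping &, " and ' here is an assumption, as is the function name encode_entities_keep_markup():

```php
<?php
// Hypothetical helper: entity-encode text but leave markup characters alone.
function encode_entities_keep_markup(string $text): string
{
    // Map of literal character => entity, e.g. 'é' => '&eacute;'.
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');

    // Assumed list of characters to leave unescaped.
    foreach (['<', '>', '&', '"', "'"] as $char) {
        unset($table[$char]);
    }

    // strtr() replaces every remaining key with its entity in one pass.
    return strtr($text, $table);
}

echo encode_entities_keep_markup('<p>eventual “reunification”...a communiqué</p>');
// prints: <p>eventual &ldquo;reunification&rdquo;...a communiqu&eacute;</p>
```

Because & is removed from the table, entities that are already in the text are not double-encoded, but a bare & is also left as-is.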
A: 

'<' is a reserved character in XML. Section 2.4 of the XML standard strictly dictates that it MUST be escaped as either an entity or a character reference when used within character data. It is only allowed to appear in its unescaped form when used as XML markup, or within a comment, processing instruction, or a CDATA section. Why do you want to bypass that requirement?

Remy Lebeau - TeamB
I don't want to escape that character, as I know my data doesn't contain any < or > apart from those that actually form the tag delimiters of the markup.
Stewart Robinson
If you want to put HTML markup inside XML, you MUST either escape reserved characters, such as '<', or else put the HTML into a CDATA block, for example: <MyXmlDoc><HtmlContent><![CDATA[<p>There is the unfinished business of Taiwan, eventual “reunification”...a communiqué committing</p>]]></HtmlContent></MyXmlDoc>
Remy Lebeau - TeamB
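For the CDATA route, a rough sketch using PHP's DOM extension could look like the following. The element names are taken from the example in the comment above; the function name wrap_html_in_cdata() is hypothetical:

```php
<?php
// Hypothetical helper: build the XML document and put the HTML in a CDATA section.
function wrap_html_in_cdata(string $html): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');

    $root = $doc->createElement('MyXmlDoc');
    $doc->appendChild($root);

    $content = $doc->createElement('HtmlContent');
    $root->appendChild($content);

    // The HTML is stored verbatim; no entity escaping is applied inside CDATA.
    $content->appendChild($doc->createCDATASection($html));

    return $doc->saveXML();
}

echo wrap_html_in_cdata('<p>eventual “reunification”...a communiqué</p>');
```

This assumes the HTML never contains the sequence ]]>, which would end the CDATA section early.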