Hi all, I just wrote this:

<?php

function unicodeConvert($str)
{
    header('Content-Type:text/html; charset=UTF-8');
    $entityRef = array('"' => "&quot;", "&" => "&amp;", '¢' => "&cent;", '¤' => "&curren;", '¦' => "&brvbar;", '¨' => "&uml;", 'ª' => "&ordf;", '¬' => "&not;", '®' => "&reg;", '°' => "&deg;", '²' => "&sup2;", '´' => "&acute;", '¶' => "&para;", '¸' => "&cedil;", 'º' => "&ordm;", '¼' => "&frac14;", '¾' => "&frac34;", 'À' => "&Agrave;", 'Â' => "&Acirc;", 'Ä' => "&Auml;", 'Æ' => "&AElig;", 'È' => "&Egrave;", 'Ê' => "&Ecirc;", 'Ì' => "&Igrave;", 'Î' => "&Icirc;", 'Ð' => "&ETH;", 'Ò' => "&Ograve;", 'Ô' => "&Ocirc;", 'Ö' => "&Ouml;", 'Ø' => "&Oslash;", 'Ú' => "&Uacute;", 'Ü' => "&Uuml;", 'Þ' => "&THORN;", 'à' => "&agrave;", 'â' => "&acirc;", 'ä' => "&auml;", 'æ' => "&aelig;", 'è' => "&egrave;", 'ê' => "&ecirc;", 'ì' => "&igrave;", 'î' => "&icirc;", 'ð' => "&eth;", 'ò' => "&ograve;", 'ô' => "&ocirc;", 'ö' => "&ouml;", 'ø' => "&oslash;", 'ú' => "&uacute;", 'ü' => "&uuml;", 'þ' => "&thorn;", '¡' => "&iexcl;", '£' => "&pound;", '¥' => "&yen;", '§' => "&sect;", '©' => "&copy;", '«' => "&laquo;", '¯' => "&macr;", '±' => "&plusmn;", '³' => "&sup3;", 'µ' => "&micro;", '·' => "&middot;", '¹' => "&sup1;", '»' => "&raquo;", '½' => "&frac12;", '¿' => "&iquest;", 'Á' => "&Aacute;", 'Ã' => "&Atilde;", 'Å' => "&Aring;", 'Ç' => "&Ccedil;", 'É' => "&Eacute;", 'Ë' => "&Euml;", 'Í' => "&Iacute;", 'Ï' => "&Iuml;", 'Ñ' => "&Ntilde;", 'Ó' => "&Oacute;", 'Õ' => "&Otilde;", '×' => "&times;", 'Ù' => "&Ugrave;", 'Û' => "&Ucirc;", 'Ý' => "&Yacute;", 'ß' => "&szlig;", 'á' => "&aacute;", 'ã' => "&atilde;", 'å' => "&aring;", 'ç' => "&ccedil;", 'é' => "&eacute;", 'ë' => "&euml;", 'í' => "&iacute;", 'ï' => "&iuml;", 'ñ' => "&ntilde;", 'ó' => "&oacute;", 'õ' => "&otilde;", '÷' => "&divide;", 'ù' => "&ugrave;", 'û' => "&ucirc;", 'ý' => "&yacute;", 'ÿ' => "&yuml;");

    foreach($entityRef as $key => $obj)
    {
     if($key!="&")
     {
      $str = str_replace($key, $obj, $str);
     }
     else
     {
      $str = preg_replace("#&((?!(amp;)|(igrave;)|(laquo;)|(Ugrave;)))#is", " ".$obj." ", $str); 
     }
    }
    return $str;
}

echo unicodeConvert("i want pies &&&& poo but not « &amp; &igrave; &Ugrave; && && &");

?>

View source:

i want pies &amp; &amp; &amp; &amp; poo but not &laquo; &amp; &igrave; &Ugrave; &amp; &amp; &amp; &amp; &amp;

output to browser:

i want pies & & & & poo but not « & ì Ù & & & & &

The problem is that it adds a space after some of the &amp; entities. Can anyone see why?

A: 

I'm not sure exactly what you're trying to do, but have you seen the htmlentities() function? It might do what you want already:

htmlentities — Convert all applicable characters to HTML entities

string htmlentities ( string $string [, int $quote_style = ENT_COMPAT [, string $charset [, bool $double_encode = true ]]] )

This function is identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.

If you're wanting to decode instead (the reverse) you can use html_entity_decode().
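If the goal is to encode everything while leaving entities that are already in the string alone, the fourth parameter in the signature above, $double_encode, can be set to false. A minimal sketch, assuming UTF-8 input and PHP 7 or later (for the "\u{AB}" escape); the variable names are only illustrative:

```php
<?php
// Assumption: the input is UTF-8. "\u{AB}" is the « character (PHP 7+ escape syntax).
$input = "pies && poo but not \u{AB} &amp; &igrave; &";

// $double_encode = false: entities already present (&amp;, &igrave;) are left
// untouched, while bare ampersands and « are converted.
$encoded = htmlentities($input, ENT_QUOTES, 'UTF-8', false);

echo $encoded, "\n";
// pies &amp;&amp; poo but not &laquo; &amp; &igrave; &amp;
```

Note that no padding spaces are added around the converted ampersands.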

John Kugelman
A: 

Do you have any reason not to use the built-in PHP function html_entity_decode?

$str = "want pies &&&& poo but not « &amp; &igrave; &Ugrave; && && &";
echo html_entity_decode($str, ENT_QUOTES, 'UTF-8');

EDIT

The following code was added later to reflect your comments. From those, I presume you want to convert everything to its HTML entity except < and >.

// I've added < and > to the end, those will not be converted
$str = "want pies &&&& poo but not « &amp; &igrave; &Ugrave; && && & both < and >";
$str = htmlentities($str, ENT_QUOTES, 'UTF-8');

$search = array("&lt;", "&gt;");
$replace = array("<", ">");
echo str_replace($search, $replace, $str);

This would do it.
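The same idea can be wrapped in a small helper that also avoids re-encoding entities already in the input. This is only a sketch under assumptions: the function name encodeExceptAngleBrackets is hypothetical, the input is taken to be UTF-8, and PHP 7 or later is assumed for the type declarations and "\u{...}" escapes.

```php
<?php
// Hypothetical helper: convert to HTML entities but keep < and > literal.
function encodeExceptAngleBrackets(string $text): string
{
    // Fourth argument false = do not re-encode entities already present in $text.
    $encoded = htmlentities($text, ENT_QUOTES, 'UTF-8', false);

    // Turn &lt; and &gt; back into literal brackets. Caveat: this also
    // converts any &lt; / &gt; that were already entities in the input.
    return str_replace(['&lt;', '&gt;'], ['<', '>'], $encoded);
}

echo encodeExceptAngleBrackets("<b>caf\u{E9} & cr\u{E8}me</b> &amp; \"tea\""), "\n";
// <b>caf&eacute; &amp; cr&egrave;me</b> &amp; &quot;tea&quot;
```

Leaving < and > unescaped is only safe when the markup comes from a trusted source.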

Francisco

Frankie
I have tried every encoding function. When a mixture of encodings is being input into scripts, I'm always getting exceptions. This should work.
Phil Jackson
I wanted all chars that are not going to interfere with the HTML structure (< and > would have to stay as they are) to be converted into their HTML entity reference equivalents.
Phil Jackson
I'm having more difficulty understanding exactly what you want than sorting out the programming. When you say "I want all chars that are not going to mess up HTML (except < and >, which must stay the same) to be converted to their HTML entity reference equivalent", you apparently want to encode, not decode. But tell you what, I will modify my answer to suit all cases.
Frankie
A: 

Sorry, it's late and I'm tired. Sorted it now.

Phil Jackson
A: 
echo unicodeConvert("i want pies &&&& poo but not « &amp; &igrave; &Ugrave; && && &");
echo "<br />".htmlentities("i want pies &&&& poo but not « &amp; &igrave; &Ugrave; && && &");
echo "<br />".html_entity_decode("i want pies &&&& poo but not « &amp; &igrave; &Ugrave; && && &");

This would output the following:

i want pies &&&& poo but not « &amp; ì Ù && && &
i want pies &&&& poo but not « &amp; &igrave; &Ugrave; && && &
i want pies &&&& poo but not « & � � && &&

and the source:

i want pies &amp;&amp;&amp;&amp; poo but not &laquo; &amp;amp; &igrave; &Ugrave; &amp;&amp; &amp;&amp; &amp;<br />i want pies &amp;&amp;&amp;&amp; poo but not &Acirc;&laquo; &amp;amp; &amp;igrave; &amp;Ugrave; &amp;&amp; &amp;&amp; &amp;<br />i want pies &&&& poo but not « & � � && &&

As you can see, only my function returns what I am after.
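On the stray spaces from the original question: in unicodeConvert() the replacement passed to preg_replace is " ".$obj." ", so every bare ampersand comes back padded with a space on each side, and the lookahead only protects four entity names. A minimal sketch of the same lookahead approach without the padding, assuming PHP 7 or later; escapeBareAmpersands is a hypothetical name, and the pattern treats anything shaped like &name;, &#123; or &#x1F; as an existing entity without checking that the name is a real one:

```php
<?php
// Hypothetical helper: escape only ampersands that do not already start an entity.
function escapeBareAmpersands(string $text): string
{
    // Negative lookahead: skip "&" when it is followed by a named (&name;),
    // decimal (&#123;) or hexadecimal (&#x1F;) character reference.
    $pattern = '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)/';

    // The replacement is exactly "&amp;" with no surrounding spaces.
    return preg_replace($pattern, '&amp;', $text);
}

echo escapeBareAmpersands("pies &&&& poo &amp; &igrave; && &"), "\n";
// pies &amp;&amp;&amp;&amp; poo &amp; &igrave; &amp;&amp; &amp;
```

The remaining characters could then be handled by the existing str_replace loop, or by htmlentities() with $double_encode set to false.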

Phil Jackson