If you copy and paste the following text into an HTML page,

&#1575;&#1606;&#1608;&#1575;&#1606;

you will see the following Arabic text:

انوان

My question is:

What is the name of this type of encoding, which uses numbers and the hash (#) sign, and how do I decode it in PHP?

+7  A: 

These are... HTML entities (or "Numeric character references" for the nitpickers).

Try html_entity_decode.

Example:

$foo = html_entity_decode('&#1575;&#1606;&#1608;&#1575;&#1606;', ENT_QUOTES, 'UTF-8');
// $foo now holds the Arabic text انوان

(If the string is in the form &amp;#1575;... you need to apply html_entity_decode twice. (I don't know if codaddict's edit is valid.))
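
As a rough sketch of the single and double decoding described above: the variable names are illustrative, and the double-encoded string is an assumed example of what such input might look like, not text taken from the question.

```php
<?php
// The reference string from the question, with its entities intact.
$encoded = '&#1575;&#1606;&#1608;&#1575;&#1606;';

// Pass the charset explicitly so the result does not depend on the
// PHP version or the default_charset setting.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
echo $decoded, "\n";

// Assumed double-encoded input: each ampersand was itself escaped as &amp;.
$double = '&amp;#1575;&amp;#1606;&amp;#1608;&amp;#1575;&amp;#1606;';
$once   = html_entity_decode($double, ENT_QUOTES, 'UTF-8'); // back to &#1575;...
$twice  = html_entity_decode($once, ENT_QUOTES, 'UTF-8');   // the Arabic text
echo $twice, "\n";
```

The first pass only turns &amp; back into &, which is why a second pass is needed before the numeric references themselves are resolved.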

KennyTM
What's the downvote for?
Shawn Mclean
Gumbo
+3  A: 

These character sequences are known as HTML entities. Basically, they're a safer way of representing characters such as & and other symbols that have a special meaning in HTML. Any character can be written as a numeric reference, although only some characters also have a named entity.

You can decode them in PHP by using html_entity_decode.
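
To illustrate the escaping and decoding mentioned above, here is a small sketch using htmlspecialchars and html_entity_decode. The sample strings are made up for the example.

```php
<?php
// Characters with a special meaning in HTML are escaped as named entities.
$raw     = 'Tom & Jerry <b>';
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // Tom &amp; Jerry &lt;b&gt;

// html_entity_decode resolves named and numeric references alike.
$mixed   = '&lt;p&gt; &amp; &#1575; &eacute;';
$decoded = html_entity_decode($mixed, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n"; // <p> & ا é
```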

waiwai933
A: 

You can use the convert_uudecode() function to decode uuencoded strings, but it does not decode HTML character references:

<?php
echo convert_uudecode("+22!L;W9E(%!(4\"$`\n`"); // prints "I love PHP!"
echo "\n";
echo convert_uudecode('&#1575;&#1606;&#1608;&#1575;&#1606;'); // prints garbage (WU±), because the input is not uuencoded
?>
rekha_sri
A: 

To use proper terminology:

  • &amp; is an entity reference that references the entity named amp.
  • &#1575; is a character reference that references the character U+0627 (1575 in decimal) in the Unicode character set.

Both references are character references as they only reference single characters. But entities can also represent more than just a single character.
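
The relationship between the two kinds of reference can be checked with a short sketch like the one below. The hexadecimal form &#x627; and the numeric form &#38; are added here for comparison and do not appear in the question.

```php
<?php
// 1575 in decimal is 0x627 in hexadecimal, i.e. code point U+0627.
printf("U+%04X\n", 1575); // U+0627

// Decimal and hexadecimal character references name the same character.
$fromDecimal = html_entity_decode('&#1575;', ENT_QUOTES, 'UTF-8');
$fromHex     = html_entity_decode('&#x627;', ENT_QUOTES, 'UTF-8');
var_dump($fromDecimal === $fromHex); // bool(true)

// An entity reference is resolved by name, a character reference by number.
$named   = html_entity_decode('&amp;', ENT_QUOTES, 'UTF-8');
$numeric = html_entity_decode('&#38;', ENT_QUOTES, 'UTF-8');
var_dump($named === $numeric); // bool(true)
```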

Gumbo