I'm working on a project that reads data from a file. Some lines in the file need UTF-8 symbols but are encoded oddly: for example, they contain \xC6 rather than Æ.

If I do as follows:

$name = "\xC6ther";
$name = preg_replace('/x([a-fA-F0-9]{2})/', '&#$1;', $name);
echo utf8_encode($name);

It works fine. I get this:

Æther

But if I pull the same data from MySQL, and do as follows:

$name = $row['OracleName'];
$name = preg_replace('/x([a-fA-F0-9]{2})/', '\&#$1;', $name);
$name = utf8_encode($name);

Then I receive this as output:

\&#C6;ther

Anyone know why this is?

As requested, here is the var_dump of $row['OracleName']:

string(15) "xC6ther Barrier"
+1  A: 

In your second preg_replace, why is there a \ in the replacement string? It should be:

preg_replace('/x([a-fA-F0-9]{2})/', '&#$1;', $name);

OK, I think there is some confusion here. Your regular expression matches something like x66 and replaces it with '&#66;', which looks like HTML entity encoding to me, but you are then calling utf8_encode, which does this (from the manual):

utf8_encode — Encodes an ISO-8859-1 string to UTF-8

So nothing would ever get converted (or, to be more precise, '&#66;' would remain '&#66;', since those characters are identical in ISO-8859-1 and UTF-8).

Also note that in your first snippet you use \xC6, but this would never be caught by the preg_replace since it is already an encoded character. Inside a double-quoted string, \x means that the following hex number (0x00 to 0xFF) is dropped into the string as a single byte; it does not produce the literal string xC6.
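To make the point about \x concrete, here is a minimal sketch comparing the two quoting styles. The variable names are only illustrative, and the pattern is the one from the question.

```php
<?php
// Double quotes: PHP interprets \xC6 as the single byte 0xC6.
$double = "\xC6ther";
// Single quotes: the backslash, "x", "C" and "6" stay as four literal characters.
$single = '\xC6ther';

echo strlen($double), ' ', bin2hex($double), "\n"; // 5 c674686572
echo strlen($single), ' ', $single, "\n";          // 8 \xC6ther

// The pattern needs a literal "x" followed by two hex digits.
$matchesDouble = preg_match('/x([a-fA-F0-9]{2})/', $double);
$matchesSingle = preg_match('/x([a-fA-F0-9]{2})/', $single);
var_dump($matchesDouble); // int(0): there is no "x" in the string at all
var_dump($matchesSingle); // int(1): "xC6" is present as text
```

So in the first snippet the preg_replace is a no-op, and the Æ comes entirely from utf8_encode converting the ISO-8859-1 byte 0xC6.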

So I am kind of confused about what you really want to do. What is the preg_replace for?

If you want to convert HTML entities to UTF-8, look into mb_convert_encoding (manual); if you want to do the reverse and encode UTF-8 text as HTML entities, look into htmlentities (manual).
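As a rough illustration of both directions, here is a small sketch. It uses html_entity_decode for the entity-to-UTF-8 direction, which is my substitution for mb_convert_encoding because it needs no extension; the flags chosen and the "\u{C6}" escape (PHP 7 or later) are assumptions on my part.

```php
<?php
// Entity -> UTF-8: html_entity_decode() understands decimal and hex numeric references.
$decoded = html_entity_decode('&#xC6;ther', ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo bin2hex($decoded), "\n"; // c38674686572 (C3 86 is the UTF-8 encoding of U+00C6)

// UTF-8 -> entity: htmlentities() emits a named entity where one exists.
$encoded = htmlentities("\u{C6}ther", ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $encoded, "\n"; // &AElig;ther
```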

And if it has nothing to do with any of that and you simply want to change the encoding, mb_convert_encoding still applies.

RageZ
When the data is pulled from MySQL it lacks the leading \, even though the \ is present in the MySQL table itself.
Trick Jarrett
Sorry, I missed your point. Can you show us what `$row['OracleName']` looks like with `var_dump`?
RageZ
A: 

Figured out the problem: on the SQL pull I had left out an 'x' in the preg_replace replacement string.

preg_replace('/x([a-fA-F0-9]{2})/', '&#x$1;', $name);

Once I added in the x, it worked like a charm.
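For anyone following along, here is a short sketch of why the 'x' matters, using the value from the var_dump in the question. `&#C6;` is not a valid decimal reference because C6 is not a decimal number, whereas `&#xC6;` is a valid hex reference. The html_entity_decode call is only there to show the result outside a browser and is my addition.

```php
<?php
$name = 'xC6ther Barrier'; // the value shown by var_dump in the question

// Turn each "xNN" into a hexadecimal numeric character reference.
$entity = preg_replace('/x([a-fA-F0-9]{2})/', '&#x$1;', $name);
echo $entity, "\n"; // &#xC6;ther Barrier

// A browser renders the reference as Æ; decoding it here gives the same text as UTF-8.
$text = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $text, "\n"; // Æther Barrier
```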

Trick Jarrett
@Trick: I really don't know what you are doing, **but** if your problem is fixed, that's nice.
RageZ
@Trick: right, the number is in hex so it should be `&#xNN;`, but I am still concerned about your regexp matching something like `xaa` as an encoded character when it isn't.
RageZ
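To illustrate that last concern, here is a hedged sketch of a stricter variant. It assumes the leading backslash is preserved in the data (which, per the comments above, was not the case for the MySQL pull) and that each escape is a Latin-1 code point. `decodeHexEscapes` is a hypothetical helper name, not something from the thread.

```php
<?php
// The loose pattern also fires inside ordinary words such as "Exact".
$loose = preg_replace('/x([a-fA-F0-9]{2})/', '&#x$1;', 'Exact copy');
echo $loose, "\n"; // E&#xac;t copy

// Hypothetical helper: only decode escapes that start with a literal backslash.
function decodeHexEscapes(string $text): string
{
    // In this single-quoted pattern, \\\\ becomes the regex \\, i.e. one literal backslash.
    return preg_replace_callback(
        '/\\\\x([0-9A-Fa-f]{2})/',
        static function (array $m): string {
            // Reuse the entity decoder to turn the hex code point into UTF-8.
            return html_entity_decode('&#x' . $m[1] . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $text
    );
}

echo decodeHexEscapes('\xC6ther Barrier'), "\n"; // Æther Barrier
echo decodeHexEscapes('Exact copy'), "\n";       // Exact copy (left untouched)
```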