Hey

I am getting a page via curl with this code:

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$message = curl_exec($ch);

curl_close($ch);

I now want to make some replacements to the code in $message, but before that I dump the code in a file:

file_put_contents('debug_before_replace.txt',$message);

When I look at this file, all the text seems fine. For example, here is the title:

<title>D.O.C.| Jantar Vínico Quinta do Portal | Quinta-feira, 25 de Junho 2009</title>

Now I do the replace:

$message = str_ireplace( array(
'body>', '/body>' ), array(
$fraseemcima, $frasebaixo ), $message );

And now I dump the $message to another file:

file_put_contents('debug_after_replace.txt',$message);

When I take a look at the file I see this:

<title>D.O.C.| Jantar Vínico Quinta do Portal | Quinta-feira, 25 de Junho 2009</title>

And I have all sorts of messed up chars in the rest of the code.

Does anyone understand why str_ireplace is messing this up? I am trying to send some mail and this messes up everything.

NOTE: In the actual replace I do have the body and /body tags written correctly, but if I added the < here, SO would remove the words.

EDIT: I have fixed it!!

With this simple line everything works and looks great in Outlook:

$message = utf8_decode(curl_exec($ch));
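For later readers: utf8_decode() is deprecated as of PHP 8.2. A minimal sketch of the same UTF-8 to ISO-8859-1 conversion, assuming the mbstring extension is available, could look like this. The helper name utf8_to_latin1 is made up for illustration and is not part of the original fix.

```php
<?php
// Hypothetical helper (not from the original post): convert a UTF-8 string
// to ISO-8859-1, the same target encoding utf8_decode() produces.
function utf8_to_latin1(string $text): string
{
    // Only convert input that really is valid UTF-8; return anything else untouched.
    if (!mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
}

$utf8   = "Jantar V\u{00ED}nico";   // the accented "i" takes two bytes in UTF-8
$latin1 = utf8_to_latin1($utf8);    // the same character is the single byte 0xED in ISO-8859-1
```

In the code from the question this would be used as $message = utf8_to_latin1(curl_exec($ch)); assuming the fetched page is UTF-8 and the mail is sent as ISO-8859-1.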

Thanks to macbirdie and S. Gehrig for pointing me in the right direction. I guess I still have some learning to do when it comes to charsets.

Thanks again

+1  A: 

The text you're editing is most likely encoded in UTF-8 or some other multi-byte encoding. str_ireplace() is not multi-byte safe: it operates on single bytes, so your multi-byte characters might be destroyed. You should also check whether the document contains a byte-order mark (BOM), which could cause problems as well (according to this comment).
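Both checks can be done before any replacement. The following is only a sketch, assuming the mbstring extension is loaded; has_utf8_bom and strip_utf8_bom are hypothetical helper names, and the sample string stands in for the page fetched with curl.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the original post.

// True when the string starts with the three-byte UTF-8 BOM (EF BB BF).
function has_utf8_bom(string $text): bool
{
    return strncmp($text, "\xEF\xBB\xBF", 3) === 0;
}

// Remove a leading UTF-8 BOM if present, otherwise return the string unchanged.
function strip_utf8_bom(string $text): string
{
    return has_utf8_bom($text) ? substr($text, 3) : $text;
}

// Sample input standing in for the fetched page: a BOM followed by UTF-8 text.
$sample = "\xEF\xBB\xBF<title>Jantar V\u{00ED}nico</title>";

$isUtf8 = mb_check_encoding($sample, 'UTF-8'); // is the byte sequence valid UTF-8?
$clean  = strip_utf8_bom($sample);             // same text without the BOM
```

If the encoding check fails, the page is probably in a single-byte encoding such as ISO-8859-1, and the garbling more likely comes from a mismatch between the page and the inserted strings than from the replacement itself.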

Stefan Gehrig
+1  A: 

You would have to use a multi-byte-aware function for character replacement, like mb_eregi_replace() instead.
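As a rough sketch of what that could look like for the replacement in the question, assuming the page is UTF-8 and the mbstring extension is available. The variables $top and $bottom are hypothetical stand-ins for $fraseemcima and $frasebaixo, and the sample markup is invented.

```php
<?php
// Tell the mb_ereg* functions that patterns and subjects are UTF-8.
mb_regex_encoding('UTF-8');

// Hypothetical stand-ins for $fraseemcima and $frasebaixo from the question.
$top    = "<body><p>Ol\u{00E1}!</p>";
$bottom = "<p>At\u{00E9} breve</p></body>";

// Invented sample page with mixed-case body tags and an accented character.
$message = "<html><BODY><h1>Jantar V\u{00ED}nico</h1></Body></html>";

// Case-insensitive, multi-byte-aware replacement of the closing and opening tags.
// The patterns contain no regex metacharacters, so they match literally.
$message = mb_eregi_replace('</body>', $bottom, $message);
$message = mb_eregi_replace('<body>', $top, $message);
```

Note that the first argument is a regular expression, so search strings containing characters such as . or ( would need escaping, and backslash sequences in the replacement string are treated as back-references.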

If the file has one, it could also lose its byte-order mark, the marker at the start of Unicode text that identifies the encoding (and, for UTF-16 and UTF-32, the byte order), because the replacement function may treat it as non-text. That's pure speculation, though.

macbirdie
Yes, this might be the problem. Too bad there is no mb_str_replace. I'll test this and report here. Thanks again.
AntonioCS