When I look at the XML data feed I get with the code below, special characters are correct in the XML source. However, when cURL returns the data, characters like "ó" and "ä" show up as "Ã³" and "Ã¤" respectively. This happens to all special characters; these two are just examples.

$myvar = curl_init();
$myURL = "http://someurl.com/";
// Identify as a regular browser
curl_setopt($myvar, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');
curl_setopt($myvar, CURLOPT_URL, $myURL);
// Return the response body as a string instead of printing it
curl_setopt($myvar, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($myvar, CURLOPT_TRANSFERTEXT, TRUE);
// Give up if no connection is made within 3 seconds
curl_setopt($myvar, CURLOPT_CONNECTTIMEOUT, 3);
$xmlstr = curl_exec($myvar);

The header of the XML file declares the encoding as follows: `<?xml version="1.0" encoding="UTF-8"?>`

All I want is to get the same characters to show up in the cURL result without any transformation.

Hoping I just missed some simple step; looking forward to any answers.

Best regards, Fons

+4  A: 

How do you know $xmlstr contains the wrong bytes? If you're looking at the output in a terminal window of some sort, it's probable that the problem is that the terminal does not support UTF-8, not that cURL is broken.

cURL doesn't care about UTF-8 or any other character encoding - its job is just to fetch a sequence of bytes from somewhere. It's not likely to be doing anything that will mangle special characters. If there's something wrong with the way you're using cURL, it'll be mangling everything, not just non-ASCII characters.
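One way to check this is to inspect the bytes instead of the rendered characters. The following is a minimal sketch, assuming the `mbstring` extension is available; the sample string and the helper `misreadAsLatin1()` are hypothetical stand-ins for the real feed and are not part of the question's code. It shows that valid UTF-8 bytes only look wrong once something reads each byte as a separate ISO-8859-1 character.

```php
<?php
// Hypothetical sample standing in for the bytes cURL returned:
// "ó" is C3 B3 and "ä" is C3 A4 in UTF-8.
$xmlstr = "<name>\xC3\xB3 \xC3\xA4</name>";

// Hypothetical helper: simulates a viewer that wrongly treats every
// byte as one ISO-8859-1 character and then displays the result.
function misreadAsLatin1(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

// true means the fetched bytes are well-formed UTF-8, so the
// transfer itself did not damage them.
var_dump(mb_check_encoding($xmlstr, 'UTF-8'));

// Hex dump of the raw bytes, independent of any display encoding.
echo bin2hex($xmlstr), "\n";

// What a viewer assuming ISO-8859-1 would show for the same bytes.
echo misreadAsLatin1($xmlstr), "\n";
```

If the validity check passes and the hex dump contains `c3b3` where "ó" should be, the data is intact and the problem is in how the output is being displayed.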

Dominic Rodger
@Ionut - thanks, don't know what came over me!
Dominic Rodger
When I use echo $xmlstr and look at the page source, the characters are converted. I use `<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">` as the header for the output (HTML). Is your suggestion to change that? Then I would be very happy to know what header to use instead. Regards, Fons
Fons
Try adding `<meta http-equiv="content-type" content="text/html; charset=utf-8"/>` at the top of your `<head>...</head>` section.
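As a rough sketch of how that could look on the PHP side, the page can declare UTF-8 both in the HTTP header and in the meta tag. The function name `renderPage()` and the sample value of `$xmlstr` are assumptions made for illustration; in the real script `$xmlstr` would come from `curl_exec()`.

```php
<?php
// Hypothetical helper: wraps the fetched XML in a page that declares UTF-8.
function renderPage(string $xmlstr): string
{
    // Escape the markup so the XML is shown as text, and tell
    // htmlspecialchars() that the input is UTF-8.
    $escaped = htmlspecialchars($xmlstr, ENT_QUOTES, 'UTF-8');

    return '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">' . "\n"
        . '<html><head>' . "\n"
        . '<meta http-equiv="content-type" content="text/html; charset=utf-8"/>' . "\n"
        . '</head><body><pre>' . $escaped . '</pre></body></html>' . "\n";
}

// Stand-in for the cURL result ("ó ä" as UTF-8 bytes).
$xmlstr = "<name>\xC3\xB3 \xC3\xA4</name>";

// The HTTP header has to be sent before any output; skip it if
// output has already started.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo renderPage($xmlstr);
```

Sending the header as well as the meta tag is a design choice here: when both are present browsers give the HTTP header priority, so keeping the two in agreement avoids surprises if the server is configured with a different default charset.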
Dominic Rodger
You just saved my day! It works as expected and now I don't have the funny "Ã³" and "Ã¤" anymore. I could just slap myself on the head for this. As this is a local website I did not bother with any meta tags in the header, whereas I use a default set for external pages. Many thanks, you just gave me a work-free weekend! Best regards, Fons
Fons
@Fons - no problem, glad it worked for you!
Dominic Rodger