Hello, I have just started developing in PHP. I want to fetch XML content from another site, but when I fetch it like this:

$options = array(
    CURLOPT_RETURNTRANSFER => true,     // return web page
    CURLOPT_HEADER         => false,    // don't return headers
    CURLOPT_ENCODING       => "UTF-8",  // handle compressed
    CURLOPT_USERAGENT      => "spider", // who am i
);
$ch   = curl_init("http://wxxx.xml");
curl_setopt_array($ch, $options);
$file = curl_exec($ch);
curl_close($ch);

It returns corrupted characters. I can make them display correctly by changing the page's header to UTF-8, but I cannot insert these values into the database; they are corrupted there too. How can I fix this? Thank you for any answers.

A: 

The CURLOPT_ENCODING option is for specifying the Accept-Encoding header field value and not for the accepted character encoding. Try Accept-Charset instead:

$options = array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HEADER         => false,
    CURLOPT_USERAGENT      => "spider",
);
$header = array('Accept-Charset: UTF-8');
$ch     = curl_init("http://wxxx.xml");
curl_setopt_array($ch, $options);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
$file = curl_exec($ch);
curl_close($ch);
Gumbo
A: 

From PHP's curl documentation:

CURLOPT_ENCODING: The contents of the "Accept-Encoding: " header. This enables decoding of the response. Supported encodings are "identity", "deflate", and "gzip". If an empty string, "", is set, a header containing all supported encoding types is sent.

This option does not control how curl interprets the response bytes; it makes curl accept content that is transferred as a compressed stream, e.g. gzip.

Your script will get the content, and you can convert its encoding using PHP's mbstring or iconv functions. However, be sure you have set your database collation and connection collation properly.
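
As a rough sketch of that conversion step: the snippet below reads the charset from a Content-Type value and converts the body to UTF-8 with both mbstring and iconv. The helper charsetFromContentType(), the sample body, and the ISO-8859-9 charset are assumptions made for illustration; in a real script the two inputs would come from curl_exec() and curl_getinfo($ch, CURLINFO_CONTENT_TYPE).

```php
<?php
// Hypothetical helper (not part of cURL or PHP): pull the charset parameter
// out of a Content-Type value, falling back to a default when it is absent.
function charsetFromContentType(?string $contentType, string $default = 'UTF-8'): string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([\w.:-]+)"?/i', $contentType, $m)) {
        return $m[1];
    }
    return $default;
}

// Assumed stand-ins for the curl_exec() and curl_getinfo() results.
$body        = "<title>T\xFCrk\xE7e</title>";   // "Türkçe" encoded as ISO-8859-9
$contentType = 'text/xml; charset=ISO-8859-9';

$charset = charsetFromContentType($contentType);

// Convert the raw bytes to UTF-8 with mbstring ...
$utf8 = mb_convert_encoding($body, 'UTF-8', $charset);

// ... or, equivalently, with iconv.
$same = iconv($charset, 'UTF-8', $body);
```

If the server sends no charset parameter, the encoding attribute of the XML declaration is the next place to look.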

aularon
A: 

If the characters are fine when you change the header of the page to indicate that it's encoded in UTF-8, they're not corrupted; you're treating character data that's encoded in one format (UTF-8) as though it were encoded in another.

What you should check:

  • Verify that the XML source document is, in fact, UTF-8 encoded, since that's what you're specifying in your curl options.

  • Find out what the encoding used by your database is.

If you need to be able to store Unicode characters in your database, you can change the character encoding there to UTF-8. Alternatively, you can convert from your source document using utf8_decode() (if the database is storing ISO-8859-1 characters) or mb_convert_encoding(). However, if characters in the source document can't be encoded in the system being used by the database, you'll lose information.
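
The checks above can be sketched roughly as follows. The sample document is made up and stands in for the string returned by curl_exec(). The sketch uses mb_convert_encoding() rather than utf8_decode(), which is deprecated as of PHP 8.2, and the round trip at the end illustrates the information loss for characters that ISO-8859-1 cannot represent.

```php
<?php
// Assumed stand-in for the downloaded document: the title is "Işık" in UTF-8.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
     . "<title>I\xC5\x9F\xC4\xB1k</title>";

// 1. Which encoding does the XML declaration claim?
$declared = preg_match('/^<\?xml[^>]*encoding=["\']([^"\']+)["\']/i', $xml, $m)
    ? $m[1]
    : null;

// 2. Are the bytes actually valid UTF-8?
$isUtf8 = mb_check_encoding($xml, 'UTF-8');

// 3. Convert to a narrower charset (here ISO-8859-1) and back again.
//    Characters the target charset lacks are replaced during conversion,
//    so the round trip no longer matches the original.
$latin1    = mb_convert_encoding($xml, 'ISO-8859-1', 'UTF-8');
$roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$lossy     = ($roundTrip !== $xml);
```

Here $lossy comes out true because "ş" and "ı" have no ISO-8859-1 equivalent, which is the case for storing the data as UTF-8 instead.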

ngroot
A: 

Hello again. Thanks for the answers; they really helped me find the problem. Special thanks to ngroot, because I was stuck thinking about curl when the problem was in the database. The first two answers made no difference, and when I checked the database I saw that I had stored the titles from the XML file as VARCHAR with the "UTF-8 Turkish" encoding. Then I tried "UTF-8 Unicode" and the stored values became more readable. Finally I stored the titles as VARBINARY and everything was resolved. Thank you again for all the help.

Görkem Karahan