How do I guarantee that UTF-8 characters are scraped accurately using cURL in PHP?

views:

812

answers:

1

+1 Q:

How do I guarantee that UTF-8 characters are scraped accurately using cURL in PHP?

Hello,

I am scraping web pages (using PHP's cURL extension) that contain accented characters (like "é"). In the source of those pages, the characters are written as raw UTF-8 (they are not HTML-encoded).

However, when I output the result of the following code, I get question marks instead of the accented characters.

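// $website holds the URL of the page to scrape.
$ch = curl_init();
$timeout = 5; // connection timeout in seconds
curl_setopt($ch, CURLOPT_URL, $website);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file = curl_exec($ch); // raw response bytes, or false on failure
curl_close($ch);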

The headers returned by the scraped page give the Content-Type as "text/html", with no charset parameter to indicate that the body is UTF-8. I've tried using the CURLOPT_HTTPHEADER option to change the text encoding, but that has no effect.

What am I missing?

+1 A:

I had the same problem. Have a look at the answers to my own question: http://stackoverflow.com/questions/1277552/characters-changed-in-a-curl-request. Dominic Rodger's reply there just saved my day.

Regards, Fons

Fons 2009-08-14 13:02:44
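
For readers who cannot follow the link, here is a minimal sketch of one common way to deal with this symptom. It is not taken from the linked answer. It assumes the question marks come from a charset mismatch: cURL hands back the response bytes unchanged, so the damage usually happens when those bytes are later converted or displayed under a different encoding. The helper names body_to_utf8() and fetch_as_utf8(), and the ISO-8859-1 fallback, are hypothetical choices made for illustration.

```php
<?php
// Hypothetical helper: normalise a fetched body to UTF-8.
function body_to_utf8(string $body, ?string $contentType, string $fallback = 'ISO-8859-1'): string
{
    if ($contentType !== null && preg_match('/charset\s*=\s*"?([\w.:-]+)/i', $contentType, $m)) {
        // The server declared a charset in its Content-Type header: trust it.
        $charset = strtoupper($m[1]);
    } elseif (mb_check_encoding($body, 'UTF-8')) {
        // No declared charset, but the bytes are valid UTF-8: leave them alone.
        return $body;
    } else {
        // Assumption: an undeclared, non-UTF-8 body is in a legacy single-byte encoding.
        $charset = $fallback;
    }

    if ($charset === 'UTF-8') {
        return $body;
    }

    try {
        $converted = @mb_convert_encoding($body, 'UTF-8', $charset);
    } catch (\ValueError $e) {
        // PHP 8 throws when mbstring does not know the charset name.
        $converted = false;
    }

    // If the conversion failed, return the original bytes untouched.
    return is_string($converted) ? $converted : $body;
}

// Hypothetical wrapper around the cURL calls from the question.
function fetch_as_utf8(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);

    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }

    // Content-Type of the response, for example "text/html; charset=ISO-8859-1".
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);

    return body_to_utf8($body, is_string($contentType) ? $contentType : null);
}
```

When echoing the result to a browser, the script's own output also has to be declared as UTF-8, for example by calling header('Content-Type: text/html; charset=utf-8'); before echo fetch_as_utf8($website);. Otherwise the browser may guess a different encoding and show the wrong characters even though the scraped bytes are intact.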
