I am using YQL for some screen scraping, and any quote-like characters are not being returned properly.
For example, markup on the page being scraped is:
There should not be a “split between what we think and what we do,”
This is returned by YQL as:
There should not be a �split between what we think and what we do,�
This also happens with ticks and back-ticks.
My JS is like:
var qurlString = '&url=' + encodeURIComponent(url);
$.ajax({
type: "POST",
url: "/k_sys/qurl.php",
datatype: "xml",
data: qurlString,
success: function(data) {
//do something
}
});
And my qurl.php is like:
$BASE_URL = "http://query.yahooapis.com/v1/public/yql";
$url = my scraped site url;
$yql_query = "select * from html where url='$url'";
$yql_query_url = $BASE_URL . "?q=" . urlencode($yql_query) . "&format=xml";
$session = curl_init($yql_query_url);
curl_setopt($session, CURLOPT_RETURNTRANSFER,true);
$xml = curl_exec($session);
echo $xml;
Is this a cURL issue or a YQL issue, and what to I need to do to fix it?
Thanks!