I have a PHP file using cURL that accepts a Google Doc URL as a parameter and returns the plain text of the Google Doc.

It worked well until recently, when a redirect was apparently added so that each http:// address now redirects to its https:// equivalent, as in this example:

http://docs.google.com/View?id=dc7gj86r_20dn2csqg3

So I changed my code to request the https:// address directly, but now it just returns a blank response.

What do I have to change in my cURL code so that I can get the HTML text from the https:// address?

$url = filter_input(INPUT_GET, 'url',FILTER_SANITIZE_STRING);

$validUrlPrefixes[] = "https://docs.google.com";

if(beginsWithOneOfThese($url, $validUrlPrefixes)) {
  $user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)';
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/cookie");
  curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookie");
  curl_setopt($ch, CURLOPT_URL, $url ); 
  curl_setopt($ch, CURLOPT_FAILONERROR, 1); 
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0); 
  curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); 
  curl_setopt($ch, CURLOPT_TIMEOUT, 15);
  curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
  curl_setopt($ch, CURLOPT_VERBOSE, 0);

  $rawData = curl_exec($ch);  

  $rawData = cleanText($rawData);

  if(beginsWith($url, "https://docs.google.com")) {
    echo qstr::convertGoogleDocContentToText($rawData);
    die;
  }

  echo $rawData;
  die;
}
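
One way to narrow this down might be to stop discarding the failure reason. curl_exec() returns false when a request fails, and passing that straight into cleanText() would plausibly show up as a blank page. The sketch below is not a confirmed fix, just a way to see what cURL reports. The function names buildCurlOptions() and fetchWithDiagnostics() are made up for illustration, and the assumption that the cause is either the redirect or certificate verification is only a guess.

```php
<?php
// Hypothetical helper: the options from the question, plus redirect and TLS settings.
function buildCurlOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FAILONERROR    => true,  // HTTP status >= 400 makes curl_exec() return false
        CURLOPT_FOLLOWLOCATION => true,  // follow any further redirects
        CURLOPT_MAXREDIRS      => 5,     // but not indefinitely
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_SSL_VERIFYPEER => true,  // keep certificate verification on
        CURLOPT_SSL_VERIFYHOST => 2,     // and check the host name in the certificate
        // Assumption: if verification fails because no CA bundle is configured,
        // pointing CURLOPT_CAINFO at a CA bundle file may help.
    ];
}

// Hypothetical helper: run the request and keep everything cURL says about it.
function fetchWithDiagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url));

    $body = curl_exec($ch);

    return [
        'body'      => $body === false ? null : $body,
        'errno'     => curl_errno($ch),                             // 0 means no cURL-level error
        'error'     => curl_error($ch),                             // human-readable reason, if any
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),       // last HTTP status seen
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),   // URL after redirects
    ];
}
```

Calling fetchWithDiagnostics($url) and printing the 'errno', 'error', 'status' and 'final_url' entries before cleaning the body should show whether the blank output comes from a certificate problem, an unfollowed redirect, or an HTTP error status.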