I wanted to use published Google Docs documents and Twitter tweets as the data source of a Silverlight application, but ran into clientaccesspolicy issues.

I read many articles like this and this about how difficult it is to get around the clientaccesspolicy issue.

So I wrote this cURL-based PHP script, put it on my site, and now I can get the text of any published Google Docs document or Twitter feed into my Silverlight application:

<?php
// Simple cURL proxy: fetches a whitelisted URL server-side so the
// Silverlight client can read it without a clientaccesspolicy.xml
// on the target site.
$url = filter_input(INPUT_GET, 'url', FILTER_SANITIZE_URL);

$validUrls[] = "http://docs.google.com";
$validUrls[] = "http://twitter.com/statuses/user_timeline";

if (beginsWithOneOfThese($url, $validUrls)) {
  $user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/cookie");
  curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookie");
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_FAILONERROR, 1);    // treat HTTP errors >= 400 as failures
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0); // do not follow redirects
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body from curl_exec()
  curl_setopt($ch, CURLOPT_TIMEOUT, 15);
  curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
  curl_setopt($ch, CURLOPT_VERBOSE, 0);
  echo curl_exec($ch);
  curl_close($ch);
} else {
  echo "invalid url";
}

function beginsWithOneOfThese($main, $prefixes) {
  foreach ($prefixes as $prefix) {
    if (beginsWith($main, $prefix)) {
      return true;
    }
  }
  return false;
}

function beginsWith($main, $prefix) {
  return strpos($main, $prefix) === 0;
}
?>
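
For example, assuming the script is saved as proxy.php (the file name, domain, and feed path below are just placeholders), the Silverlight application fetches a document with a plain GET request, url-encoding the target URL so its own query string isn't split off by the proxy's:

http://www.yoursite.com/proxy.php?url=http%3A%2F%2Ftwitter.com%2Fstatuses%2Fuser_timeline%2Fusername.xml

The response then comes from your own domain, so no clientaccesspolicy.xml check applies.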

So it makes me wonder:

  • Why is there so much discussion about whether or not a given URL supports clientaccesspolicy, when you can just write a simple proxy script and get the information through it?
  • Why aren't there public services, similar to URL shortening services, which supply this functionality?
  • What are the security implications of having a script like this?
+2  A: 

While you might think that a proxy gives you the same capabilities as having the client make the request, it doesn't. More specifically, you won't have the client's cookies/credentials for the target site, and in some cases a client can reach the target site but your proxy can't (e.g. an intranet site).

http://blogs.msdn.com/ieinternals/archive/2009/08/28/Explaining-Same-Origin-Policy-Part-1-Deny-Read.aspx explains Same Origin Policy at some length.

As for the security implications of your proxy: that depends on whether you have access control on it. If not, a bad guy could use your proxy to hide his tracks as he hacks sites or downloads illegal content.
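
One concrete weakness worth noting: a prefix check like beginsWithOneOfThese can itself be fooled, because http://docs.google.com.evil.com/ begins with http://docs.google.com and would pass. Here is a minimal hardening sketch (hypothetical code, not part of the original script; $allowedHosts, $sharedKey, and the key parameter are illustrative names) that compares the exact host from parse_url against a whitelist and requires a shared key:

<?php
// Hypothetical hardening sketch: exact host match plus a shared key.
$allowedHosts = array("docs.google.com", "twitter.com");
$sharedKey    = "replace-with-a-long-random-string";

$url  = filter_input(INPUT_GET, 'url', FILTER_SANITIZE_URL);
$key  = filter_input(INPUT_GET, 'key');
$host = parse_url($url, PHP_URL_HOST);

// Exact comparison: docs.google.com.evil.com no longer slips through
// the way it does with a strpos() prefix test.
if ($key === $sharedKey && in_array($host, $allowedHosts, true)) {
  // ... perform the cURL request as in the original script ...
} else {
  echo "invalid url";
}
?>

A key embedded in a Silverlight client can of course be extracted from the XAP, so this only raises the bar; logging and rate limiting on the proxy are still prudent.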

EricLaw -MSFT-
Yes, that makes sense; I did run into an issue where it couldn't follow redirects, for example. But it seems I have access to any open text document on the public Internet: all RSS feeds, published Google Docs, any public website.
Edward Tanguay
Are there any public services which provide this functionality? It would be valuable to collect the URLs being accessed, much as URL shortening services collect theirs, so that all the security could be concentrated in one place.
Edward Tanguay
OK, I added URL access control to it as you suggested; good point.
Edward Tanguay