I want to write some JavaScript and have it call into the DOM of a page I am loading from a third-party domain. Can this be done? I've already tried this using an IFRAME, but it seems that doesn't work. Is there some other way, like having FF run some JavaScript directly rather than as part of a page?

I know this has all kinds of security problems but I'm the guy writing the code and the only guy who will run it.


The backstory: I'm trying to automate some web site interactions.

My first IFRAME pass didn't work because a web page from file:////.... is not in the same domain as a page in http://whatever.com. Surprise, surprise.

+1  A: 

JavaScript has a same-domain policy. You are not going to be able to access the other domain. It is there to protect you from hackers/bad people.

epascarello
good point (+1) but not helpful.
BCS
+2  A: 

Read up on bookmarklets. The basic idea is that you create a bookmark that executes some JavaScript code, which dynamically injects JavaScript into the page currently loaded in your browser. Most of the web page clipping applications do this.
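A minimal sketch of how such a bookmark URL could be built, assuming the code to inject fits in a single string. `makeBookmarklet` is a hypothetical helper name used only for illustration; it is not part of any library.

```javascript
// Hypothetical helper: turn a snippet of JavaScript into a "javascript:" URL
// that can be saved as a bookmark and run against whatever page is loaded.
function makeBookmarklet(source) {
  // Wrap the snippet in an immediately invoked function so its variables do
  // not leak into the page. As long as the snippet does not return a value,
  // the whole expression evaluates to undefined, which keeps the browser from
  // replacing the page with the result.
  const wrapped = "(function(){" + source + "})();";
  // Percent-encode the code so it survives being stored as a URL.
  return "javascript:" + encodeURIComponent(wrapped);
}

// Usage: paste the resulting string into the location field of a new bookmark.
const bookmarkUrl = makeBookmarklet("alert(document.title);");
```

Because the bookmark runs inside the page that is already loaded, the injected code executes with that page's own domain, so the same-domain restriction does not get in the way.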

Scott Evernden
A: 

I'm not sure I fully understand the issue; maybe you could describe the situation more. But I'm guessing you're running into cross-site scripting security restrictions if you are accessing across domains.

So..

Maybe check out the document.domain property, which can enable script access across window objects in most browsers.

Both sites must be accessed via the same main domain, but can have different sub-domains so long as document.domain is set to the "main" part of the domain on both sites.
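As a rough illustration of the "main part of the domain" rule, the sketch below works out which value two hosts could both assign to document.domain. `commonParentDomain` is a hypothetical helper, and the two-label minimum is only a crude stand-in for the browser's real check against top-level suffixes.

```javascript
// Hypothetical helper: find the parent domain shared by two hostnames, i.e.
// the value both pages would have to assign to document.domain.
function commonParentDomain(hostA, hostB) {
  // Compare labels from the right: "www.example.com" -> ["com", "example", "www"].
  const a = hostA.toLowerCase().split(".").reverse();
  const b = hostB.toLowerCase().split(".").reverse();
  const shared = [];
  for (let i = 0; i < Math.min(a.length, b.length) && a[i] === b[i]; i++) {
    shared.unshift(a[i]);
  }
  // Assumption: require at least two labels so a bare suffix such as "com"
  // is never returned. Real browsers apply a stricter suffix check.
  return shared.length >= 2 ? shared.join(".") : null;
}

// Each page would then run, in its own script:
//   document.domain = commonParentDomain(location.hostname, "other.example.com");
```

This only helps when both pages are under your control, since each one has to set the property itself. A page from an unrelated third-party domain yields null here.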

ozone
A: 

Not what I was thinking of but: iMacros might do some of what I want.

After looking, it seems a bit limited, and the docs are a bit too much bling and not enough meat.

BCS
+3  A: 

If I understand the question correctly, you probably won't be able to do it using JavaScript alone, because of the domain restriction that you experienced. However, if you have some knowledge of shell scripts, or any scripting language, it should be no problem; all you need to do is invoke the good old curl.

Example in PHP:

<?php
$url = "http://www.example.com/index.html";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0');
$fp = curl_exec($ch);
curl_close($ch);
?>

And that's pretty much it. You have the actual HTML code in the $fp variable. So, all in all, what I would do is write a small JavaScript Ajax function that calls a PHP script, which does the curl request and returns the $fp variable via echo to the JavaScript callback. The callback can then insert the result into the document (using innerHTML or the DOM), and bam, you have access to all the stuff. Or you could just parse it in PHP. Either way, it should work fine if you do it through curl. Hope that helps.
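The JavaScript side of that round trip might look like the sketch below. It assumes the PHP script above has been saved as `/fetch.php` on your own server and extended to read the target address from a `url` query parameter and `echo $fp;`. Both the path and the parameter name are hypothetical, and neither appears in the PHP example.

```javascript
// Hypothetical path of the same-domain PHP script that runs curl and echoes
// the fetched HTML. This is an assumption, not something the PHP above defines.
const PROXY_PATH = "/fetch.php";

// Build the same-domain URL that asks the PHP script to fetch targetUrl.
function buildProxyUrl(targetUrl, proxyPath = PROXY_PATH) {
  return proxyPath + "?url=" + encodeURIComponent(targetUrl);
}

// Request the remote page through the proxy and resolve with its HTML text.
// fetchFn is injectable so the function can be exercised outside a browser.
async function loadRemoteHtml(targetUrl, fetchFn = fetch) {
  const response = await fetchFn(buildProxyUrl(targetUrl));
  if (!response.ok) {
    throw new Error("Proxy request failed with status " + response.status);
  }
  return response.text();
}
```

In a browser, the resolved string can be handed to `new DOMParser().parseFromString(html, "text/html")` to get a detached document to query. Scripts in the fetched page do not run this way, so any DOM changes the remote page makes with its own JavaScript will not be visible.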

Edit: After some thought I seem to remember that Safari removes the cross domain restriction for localhost. After researching some more, I'm unable to find any documentation that supports this theory of mine, so I dug a little deeper and found a better (although hackier) way to accomplish this whole mess via Apache if you're using it (which you probably are).

Apache’s mod_proxy will take a request for something like “/foo” and actually tunnel the request to some remote destination like “http://dev.domain.com/bar”. The end result is that your web browser thinks you’ve made a call to http://localhost/foo but in reality you’re sending and retrieving data from a remote server. Security implications solved!

Example:

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule rewrite_module modules/mod_rewrite.so

Let’s assume that I want to access a file at http://dev.domain.com/remote/api.php. You would put all of the following into a :

# start mod_rewrite
RewriteEngine On
ProxyRequests Off
<Proxy *>
   Order deny,allow
   Allow from all
</Proxy>

ProxyPass /apitest/ http://dev.domain.com/remote/api/
ProxyPassReverse /apitest/ http://dev.domain.com/remote/api/
RewriteRule ^/apitest/(.*)$ /remote/api/$1 [R]

Source

More edit:

Seeing as how you want to avoid the whole server setup thing, I gave it a shot using an IFRAME on Safari (Mac), and it worked, at least for the domains I tried:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html>
<head>
</head>
    <body>
        <iframe src="http://www.stackoverflow.com/"></iframe>
    </body>
</html>
Dave
That might work. However, if the web page does any kind of fancy footwork (like having JavaScript muck with its own DOM), that won't get me what I need.
BCS
Oh, yeah, but unless you can actually add code to the third-party site, which I doubt, there's probably no way to actually get their DOM manipulations. However, if by any chance you're working on a Mac or Windows machine, I think Safari doesn't have the domain restriction for localhost.
Dave
So a page loaded from localhost can have javascript that mucks with a frame/iframe that isn't? good to know.
BCS
Haha, ok, will do.
Dave
I was kinda hoping for a solution that didn't require any server on my part.
BCS
+1 for including your 'experimental workflow' as it were--very informative, more so than if you had just mentioned the results.
doug
+1  A: 

Take a look at Selenium Remote-Control. The server acts as a proxy for your browser to bypass the same-domain policy:

Finally, the Selenium Server acts as a client-configured HTTP proxy, to stand in between the browser and your website. This allows a Selenium-enabled browser to run JavaScript on arbitrary websites.

You might consider applying the same approach and writing your own proxy or even a simple web app that echoes pages from other domains (see Dave's answer).

Or, simply use Selenium for your automation.

Ates Goral
looks good enough to download.
BCS
+1  A: 

There is a way to relax Firefox's domain security.

1 Add this line to Firefox's user.js.

user_pref("signed.applets.codebase_principal_support", true);

2 Add this line to every javascript function that needs to cross a domain.

netscape.security.PrivilegeManager.enablePrivilege( "UniversalBrowserRead UniversalBrowserWrite" );

3 The first time Firefox attempts to cross the domain, it will warn you of the attempt and prompt for your permission.

Good news, the bug that prevented this from working with Firefox 3 appears to be fixed.

Bill
crud that's a lot of work! (that's a good thing most of the time)
BCS
Bypassing security shouldn't be trivial =) The prompt appears only once. Let's say you want your web page to access other sites 1, 2, and 3. You load your page, it accesses site 1, and you choose to allow cross-domain access. Now, when your page accesses site 1, 2, or 3, there's no prompt.
Bill