I have some code (incidentally, it is for Omniture SiteCatalyst) that renders a 1x1 pixel based on some JavaScript object variables I set in the page's source code. The JavaScript eventually creates an img based on the scripting code, but the img src isn't hard-coded into the HTML. How can I figure out what the img src is, given the URL of a page? If I just grab the page, I'll get the pre-rendered JavaScript.

EDIT

For example, let's say I have this code for StackOverflow.html:

<html>
<script type="text/javascript">
a = 2
document.write(a)
</script>
</html>

How can I fetch StackOverflow.html and somehow get the value "2" instead of all of my scripting code?

Thanks!

A: 

Edit:

To answer your restated question:

It seems to me that your problem is figuring out what the page will look like after the JavaScript has run on it.

There is no simple way of doing this that gives 100% accurate results. For that you need to actually run the JavaScript and see what it produces, which is not easy when you aren't in a browser.

You have several options. You didn't mention what tool you are using to grab the page, so I'll assume it is a custom-built scraper. If you want to keep using the scraper, you can:

  • Look into using Rhino to evaluate the JavaScript. I am not sure how much this will give you, so you will need to research it.
  • If document.write is the only call you care about, you can parse out the variables it uses and then try to evaluate their values. This requires writing a parser, which is probably difficult.
  • The best option is a functional testing tool such as Tellurium or Selenium. This gives you access to the page after the JavaScript has already run, and you can use my original answer to get the value you need.
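As a rough illustration of the "evaluate the JS outside a browser" idea, here is a minimal sketch that swaps in Node.js and its built-in vm module rather than Rhino. It assumes the script only calls document.write; the helper name renderDocumentWrites and the stubbed document object are made up for this example, and a real tracking script that touches navigator, window, or cookies would need more stubs or a real browser. Note that vm is not a security sandbox, so only run scripts you trust.

```javascript
const vm = require('vm');

// Hypothetical helper: runs a script string in a fresh context and returns
// everything the script passed to document.write, concatenated in order.
function renderDocumentWrites(scriptSource) {
  const written = [];
  const sandbox = {
    // Stub of the one browser API this sketch assumes the script uses.
    document: {
      write: (...args) => written.push(args.join('')),
    },
  };
  vm.createContext(sandbox);
  // The timeout guards against scripts that never finish.
  vm.runInContext(scriptSource, sandbox, { timeout: 1000 });
  return written.join('');
}

// The script body from the StackOverflow.html example in the question.
console.log(renderDocumentWrites('a = 2\ndocument.write(a)'));
```

For the example page in the question this prints 2, which is the rendered value rather than the script source.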
mkoryak
This doesn't answer my question at all.
Zachary Burt
When you originally asked your question, you were vague, and thus you got this answer. No need to downvote a bad answer to a bad question :P
mkoryak
+1 mkoryak's answer was and is valid for such a vague question. @Zachary: Have you read mkoryak's edited answer? Is it helpful? Are you trying to parse the JavaScript on the server side?
brianpeiris
P.S. Sorry mkoryak, I accidentally removed my upvote and I can't upvote again until the question is edited.
brianpeiris
Sorry, I upvoted you. Thanks for the help.
Zachary Burt
A: 

If you're trying to get the value of a after the script has run on the client-side (i.e. in the browser), you should just be able to retrieve it in the normal way.

Take the following setup:

index.html

This file is your webpage. It contains some content, a tracking script that inserts an image and your own script.

<!doctype html>
<html>
<head><title>My Page</title></head>
<body>
  <p>My Content</p>
  <!-- Start tracking code -->
  <script src="tracking.js"></script>
  <!-- End tracking code -->
  <script src="mycode.js"></script>
</body>
</html>

tracking.js

This is the tracking code, presumably provided by the tracking company.

var id = '1234foobar';
var visitorUserAgent = encodeURIComponent(navigator.userAgent);
document.write(
  '<img src="http://tracking.com/1x1.gif?id='
  + id + '&ua=' + visitorUserAgent + '" />'
);

mycode.js

If you know what variables (if any) the tracking code creates, you should be able to retrieve the variables themselves or at least the src attribute of the img tag that the tracking code creates.

var imgs = document.getElementsByTagName('img');
alert([id, visitorUserAgent, imgs[imgs.length - 1].src].join('\n'));
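
Once you have the src string, you may want the individual values back out of it. The following is a small sketch, not part of the tracking code: parsePixelParams is a hypothetical helper name, and it assumes the pixel URL carries its data in an ordinary query string, as in the tracking.js example above. It uses the standard URL class, which is available in modern browsers and in Node.js.

```javascript
// Hypothetical helper: splits a tracking-pixel URL into its query parameters.
function parsePixelParams(src) {
  const params = {};
  for (const [key, value] of new URL(src).searchParams) {
    params[key] = value; // searchParams yields percent-decoded values
  }
  return params;
}

// Example URL built the same way tracking.js builds it (user agent is made up).
const exampleSrc =
  'http://tracking.com/1x1.gif?id=1234foobar&ua=' +
  encodeURIComponent('Mozilla/5.0 (X11; Linux x86_64)');

console.log(parsePixelParams(exampleSrc));
```

In the page itself you would call it as parsePixelParams(imgs[imgs.length - 1].src).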
brianpeiris