Hi all,

I am using RCurl in R to try and download data from a website, but I'm having trouble finding out what URL to use. Here is the site:

http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX

In the upper right, above the displayed sheet, there's a link to download the data as a .csv file. Is there a way to find a regular HTTP address for that .csv file? RCurl can't execute the JavaScript that the link triggers.

Thanks in advance,

Andrew

+1  A: 

Clicking on the Download link executes this piece of JavaScript:

__doPostBack('ctl00$MainPageLeft$MainPageContent$ExportHoldings1$LinkButton1','')

That __doPostBack function appears to simply fill in a couple of hidden form fields on that page (in ASP.NET Web Forms these are normally __EVENTTARGET and __EVENTARGUMENT) and then submit the form as a POST request.

A quick googling shows that RCurl is capable of submitting a POST request. So, what you would need to do is look in the source of that page, find the form with name "aspnetForm", take all the fields from that form, and create your own POST request that submits the fields to the action URL (http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX).

Can't guarantee this will work, though. The page has a hidden form field named __VIEWSTATE that appears to encode some state information, and I don't know how it factors in.
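As a rough illustration of the "take all the fields from that form" step, here is a minimal sketch in R. The helper names (extract_hidden_fields, fetch_form_fields) and the sample HTML are made up for this example, and the regex parsing is only an assumption to keep the sketch free of extra packages; a proper HTML parser would be more robust. Note that the fields are the hidden input elements inside the form, not the attributes of the form tag itself.

```r
# Hypothetical helper: pull every <input type="hidden"> name/value pair
# out of raw HTML. Regex parsing is a simplification for this sketch.
extract_hidden_fields <- function(html) {
  html <- paste(html, collapse = "\n")
  # Find each <input ...> tag, then keep only the hidden ones.
  tags <- regmatches(html, gregexpr("<input[^>]*>", html, ignore.case = TRUE))[[1]]
  tags <- tags[grepl("type=[\"']hidden[\"']", tags, ignore.case = TRUE)]

  # Read one quoted attribute from a tag; "" if the attribute is absent.
  get_attr <- function(tag, attr) {
    pattern <- paste0("[[:space:]]", attr, "=[\"']([^\"']*)[\"']")
    m <- regmatches(tag, regexec(pattern, tag, ignore.case = TRUE))[[1]]
    if (length(m) == 2) m[2] else ""
  }

  fields <- lapply(tags, get_attr, attr = "value")
  names(fields) <- vapply(tags, get_attr, character(1), attr = "name",
                          USE.NAMES = FALSE)
  fields
}

# Hypothetical wrapper: download the page with RCurl (must be installed)
# and return its hidden fields, including __VIEWSTATE if present.
fetch_form_fields <- function(url) {
  extract_hidden_fields(RCurl::getURL(url))
}

# Made-up sample markup, only to show the shape of the result.
sample_html <- paste0(
  '<form name="aspnetForm" method="post" action="holdings.aspx?ticker=PGX">',
  '<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />',
  '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />',
  '<input type="text" name="q" value="x" />',
  '</form>')
sample_fields <- extract_hidden_fields(sample_html)
print(names(sample_fields))
```

Sending __VIEWSTATE back unchanged along with the other hidden fields is what a browser would do, so that is probably the safest starting point, though the server may also expect cookies or other fields.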

Jeff
Great. Where did you find the documentation on how to submit a POST request using RCurl?
Andrew Page
http://www.omegahat.org/RCurl/installed/RCurl/html/postForm.html
Jeff
A: 

This is definitely the way to get the .csv file in RCurl, but I can't figure out which form fields to use in postForm to make it work. Should I use the fields from the doPostBack command that's attached to the "Download" link on the page, or should I use the fields from aspnetForm in the page source? Just for reference, the aspnetForm tag we're interested in is:

<form name="aspnetForm" method="post" action="holdings.aspx?ticker=PGX" id="aspnetForm" style="margin:0px">

... and the postForm request I just tried, which didn't work, was:

postForm("http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX", "form name" = "aspnetForm", "method" = "post", "action" = "holdings.aspx?ticker=PGX", "id" = "aspnetForm", "style" = "margin:0px")

Thanks for all of the help!

Andrew Page
You would start with the fields in the aspnetForm form, then override whatever is in there with the values that the doPostBack function inserts into the hidden fields. What doPostBack does is basically take the existing form, fill in two of the hidden fields, and then submit the form.
Jeff
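A minimal sketch of that "start with the form's fields, then override" idea in R follows. The function names (build_postback_fields, post_download_request) are hypothetical, and the sketch assumes the two overridden fields are the usual ASP.NET ones, __EVENTTARGET and __EVENTARGUMENT. The attributes of the form tag (name, method, action, id, style) are not fields, which is likely why the earlier postForm attempt failed. It is untested against the live site, which may also require cookies.

```r
# Hypothetical helper: take the hidden fields already present in the form
# (a named list) and overwrite the two that __doPostBack would fill in.
build_postback_fields <- function(form_fields, event_target, event_argument = "") {
  overrides <- list("__EVENTTARGET" = event_target,
                    "__EVENTARGUMENT" = event_argument)
  utils::modifyList(form_fields, overrides)
}

# Hypothetical wrapper: submit the merged fields as a URL-encoded POST.
# Requires the RCurl package; not called here because it needs network access.
post_download_request <- function(url, form_fields, event_target) {
  fields <- build_postback_fields(form_fields, event_target)
  RCurl::postForm(url, .params = fields, style = "POST")
}

# Made-up field values, only to show the merge.
example_fields <- list("__EVENTTARGET" = "",
                       "__EVENTARGUMENT" = "",
                       "__VIEWSTATE" = "abc123")
merged <- build_postback_fields(
  example_fields,
  "ctl00$MainPageLeft$MainPageContent$ExportHoldings1$LinkButton1")
str(merged)
```

In practice form_fields would come from the hidden inputs scraped off the page, so that __VIEWSTATE and any similar fields are passed back unchanged.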