Hi,

I want to automate the archiving of the data on this page, http://energywatch.natgrid.co.uk/EDP-PublicUI/Public/InstantaneousFlowsIntoNTS.aspx, and upload it into a database.

I have been using Python and win32com to do this on other pages (I am behind a corporate proxy with no direct net access, hence I am driving IE). My question: is there any way to extract and save the CSV data that is returned when clicking the "Click here to download data" link at the bottom? This link is a JavaScript postback, and using it would be much easier than reformatting the page itself into CSV.

Of course, I'm not necessarily committed to using Python if a simpler alternative can be suggested.

Thanks

+1  A: 

Here's a better way, using the mechanize library.


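import mechanize

b = mechanize.Browser()
# Route HTTP requests through the corporate proxy (placeholder host and port).
b.set_proxies({'http': 'yourproxy.corporation.com:3128'})

# Present a browser-like User-Agent header.
b.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')]
b.open("http://energywatch.natgrid.co.uk/EDP-PublicUI/Public/InstantaneousFlowsIntoNTS.aspx")

# Simulate the JavaScript postback: __EVENTTARGET is a hidden (read-only) field,
# so make it writable and set it to the download link's event target.
b.select_form(name="form1")
b.form.find_control(name='__EVENTTARGET').readonly = False
b.form['__EVENTTARGET'] = 'a1'

# Submit the form and print the response body (the CSV data).
print(b.submit().read())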

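Note how you can tell mechanize to use a proxy server (this is also possible with plain urllib). Also note how ASP.NET's JavaScript postback is simulated: the link's script sets the hidden __EVENTTARGET field and submits the form, so setting that field and submitting the form by hand has the same effect.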
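For anyone who prefers plain urllib, the postback simulation above can be sketched without mechanize. This is a minimal sketch for Python 3, not a tested replacement for the mechanize code. The names HiddenFieldParser and build_postback_body are made up for illustration. It assumes the page keeps its state in hidden inputs such as __VIEWSTATE, and it copies only hidden fields, so any visible form controls would have to be added by hand.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_postback_body(page_html, event_target, event_argument=""):
    """Return the URL-encoded POST body that mimics an ASP.NET postback.

    Hypothetical helper: copies the page's hidden fields and overrides
    __EVENTTARGET / __EVENTARGUMENT, as the link's JavaScript would.
    """
    parser = HiddenFieldParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields).encode("ascii")
```

The returned bytes could then be passed as the data argument when opening the page URL through a urllib opener configured with a ProxyHandler, which turns the request into a POST.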

Edit:

If your proxy server is using NTLM authentication, that could be the problem. AFAIK urllib2 does not handle NTLM authentication. You could try the NTLM Authorization Proxy Server. From the readme file:


WHAT IS 'NTLM Authorization Proxy Server'?

'NTLM Authorization Proxy Server' is a proxy-like software, that will authorize you at MS proxy server and at web servers (ISS especially) using MS proprietary NTLM authorization method and it can change some values in your client's request header so that those requests will look like ones made by MS IE. It is written in Python language. See www.python.org.
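If such a local proxy is running, the client only has to point at it instead of at the corporate proxy. Here is a rough sketch using plain urllib on Python 3. The address 127.0.0.1:5865 is an assumption, so use whatever host and port your local proxy is configured to listen on. make_opener is a made-up helper name.

```python
import urllib.request

# Assumption: the local NTLM-translating proxy listens here; adjust to
# match the listen port in your own proxy configuration.
LOCAL_NTLM_PROXY = "127.0.0.1:5865"


def make_opener(proxy=LOCAL_NTLM_PROXY):
    """Build a urllib opener that sends HTTP requests via the local proxy."""
    handler = urllib.request.ProxyHandler({"http": "http://" + proxy})
    return urllib.request.build_opener(handler)
```

With mechanize the equivalent would be passing the same local address to set_proxies, as in the code above.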


codeape
I tried that, using b.set_proxies({'http': 'user:pass@proxyserver:80'}) as my set_proxies argument, but I get this error: HTTP Error 407: Proxy Authentication Required (The ISA Server requires authorization to fulfill the request. Access to the Web Proxy filter is denied.) This was the original reason I switched to COM + IE as a workaround. Any idea how to work around this? Thanks for your help.
Brendan
If your proxy server is using NTLM authentication, that could be the problem. I have updated my answer with a suggestion to use NTLM Authorization Proxy Server - a local proxy that supposedly will translate between NTLM and basic authentication. I downloaded the trunk version and tested it on Python 2.5; it runs as a functioning proxy. I do not have an ISA proxy server with NTLM available to test against.
codeape