I'm trying to extract data from the following page:
Which, conveniently and inefficiently enough, includes all the data embedded as a CSV file in the header, assigned to a variable called gs_csv.
How do I extract this? document.body.innerHTML skips the header where the data is. What is the alternative that includes the header (or, better yet, the value associated with gs_csv)?
(Sorry, I'm new to all this. I've been searching through loads of documentation and trying a lot of approaches, but nothing so far has worked.)
Thanks to Sinan (this is mostly his solution transcribed into Python).
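import time
from win32com.client import Dispatch

# Drive Internet Explorer over COM (Windows only, needs pywin32)
ie = Dispatch("InternetExplorer.Application")
ie.Visible = False
ie.Navigate("http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#")
time.sleep(20)  # crude wait for the page to finish loading
# The data sits in a <script> element, which document.body.innerHTML does not include
s1 = ie.document.scripts(1).text  # index 1 is the second script on the page
# Skip past "gs_csv" plus two more characters and drop the last 11 characters;
# these offsets are specific to this page
s1 = s1[s1.find("gs_csv") + 8:-11]
# Raw string, so "\b" in the path is not read as a backspace escape
scriptfilepath = r"c:\FO Share\bmreports\script.txt"
scriptfile = open(scriptfilepath, 'w')
# Turn the literal "\n" escapes in the JavaScript string into real newlines
scriptfile.write(s1.replace('\\n', '\n'))
scriptfile.close()
ie.Quit()  # Quit is a method; without the parentheses IE is left running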
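The hard-coded offsets (+8 and -11) will break if the page layout shifts even slightly. A slightly more forgiving way to pull the value out of the script text is a regular expression. This is only a sketch: it assumes the script assigns gs_csv a double-quoted string literal, which I have not verified against the live page, and extract_js_string is a made-up helper name, not part of any library.

```python
import re

def extract_js_string(script_text, var_name="gs_csv"):
    """Return the double-quoted string assigned to var_name, or None if absent.

    Hypothetical helper: assumes the script contains something like
    gs_csv="...", with JavaScript-style backslash escapes inside the quotes.
    """
    # var_name, optional whitespace, '=', optional whitespace, then a quoted
    # literal made of non-quote characters or backslash-escaped characters
    pattern = re.escape(var_name) + r'\s*=\s*"((?:[^"\\]|\\.)*)"'
    match = re.search(pattern, script_text, re.DOTALL)
    if match is None:
        return None
    # Convert the literal "\n" escapes into real line breaks
    return match.group(1).replace("\\n", "\n")

# Made-up sample input, standing in for ie.document.scripts(1).text
sample = 'var gs_csv="HDR,a,b\\n1,2,3\\n";\nvar other="x";'
print(extract_js_string(sample))
```

With the IE code above, you would pass ie.document.scripts(1).text as script_text and write the result to the file instead of the sliced s1.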