ansaurus

Question

Alternative to innerhtml that includes header?

Answer 1

+1 A:

Untested: Did you try looking at what Document.scripts contains?

UPDATE:

For some reason, I am having immense difficulty getting this to work using the Windows Scripting Host (but then, I don't use it very often, apologies). Anyway, here is the Perl source that works:

use strict;
use warnings;

use Win32::OLE qw(in);  # import in() so the OLE collection can be iterated below
$Win32::OLE::Warn = 3;

my $ie = get_ie();

$ie->{Visible} = 1;

$ie->Navigate(
    'http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?'
    .'param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#'
);

sleep 1 until is_ready( $ie );

my $scripts = $ie->Document->{scripts};

for my $script (in $scripts ) {
    print $script->text;
}

sub is_ready { $_[0]->{ReadyState} == 4 }

sub get_ie {
    Win32::OLE->new('InternetExplorer.Application', 
        sub { $_[0] and $_[0]->Quit },
    );
}

__END__

C:\Temp> ie > output

output now contains everything within the script tags.

Sinan Ünür 2009-05-25 13:10:39

Hi Sinan, As I said, I'm completely new to all this. Trying ie.document.scripts returns <COMObject <unknown>>. What should the syntax be? Thanks

Brendan 2009-05-25 13:27:59

It is a collection: ie.document.scripts.item[0] should hold the first script in the document. My IE8 is giving me problems, so I can't test.

Sinan Ünür 2009-05-25 14:01:38

ie.document.scripts.item[0] gives an error: TypeError: 'instancemethod' object is unsubscriptable

Brendan 2009-05-25 14:09:52
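
The TypeError in the comment above suggests that item is a method of the collection rather than a list, so it has to be called with parentheses instead of being subscripted. A minimal sketch of the difference, using a hypothetical stand-in class (FakeScripts is not part of win32com; it only mimics a collection that exposes item as a method):

```python
class FakeScripts(object):
    """Hypothetical stand-in for a COM collection such as document.scripts."""

    def __init__(self, texts):
        self._texts = list(texts)

    def item(self, index):
        # item is a method here, so it is called rather than subscripted.
        return self._texts[index]


scripts = FakeScripts(["var a = 1;", "var gs_csv = 'x,y';"])

# Calling the method with parentheses works.
first = scripts.item(0)

# Subscripting the bound method itself raises TypeError.
try:
    scripts.item[0]
    subscript_failed = False
except TypeError:
    subscript_failed = True

print(first, subscript_failed)
```

With a real win32com object the equivalent call should be ie.document.scripts.item(0), or the shorter ie.document.scripts(0) form that the Python answer in this thread uses.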

Hi Sinan, Thanks very much for your help. That works perfectly. Sorry I can't vote you up; it seems I'm not reputable enough to do so... :) Anyway, for future reference, the code in Python is appended.

Brendan 2009-05-25 15:45:50

Answer 2

A:

Fetch the source of that page using Ajax, and parse the response text like XML using jQuery. It should be simple enough to get the text of the first tag you encounter inside the

I'm out of touch with jQuery, or I would have posted code examples.

EDIT: I assume you are talking about fetching the csv on the client side.

Here Be Wolves 2009-05-25 13:12:16

It's a static webpage, so I don't know what Ajax has to do with it. It seems overly complicated; I could extract it from the full HTML source if I knew how to return it.

Brendan 2009-05-25 13:32:53

Answer 3

A:

If this is just a one-off script, then extracting this CSV data is as simple as this:

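import urllib2

# Download the page source (urllib2 is Python 2 only)
response = urllib2.urlopen('http://www.bmreports.com/foo?bar?')
html = response.read()
# Keep the text between 'gs_csv=' and the closing </SCRIPT> tag
csv = html.split('gs_csv=')[1].split('</SCRIPT>')[0]

# process csv data here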

Randle Taylor 2009-05-25 14:02:46
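
The split-based extraction in the answer above can be tried without any network access. The sketch below runs it against a hypothetical sample string; sample_html and the helper extract_between are illustrative names, and the real page layout may differ from this assumed one.

```python
# Hypothetical sample standing in for the downloaded page source.
sample_html = (
    "<HTML><HEAD><SCRIPT>var gs_csv='a,b,c\\n1,2,3';</SCRIPT></HEAD>"
    "<BODY>table here</BODY></HTML>"
)


def extract_between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker."""
    after_start = text.split(start_marker, 1)[1]
    return after_start.split(end_marker, 1)[0]


payload = extract_between(sample_html, "gs_csv=", "</SCRIPT>")
print(payload)
```

Note that the extracted payload still carries the surrounding quotes and the trailing semicolon of the JavaScript assignment, so it needs a little trimming before it can be treated as CSV.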

Hi Randle, I was looking at that method this morning, but this is from behind a company firewall/proxy with NTLM authentication. I tried several different ways and examples to get Python working through the proxy, but then gave up and thought it would be easier to script IE to get the document. From what I've read, Python and NTLM proxies don't play too well together. I assumed there should be some equivalent to innerHTML that returns the full HTML, so I thought it would be quick and easy to do it this way...

Brendan 2009-05-25 14:09:03

@Brendan: You can use NTLMAPS to circumvent NTLM authentication in any application. It is written in Python. http://ntlmaps.sourceforge.net/

nosklo 2009-05-25 16:26:51

nosklo, NTLMAPS (as far as I can see) is a local proxy that routes through the NTLM LAN proxy, but it has to be running all the time to field requests on localhost from another application. I could be wrong, but it's a bit awkward, and not very portable.

Brendan 2009-05-26 12:52:36
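
For reference, routing a Python script through such a local proxy might look like the sketch below. It is written for Python 3's urllib.request rather than the urllib2 used elsewhere in this thread, and it assumes an NTLMAPS instance is already listening on 127.0.0.1:5865; that address and port are assumptions and should be checked against the local NTLMAPS configuration.

```python
import urllib.request

# Assumption: a local NTLMAPS instance is listening on this address and port.
LOCAL_PROXY = "http://127.0.0.1:5865"

# Send plain HTTP requests via the local proxy, which handles the NTLM handshake.
proxy_handler = urllib.request.ProxyHandler({"http": LOCAL_PROXY})
opener = urllib.request.build_opener(proxy_handler)

# No request is sent here. With the proxy running, a fetch would be:
#     html = opener.open(page_url).read()
# where page_url is the address of the page to download.
```

This does not remove the drawback raised in the comment above: the local proxy still has to be running whenever the script is used.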

Answer 4

A:

Thanks to Sinan (this is mostly his solution transcribed into Python).

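import time

from win32com.client import Dispatch

ie = Dispatch("InternetExplorer.Application")
ie.Visible = False

ie.Navigate("http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#")

# Crude fixed wait for the page to load; polling ie.ReadyState == 4, as the Perl version does, would be more robust
time.sleep(20)

webpage = ie.document.body.innerHTML

# Take the second script block and slice out the value assigned to gs_csv
s1 = ie.document.scripts(1).text
s1 = s1[s1.find("gs_csv") + 8:-11]

# Raw string, so the backslashes in the Windows path are not treated as escapes
scriptfilepath = r"c:\FO Share\bmreports\script.txt"

scriptfile = open(scriptfilepath, 'wb')
# Turn literal backslash-n sequences into real newlines (the backslash was presumably lost when the code was posted)
scriptfile.write(s1.replace('\\n', '\n'))
scriptfile.close()

# Quit is a method and must be called to close the browser
ie.Quit()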

Brendan 2009-05-25 15:47:11
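
The post-processing step in the answer above (pulling the string out of the script text and turning its escaped newlines into real ones) can be sketched without Internet Explorer. The script_text value below is a hypothetical example, not the real page content, and the quote-based slicing is an assumed alternative to the fixed offsets used in the answer.

```python
import csv
import io

# Hypothetical script text; the real page's content and layout may differ.
script_text = "var gs_csv = 'name,value\\nalpha,1\\nbeta,2';"

# Slice out everything between the first and the last single quote.
start = script_text.find("'") + 1
end = script_text.rfind("'")
raw = script_text[start:end]

# Turn the two-character sequence backslash + n into real newlines.
csv_text = raw.replace("\\n", "\n")

# Parse the result into rows of fields.
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows)
```

Slicing between the quotes avoids depending on the exact lengths of the text before and after the value, which the fixed +8 and -11 offsets do.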

You should delete this answer and incorporate it into your original post using proper markdown (the question mark next to the textbox tells you how to properly post code, among other things). As for voting me up, it's no problem if you can't, but AFAIK you should at least be able to mark the answer that solved your problem.

Sinan Ünür 2009-05-25 16:22:25
