I tried to get support on this but I am TOTALLY confused.

Here's my code:


from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.web.error import Error
from twisted.internet.defer import DeferredList
from sys import argv

class GrabPage:
 def __init__(self, page):
  self.page = page

 def start(self, *args):
  if args == ():
   # We apparently don't need authentication for this
   d1 = getPage(self.page)
  else:
   if len(args) == 2:
    # We have our login information
    d1 = getPage(self.page, headers={"Authorization": " ".join(args)})
   else:
    raise Exception('Missing parameters')

  d1.addCallback(self.pageCallback)
  dl = DeferredList([d1])
  d1.addErrback(self.errorHandler)
  dl.addCallback(self.listCallback)

 def errorHandler(self,result):
  # Bad thingy!
  pass

 def pageCallback(self, result):
  return result

 def listCallback(self, result):
  print result

a = GrabPage('http://www.google.com')
data = a.start() # Not the HTML

I want to get at the HTML that is passed to pageCallback when start() is called. This has been a pain for me. Thanks! And sorry for my sucky coding.

+3  A: 

You're missing the basics of how Twisted operates. It all revolves around the reactor, which you're never even running. Think of the reactor like this:

[Figure: Reactor Loop]

Until you start the reactor, setting up deferreds only chains callbacks together; there are no events to fire them.
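To make that concrete, here is a minimal sketch using a toy stand-in for a deferred. `ToyDeferred` is a hypothetical class written only for illustration; it is not Twisted's implementation and leaves out errbacks and everything else. It only shows that adding callbacks builds a chain, and that nothing runs until something fires it (in real Twisted, the running reactor delivers that event).

```python
class ToyDeferred(object):
    """Hypothetical, simplified stand-in for a Twisted Deferred."""

    def __init__(self):
        self.callbacks = []

    def addCallback(self, func):
        # Only records the function; nothing is executed here.
        self.callbacks.append(func)
        return self

    def callback(self, result):
        # Fire the chain: each callback receives the previous return value.
        for func in self.callbacks:
            result = func(result)
        return result


seen = []
d = ToyDeferred()
d.addCallback(lambda html: html.upper())
d.addCallback(seen.append)

# The chain is set up, but no event has fired it yet.
print(seen)  # []

# Stand-in for what the reactor does once the page has arrived.
d.callback("<html>hello</html>")
print(seen)  # ['<HTML>HELLO</HTML>']
```

The first print shows an empty list because nothing has fired the chain; that is the state the question's code is stuck in.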

I recommend you give the Twisted Intro by Dave Peticolas a read. It's quick and it really gives you all the missing information that the Twisted documentation doesn't.

Anyway, here is the most basic possible usage example of getPage:

from twisted.web.client import getPage
from twisted.internet import reactor

url = 'http://aol.com'

def print_and_stop(output):
    print output
    if reactor.running:
        reactor.stop()

if __name__ == '__main__':
    print 'fetching', url
    d = getPage(url)
    d.addCallback(print_and_stop)
    reactor.run()

Since getPage returns a deferred, I add the callback print_and_stop to the deferred chain and then start the reactor. Once the reactor is running, the request completes and the deferred fires print_and_stop, which prints the data from aol.com and then stops the reactor.

Edit to show a working example of OP's code:

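# Same imports as in the question:
from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.internet.defer import DeferredList

class GrabPage: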
    def __init__(self, page):
        self.page = page
        ########### I added this:
        self.data = None

    def start(self, *args):
        if args == ():
            # We apparently don't need authentication for this
            d1 = getPage(self.page)
        else:
            if len(args) == 2:
                # We have our login information
                d1 = getPage(self.page, headers={"Authorization": " ".join(args)})
            else:
                raise Exception('Missing parameters')

        d1.addCallback(self.pageCallback)
        dl = DeferredList([d1])
        d1.addErrback(self.errorHandler)
        dl.addCallback(self.listCallback)

    def errorHandler(self,result):
        # Bad thingy!
        pass

    def pageCallback(self, result):
        ########### I added this, to hold the data:
        self.data = result
        return result

    def listCallback(self, result):
        print result
        # Added for effect:
        if reactor.running:
            reactor.stop()

a = GrabPage('http://google.com')
########### Just call it without assigning to data
#data = a.start() # Not the HTML
a.start()

########### I added this:
if not reactor.running:
    reactor.run()

########### Reference the data attribute from the class
data = a.data
print '------REACTOR STOPPED------'
print
########### First 100 characters of a.data:
print '------a.data[:100]------'
print data[:100] 
jathanism
The reactor was already run() in another file. This is an imported file. I guess I should mention that part. :x
Dave Dixon
I added a simple example for you, I hope that might help you visualize it.
jathanism
I don't think that format will work. I can't have it print, either. It needs to be a return, because another import is going to mess with the result of this.
Dave Dixon
Sure, change it to return if you need to. I ran your code and it successfully returned data. If you need to collect this data, make sure the callback does the collecting when it fires, for example by declaring a dictionary before starting your work and having `pageCallback` add a key to that dictionary. In other words, if you want `a.start()` to return the HTML data, you're going to have to make that method `return` something.
jathanism
Okay. I'm apparently the blindest person on the planet, because everyone says they can get it to work successfully. I don't know what I need to return or where to find the data that I need.
Dave Dixon
Ok there, I added a working example of your `GrabPage` class, with profuse commenting where I changed things.
jathanism
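The dictionary suggestion from the comments above could look roughly like the sketch below. The names `results` and `store_page` and the URL `http://example.com` are made up for illustration. The callback is invoked by hand here as a stand-in for the reactor firing the deferred, so the snippet runs without Twisted or a network connection.

```python
# Shared dictionary, declared before any fetch is started.
results = {}


def store_page(html, url):
    """Callback: record the page body under its URL and pass it along."""
    results[url] = html
    return html


# With Twisted, this would be attached inside start() along these lines:
#     d = getPage(url)
#     d.addCallback(store_page, url)
#     return d
# Extra positional arguments given to addCallback are passed to the
# callback after the result, which is why `url` comes second above.

# Simulate the callback firing for one page (stand-in for the reactor):
store_page("<html>example</html>", "http://example.com")
print(results["http://example.com"])
```

Returning the deferred from `start()` lets whatever module imports this one add its own callback, which is the nearest thing to "returning" the HTML when the page has not arrived yet.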