Hi, I'm very new to Scrapy. Here is my spider to crawl the Twisted Web documentation:

class TwistedWebSpider(BaseSpider):

    name = "twistedweb3"
    allowed_domains = ["twistedmatrix.com"]
    start_urls = [
        "http://twistedmatrix.com/documents/current/web/howto/",
    ]
    rules = (
        Rule(SgmlLinkExtractor(),
            'parse',
            follow=True,
        ),
    )

    def parse(self, response):
        print response.url
        filename = response.url.split("/")[-1]
        filename = filename or "index.html"
        open(filename, 'wb').write(response.body)

When I run scrapy-ctl.py crawl twistedweb3, it fetches only index.html.
I took the content of index.html and ran SgmlLinkExtractor on it myself. It extracts the links I expect, but those links are never followed.

Can you show me what I'm doing wrong?

Also, suppose I want to get the CSS and JavaScript files too. How do I do that? In other words, how do I download the full website?

A: 

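The rules attribute belongs to CrawlSpider, so declare your spider as class MySpider(CrawlSpider). Also, when you use CrawlSpider you must not override the parse method, because CrawlSpider uses parse itself to apply the rules. Give your callback a different name instead, such as parse_response.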

Rho
Thanks, Rho. You saved my day. It works after making the changes you suggested.
Iapilgrim