I have been trying to get a simple spider to run with scrapy, but keep getting the error:

Could not find spider for domain:stackexchange.com

when I run the command scrapy-ctl.py crawl stackexchange.com. The spider is as follows:

from scrapy.spider import BaseSpider
from __future__ import absolute_import


class StackExchangeSpider(BaseSpider):
    domain_name = "stackexchange.com"
    start_urls = [
        "http://www.stackexchange.com/",
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)

SPIDER = StackExchangeSpider()

Another person posted almost exactly the same problem months ago (http://stackoverflow.com/questions/1806990/scrapy-spider-is-not-working) but did not say how they fixed it. I have been following the tutorial at http://doc.scrapy.org/intro/tutorial.html exactly, and cannot figure out why it is not working.

When I run this code in Eclipse I get the error:

Traceback (most recent call last):
  File "D:\Python Documents\dmoz\stackexchange\stackexchange\spiders\stackexchange_spider.py", line 1, in <module>
    from scrapy.spider import BaseSpider
ImportError: No module named scrapy.spider

I cannot figure out why it cannot find the scrapy.spider module. Does my spider have to be saved in the scripts directory?

+1  A: 

Try running python yourproject/spiders/domain.py to see if there are any syntax errors. I don't think you should enable absolute imports, as scrapy relies on relative imports.
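As an illustration of that syntax check (a minimal sketch, not part of the original answer), Python's built-in compile() can parse a file's source without executing it, so scrapy does not need to be importable for the check to work. The helper name check_syntax and the two source strings are hypothetical. They mirror the import order in the posted spider, where the __future__ import comes after another import, which Python rejects.

```python
def check_syntax(source, filename="<spider>"):
    """Return None if the source parses, otherwise the SyntaxError message."""
    try:
        # compile() only parses; it does not run the imports in the source.
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


# Import order as posted in the question: __future__ import comes second.
posted_order = (
    "from scrapy.spider import BaseSpider\n"
    "from __future__ import absolute_import\n"
)

# Same two imports with the __future__ import moved to the top.
future_first = (
    "from __future__ import absolute_import\n"
    "from scrapy.spider import BaseSpider\n"
)

if __name__ == "__main__":
    print("posted order:", check_syntax(posted_order))
    print("future first:", check_syntax(future_first))
```

If the file on disk really matches the posted code, either removing the __future__ line or moving it to the top should clear this particular error. That alone may not explain the "Could not find spider" message.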

Rho
It says it cannot find the scrapy.spider module.
Nacari
Yes, but the first error says `Could not find spider for domain:stackexchange.com`, which is a scrapy message, so the scrapy module loads correctly there. The second error is related to Eclipse and its PYTHONPATH.
Rho
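One way to test the Eclipse/PYTHONPATH explanation is to run a small script from inside Eclipse and again from the command line, then compare the output. This is only a sketch under assumptions: it uses importlib.util.find_spec, which requires Python 3.4 or later (newer than the interpreter in the traceback above), and is_importable is a hypothetical helper name.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if a top-level module can be located on sys.path."""
    # find_spec returns None for a top-level name that cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Different interpreters or search paths in the two environments
    # would explain why one finds scrapy and the other does not.
    print("interpreter:", sys.executable)
    print("scrapy importable:", is_importable("scrapy"))
    for entry in sys.path:
        print("  ", entry)
```

If the two runs report different interpreters or different sys.path entries, the Eclipse run configuration is the likely place to look.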
Problem fixed. I reinstalled on another computer. Some files must have been misplaced, or the original installation went wrong.
Nacari