views: 133
answers: 1

Hi, I recently discovered Scrapy, which I find very efficient. However, I don't really see how to embed it in a larger project written in Python. I would like to create a spider in the usual way, but be able to launch it on a given URL with a function

start_crawl(url)

which would launch the crawling process on a given domain and stop only when all the pages have been seen.

+2  A: 

Scrapy is much more complicated than that. It runs several processes and uses multi-threading, so there is really no way to use it as a normal Python function. Of course you can import the function that starts the crawler and invoke it, but what then? You will have a normal Scrapy process that has taken control of your program.
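
(For what it's worth, later Scrapy releases do expose an in-process entry point, CrawlerProcess; a minimal sketch is below, with a purely illustrative spider. Note that start() hands control to the Twisted reactor and blocks until the crawl ends, and the reactor cannot be restarted afterwards, which is exactly the "taken control" problem described above.)

    from scrapy.crawler import CrawlerProcess
    from scrapy.spiders import Spider

    class ExampleSpider(Spider):
        # A trivial spider; the name and start URL are illustrative only.
        name = "example"
        start_urls = ["http://example.com"]

        def parse(self, response):
            yield {"url": response.url,
                   "title": response.css("title::text").get()}

    process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
    process.crawl(ExampleSpider)
    process.start()  # blocks here; the Twisted reactor takes over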

Probably the best approach here is to run Scrapy as a subprocess of your program and communicate with it through a database or a file. You get good separation between your program and the crawler, and solid control over the main process.
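
A minimal sketch of that approach, assuming an existing Scrapy project with a spider named "myspider" that accepts its start URL via a "url" spider argument (both names hypothetical), exporting scraped items to a JSON feed file:

    import json
    import os
    import subprocess

    def start_crawl(url, output_path="items.json"):
        # Run the crawl in a separate process, as suggested above.
        # -a passes an argument to the spider, -o exports items to a feed file.
        if os.path.exists(output_path):
            os.remove(output_path)  # -o appends, so start from a clean file
        subprocess.run(
            ["scrapy", "crawl", "myspider", "-a", f"url={url}", "-o", output_path],
            check=True,  # blocks until the crawl finishes; raises on failure
        )
        with open(output_path) as f:
            return json.load(f)  # the scraped items, read back by the main app

The call returns only once the spider has exhausted the site, which gives you the blocking start_crawl(url) from the question while keeping Scrapy in its own process.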

Tomasz Wysocki
Yes, at a minimum you could write the interesting information to a file and then check that file from the main app. Scrapy is definitely designed as a standalone application that writes to a 'data store'.
nate c
See here http://gist.github.com/484009
Rho