views: 133
answers: 1

Hi, I recently discovered Scrapy, which I find very efficient. However, I don't really see how to embed it in a larger project written in Python. I would like to create a spider in the usual way, but be able to launch it on a given URL with a function

start_crawl(url)

which would launch the crawling process on a given domain and stop only when all the pages have been seen.

+2  A: 

Scrapy is much more complicated than that. It runs several processes and uses multi-threading, so there is really no way to use it as a normal Python function. Of course you can import the function that starts the crawler and invoke it, but what then? You will have a normal Scrapy process that has taken control of your program.
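
(For what it's worth, later Scrapy releases do expose an in-process entry point, CrawlerProcess; a minimal sketch is below, with a purely illustrative spider. Note that start() hands control to the Twisted reactor and blocks until the crawl ends, and the reactor cannot be restarted afterwards, which is exactly the "taken control" problem described above.)

    from scrapy.crawler import CrawlerProcess
    from scrapy.spiders import Spider

    class ExampleSpider(Spider):
        # A trivial spider; the name and start URL are illustrative only.
        name = "example"
        start_urls = ["http://example.com"]

        def parse(self, response):
            yield {"url": response.url,
                   "title": response.css("title::text").get()}

    process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
    process.crawl(ExampleSpider)
    process.start()  # blocks here; the Twisted reactor takes over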

Probably the best approach here is to run Scrapy as a subprocess of your program and communicate with it through a database or a file. You get good separation between your program and the crawler, and solid control over the main process.
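
A minimal sketch of that approach, assuming an existing Scrapy project with a spider named "myspider" that accepts its start URL via a "url" spider argument (both names hypothetical), exporting scraped items to a JSON feed file:

    import json
    import os
    import subprocess

    def start_crawl(url, output_path="items.json"):
        # Run the crawl in a separate process, as suggested above.
        # -a passes an argument to the spider, -o exports items to a feed file.
        if os.path.exists(output_path):
            os.remove(output_path)  # -o appends, so start from a clean file
        subprocess.run(
            ["scrapy", "crawl", "myspider", "-a", f"url={url}", "-o", output_path],
            check=True,  # blocks until the crawl finishes; raises on failure
        )
        with open(output_path) as f:
            return json.load(f)  # the scraped items, read back by the main app

The call returns only once the spider has exhausted the site, which gives you the blocking start_crawl(url) from the question while keeping Scrapy in its own process.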

Tomasz Wysocki
Yes, at a minimum you could write the interesting information to a file and then check that file from the main app. Scrapy is definitely designed as a standalone application that writes to a 'data store'.
nate c
See here http://gist.github.com/484009
Rho