Hello,
I was wondering if anyone ever tried to extract/follow RSS item links using SgmlLinkExtractor/CrawlSpider. I can't get it to work...
I am using the following rule:
rules = ( Rule(SgmlLinkExtractor(tags=('link',), attrs=False), follow=True, callback='parse_article'), )
(having in mind that rss links are located in the link tag).
I am not sure how to tell SgmlLinkExtractor to extract the text() of the link and not to search the attributes ...
Any help is welcome, Thanks in advance