Hi, I am using PySolr to run my searches. I want to index an RSS feed and was wondering whether this is possible with PySolr and, if so, how to do it.

I have found instructions on how to do this in Solr itself at http://wiki.apache.org/solr/DataImportHandler#HttpDataSource_Example but can't find anything on how to do the equivalent in PySolr.

Thanks

+1  A: 

You probably don't need to do the equivalent in PySolr. If you already have Solr indexing the feed, as per the example, then you just use PySolr to query that index. Something like:

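    from pysolr import Solr

    # Connect to the Solr core that already indexes the feed
    solr = Solr('http://localhost:8983/solr/rss/')
    response = solr.search('some query string')
    print(response.hits)  # total number of matching documents
    for result in response.docs:
        do_stuff_with(result)  # placeholder for your own handling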

If you really want to do it from the Python side, then you'll need to fetch and parse the RSS there (using other libraries, e.g. Universal Feed Parser) and then send the resulting documents to Solr with PySolr's add(). PySolr only wraps the interaction with Solr; it doesn't “do” data sources.
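A rough sketch of that Python-side route, in case it helps. To keep it dependency-free it parses the feed with the standard library's xml.etree.ElementTree rather than Universal Feed Parser, and it only handles plain RSS 2.0 items. The function names (rss_to_docs, index_feed) and the field names (id, title, description, link) are made up for illustration and would have to match your own Solr schema; the Solr URL is the same assumed one as above.

```python
import xml.etree.ElementTree as ET


def rss_to_docs(rss_xml):
    """Turn RSS 2.0 XML text into a list of dicts, one per <item>."""
    root = ET.fromstring(rss_xml)
    docs = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        docs.append({
            # Field names are assumptions; match them to your Solr schema.
            # Fall back to the link as the unique id when <guid> is absent.
            "id": item.findtext("guid", default=link).strip(),
            "title": item.findtext("title", default="").strip(),
            "description": item.findtext("description", default="").strip(),
            "link": link,
        })
    return docs


def index_feed(rss_xml, solr_url="http://localhost:8983/solr/rss/"):
    """Send the parsed items to Solr (needs pysolr and a running Solr)."""
    from pysolr import Solr  # imported here so parsing works without pysolr

    solr = Solr(solr_url)
    solr.add(rss_to_docs(rss_xml))
    solr.commit()  # make the new documents visible to searches
```

You would fetch the feed yourself first, for example with urllib.request.urlopen(feed_url).read(), and pass the result to index_feed. Re-running it on the same feed should overwrite items with the same id rather than duplicate them, provided id is the unique key in your schema.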

You may want to check out Haystack, which uses PySolr (and can use other engines) and neatly abstracts the job of creating search index entries and shipping them off to Solr for indexing.

Gunnlaugur Briem
Yeah, this is the way I have gone, but I would have preferred to do it all in PySolr instead of going between the two.
John
Okay then, edited to add some words on what to do on the Python side.
Gunnlaugur Briem