Hello
I would like to store data retrieved hourly from RSS feeds, either in a database or in Lucene, so that the text can easily be indexed for word counts.
I need to get the text from the title and description elements of RSS items.
Ideally, for each hourly retrieval from a given feed, I would add a row to a table with the following columns:
feed_url, title_element_text, description_element_text, polling_date_time
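As a rough illustration of that row layout, here is a minimal Java sketch. It assumes one row per RSS item per hourly poll, and it uses an in-memory list in place of a real database table. The names FeedTable, FeedRow and addRow are hypothetical.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class FeedTable {
    // Hypothetical row type mirroring the columns:
    // feed_url, title_element_text, description_element_text, polling_date_time.
    // Assumes one row per RSS item per hourly poll.
    record FeedRow(String feedUrl, String titleText, String descriptionText,
                   LocalDateTime pollingDateTime) {}

    // In-memory stand-in for the database table (an assumption for illustration).
    private final List<FeedRow> rows = new ArrayList<>();

    // Append one row for an item retrieved during a poll.
    public void addRow(String feedUrl, String title, String description, LocalDateTime polledAt) {
        rows.add(new FeedRow(feedUrl, title, description, polledAt));
    }

    // Read-only view of all stored rows.
    public List<FeedRow> rows() {
        return List.copyOf(rows);
    }

    public static void main(String[] args) {
        FeedTable table = new FeedTable();
        table.addRow("http://example.com/rss", "Title A", "Description A",
                LocalDateTime.of(2010, 1, 1, 10, 0));
        System.out.println(table.rows().size()); // prints 1
    }
}
```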
From this, I could look up any element in a feed and calculate keyword counts over whatever time period is required.
This can be done with a database table, using hashmaps to calculate the counts. But can I do this in Lucene at this level of granularity at all? If so, should each feed be a Lucene document, or should each 'row' of the database table be one?
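For comparison, the database-plus-hashmap approach might be sketched as follows. This is a minimal Java sketch under stated assumptions. Rows are held in memory instead of being queried from a table. Tokenisation is a naive lower-cased split on non-letter/digit characters, not a proper analyzer. The names KeywordCounts, FeedRow and countWords are hypothetical.

```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class KeywordCounts {
    // Hypothetical row type mirroring the table columns.
    record FeedRow(String feedUrl, String titleText, String descriptionText,
                   LocalDateTime pollingDateTime) {}

    // Counts words in title and description text for one feed, restricted to
    // rows whose polling time falls in the half-open window [from, to).
    static Map<String, Integer> countWords(List<FeedRow> rows, String feedUrl,
                                           LocalDateTime from, LocalDateTime to) {
        Map<String, Integer> counts = new HashMap<>();
        for (FeedRow row : rows) {
            if (!row.feedUrl().equals(feedUrl)) continue;           // other feed
            LocalDateTime t = row.pollingDateTime();
            if (t.isBefore(from) || !t.isBefore(to)) continue;      // outside window
            String text = row.titleText() + " " + row.descriptionText();
            // Naive tokenisation: lower-case, split on anything not a letter or digit.
            for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String url = "http://example.com/rss";
        List<FeedRow> rows = List.of(
                new FeedRow(url, "Lucene news", "Lucene release", LocalDateTime.of(2010, 1, 1, 10, 0)),
                new FeedRow(url, "Other news", "Nothing new", LocalDateTime.of(2010, 1, 1, 12, 0)));
        Map<String, Integer> counts = countWords(rows, url,
                LocalDateTime.of(2010, 1, 1, 9, 0), LocalDateTime.of(2010, 1, 1, 11, 0));
        System.out.println(counts.get("lucene")); // prints 2
    }
}
```

Only the 10:00 poll falls inside the 09:00 to 11:00 window, so the 12:00 row's words are not counted.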
Can anyone advise?
Thanks
Martin O'Shea.