I use Lucene.net to index content, documents and so on in our CMS. This has worked well so far, but now I need to take the following additions to web pages into account:
- Publish date
- Expiry date
- Page 'is active'
- User authorisation
So the search results should only show pages that are within the publish/expiry window, are 'active', and that the current user is authorised to view.
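To make the rule concrete, here is a minimal sketch of the check I have in mind, written in Python only because the logic is language-independent. The `Page` shape, the field names, the role-based authorisation model and the "no expiry date means never expires" convention are all assumptions for illustration, not how our CMS actually stores things.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Set


@dataclass
class Page:
    # Hypothetical page record; field names are illustrative only.
    publish_date: datetime
    expiry_date: Optional[datetime]  # assumption: None means "never expires"
    is_active: bool
    allowed_roles: Set[str] = field(default_factory=set)


def is_visible(page, user_roles, now):
    """Return True if the page should appear in this user's search results."""
    # Publish date is inclusive, expiry date is exclusive (an assumption).
    in_window = page.publish_date <= now and (
        page.expiry_date is None or now < page.expiry_date
    )
    # Assumption: a user is authorised if they share at least one role with the page.
    authorised = bool(page.allowed_roles & set(user_roles))
    return page.is_active and in_window and authorised
```

Whichever approach I pick, the search results need to agree with this check for every hit.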
Should I include the above information in the Lucene index? It will make the queries a little more complicated, but the hits collection will then only return 'valid' documents, which will make paging the results a lot easier.
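If the information did go into the index, I imagine the user's query would be wrapped in required clauses, roughly as sketched below using the classic Lucene query parser syntax. The field names (`isActive`, `publishDate`, `expiryDate`, `roles`) are hypothetical, and the sketch assumes dates are indexed as un-analysed `yyyyMMdd` strings (so string order matches date order), that pages without an expiry date are indexed as `99999999`, and that role names are simple tokens needing no escaping.

```python
from datetime import date


def build_filtered_query(user_query, user_roles, today):
    """Wrap a user query in required clauses for the visibility rules."""
    if not user_roles:
        # Assumption: every user has at least one role (e.g. an "anonymous" role).
        raise ValueError("user must have at least one role")
    stamp = today.strftime("%Y%m%d")
    roles = " ".join(sorted(user_roles))
    return (
        f"+({user_query}) "
        f"+isActive:true "
        f"+publishDate:[00000000 TO {stamp}] "   # published on or before today
        f"+expiryDate:[{stamp} TO 99999999] "    # expires today or later
        f"+roles:({roles})"                      # any one of the user's roles
    )


print(build_filtered_query("cms search", {"staff", "editor"}, date(2024, 1, 15)))
# +(cms search) +isActive:true +publishDate:[00000000 TO 20240115] +expiryDate:[20240115 TO 99999999] +roles:(editor staff)
```

With day-granularity dates the expiry bound is inclusive here, so a page expiring today would still be returned; a finer timestamp format would be needed for exact cut-offs.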
On the other hand, I'll be duplicating information that is already in the CMS database, so I'll be risking the integrity of my data, and I'll have to update the index whenever anything in the above list changes, not just when the content itself changes.
Has anyone else had this problem? How did you solve it? Thanks.
Edit: I may need to use a 'FieldCache' (mentioned here) to pass the 'valid' doc ids into the Lucene search?
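As I understand it, the idea would be something like the sketch below: get an array indexed by Lucene document number that holds each document's CMS page id (which is roughly what I believe a FieldCache provides), ask the database for the currently valid page ids, and turn the two into a per-document mask that the search can apply. The function names are made up for illustration and this is plain Python, not the Lucene.net API.

```python
def valid_doc_mask(page_id_by_doc, valid_page_ids):
    """Return one boolean per Lucene doc number: True if that doc's page is valid.

    page_id_by_doc: list where index = Lucene doc number, value = CMS page id
                    (assumed to come from something like a FieldCache).
    valid_page_ids: page ids the database says the current user may see right now.
    """
    valid = set(valid_page_ids)
    return [page_id in valid for page_id in page_id_by_doc]


def filter_hits(hit_doc_numbers, mask):
    """Keep only the hits whose doc number is marked valid, preserving rank order."""
    return [doc for doc in hit_doc_numbers if mask[doc]]


# Doc numbers 0..4 map to these CMS page ids; the database says 101 and 104 are valid.
mask = valid_doc_mask([101, 102, 103, 104, 105], [104, 101])
print(filter_hits([3, 1, 0, 4], mask))  # [3, 0]
```

This would keep the publish/expiry/authorisation data in the database only, at the cost of rebuilding the mask per user and whenever the index changes.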