Hi Guys,
I am trying to use Lucene (actually PyLucene!) to find out how many documents contain my exact phrase. My code currently looks like this... but it runs rather slow. Does anyone know a faster way to return document counts?
phraseList = ["some phrase 1", "some phrase 2"] #etc, a list of phrases...
countsearcher = IndexSearcher(SimpleFSDirectory(File(STORE_DIR)), True)
analyzer = StandardAnalyzer(Version.LUCENE_CURRENT)
for phrase in phraseList:
query = QueryParser(Version.LUCENE_CURRENT, "contents", analyzer).parse("\"" + phrase + "\"")
scoreDocs = countsearcher.search(query, 200).scoreDocs
print "count is: " + str(len(scoreDocs))