Hi,
I am trying to index a database table using Lucene. I use Lucene only for indexing; the Fields are not stored. The table has five columns: userid (PK), description, reportnumber, reporttype, report.
I intend to use the combination of userid, reportnumber and reporttype to fetch the data back from the database when Lucene finds a hit.
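As a rough illustration of the lookup key described above, the three columns can be modeled as one composite key. This is only a sketch: the class name ReportRowKey, the "|" separator and the encode/decode helpers are made up for illustration and are not part of Lucene or of the code in this post. It also assumes reportnumber is an integer and that "|" never occurs in userid or reporttype.

```java
import java.util.Objects;

/** Hypothetical composite key for one table row: (userid, reportnumber, reporttype). */
final class ReportRowKey {
    final String userId;
    final int reportNumber;
    final String reportType;

    ReportRowKey(String userId, int reportNumber, String reportType) {
        this.userId = Objects.requireNonNull(userId);
        this.reportNumber = reportNumber;
        this.reportType = Objects.requireNonNull(reportType);
    }

    /** Joins the three columns into one string (assumes '|' is not used in the columns). */
    String encode() {
        return userId + "|" + reportNumber + "|" + reportType;
    }

    /** Parses a string produced by encode() back into a key. */
    static ReportRowKey decode(String encoded) {
        String[] parts = encoded.split("\\|");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 parts but got " + parts.length);
        }
        return new ReportRowKey(parts[0], Integer.parseInt(parts[1]), parts[2]);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ReportRowKey)) {
            return false;
        }
        ReportRowKey other = (ReportRowKey) o;
        return userId.equals(other.userId)
                && reportNumber == other.reportNumber
                && reportType.equals(other.reportType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(userId, reportNumber, reportType);
    }

    public static void main(String[] args) {
        ReportRowKey key = new ReportRowKey("JQ123", 2, "MATH");
        System.out.println(key.encode()); // prints JQ123|2|MATH
    }
}
```

Two keys compare equal only when all three columns match, which mirrors the idea that the three columns together identify a single row.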
One record in the table can span multiple rows, for example:
JQ123, SOMEDESCRIPTION, 1, FIN, content of fin report
JQ123, AnotherDescription, 2, MATH, content of math report
JQ123, YetAnotherDesc, 3, MATH, content of another math report
JD456, MoreDesc, 1, STAT, content of stat report
... and so on.
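To make the "one record, several rows" layout concrete, here is a minimal sketch that groups the sample rows by userid. The class RecordGrouping and its methods are hypothetical helpers written for this illustration; rows are represented as plain String arrays rather than read from a real database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups table rows into records keyed by userid. */
final class RecordGrouping {

    /** Groups rows by their first column (userid), preserving the input order. */
    static Map<String, List<String[]>> groupByUserId(List<String[]> rows) {
        Map<String, List<String[]>> records = new LinkedHashMap<>();
        for (String[] row : rows) {
            records.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        return records;
    }

    /** The four sample rows from the post. */
    static List<String[]> sampleRows() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"JQ123", "SOMEDESCRIPTION", "1", "FIN", "content of fin report"});
        rows.add(new String[] {"JQ123", "AnotherDescription", "2", "MATH", "content of math report"});
        rows.add(new String[] {"JQ123", "YetAnotherDesc", "3", "MATH", "content of another math report"});
        rows.add(new String[] {"JD456", "MoreDesc", "1", "STAT", "content of stat report"});
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> records = groupByUserId(sampleRows());
        for (Map.Entry<String, List<String[]>> entry : records.entrySet()) {
            // prints "JQ123 -> 3 row(s)" and then "JD456 -> 1 row(s)"
            System.out.println(entry.getKey() + " -> " + entry.getValue().size() + " row(s)");
        }
    }
}
```

With one Lucene Document per row, the sample data would produce four Documents; with one Document per record it would produce two.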
Some of the report types (e.g. MATH) have highly structured contents (XML, stored as a string in the last column), and in the future I may want to extract some of that content into separate Fields of the Document.
My strategy so far has been to create one Lucene Document per row and index it. My reasoning is:
1. It is easy and seems logical (to me).
2. If I end up extracting content from certain report types into Fields, all that would be needed is an if statement that checks the report type and creates the new Fields.
Here is the relevant code:
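import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// userID, reportNum, reportType, description, report, moreFieldContent and
// indexWriter are members of the enclosing class (not shown).
public void createDocument() throws IOException {
    Document luceneDocument = new Document();
    // Identifier columns: indexed as single terms, not stored.
    luceneDocument.add(new Field("userid", userID, Field.Store.NO, Field.Index.NOT_ANALYZED));
    luceneDocument.add(new Field("reportnumber", reportNum, Field.Store.NO, Field.Index.NOT_ANALYZED));
    luceneDocument.add(new Field("reporttype", reportType, Field.Store.NO, Field.Index.NOT_ANALYZED));
    // Free-text columns: analyzed for full-text search, not stored.
    luceneDocument.add(new Field("description", description, Field.Store.NO, Field.Index.ANALYZED));
    luceneDocument.add(new Field("report", report, Field.Store.NO, Field.Index.ANALYZED));
    if (reportType.equalsIgnoreCase("MATH")) {
        // Placeholder for Fields extracted from the XML content in the future.
        luceneDocument.add(new Field("morefields", moreFieldContent, Field.Store.NO, Field.Index.ANALYZED));
    }
    indexWriter.addDocument(luceneDocument);
    // Closes the writer after this document, as in my current code.
    indexWriter.close();
}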
1. Does having several Documents for the same record affect Lucene's search efficiency in any way?
2. Would this approach have any significant disk space overhead compared to having one Document per record in Lucene (I do not store any Fields)?
Thanks in advance for your response,