I have 120k db records to commit into a Solr index.

My question is: should I commit after submitting every 10k records, or commit only once after submitting all 120k records?

Is there any difference between these two options?

+1  A: 

According to the Lucene 2.9.3 documentation, commit() makes the added documents visible to readers and writes all pending added and deleted documents to the index on disk. It is a costly operation.

So if you want to be able to search some of the documents while the rest are still being added, or want assurance that you will never lose more than 10,000 added documents, you need to commit every 10,000 records.

On the other hand, if you prefer to save the time spent on the extra commits and are not worried about losing documents if the machine fails, commit only once, after all of the documents have been added.
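The two strategies can be sketched as follows. This is a minimal illustration, not a real Solr client: `index_records`, `RecordingClient`, and the `add`/`commit` method names are hypothetical stand-ins for whatever client you actually use, and the batch size is an arbitrary assumption.

```python
class RecordingClient:
    """Hypothetical in-memory stand-in for an index client (not a Solr API)."""

    def __init__(self):
        self.added = 0      # total documents sent so far
        self.commits = 0    # total commits issued so far

    def add(self, docs):
        self.added += len(docs)

    def commit(self):
        self.commits += 1


def index_records(client, records, batch_size=1000, commit_every=None):
    """Send records in batches and return the number of commits issued.

    commit_every=None  -> commit once, after everything has been added.
    commit_every=N     -> also commit whenever at least N records are pending.
    """
    commits = 0
    pending = 0          # records added since the last commit
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            client.add(batch)
            pending += len(batch)
            batch = []
            if commit_every is not None and pending >= commit_every:
                client.commit()
                commits += 1
                pending = 0
    if batch:            # send the final partial batch, if any
        client.add(batch)
        pending += len(batch)
    if pending > 0:      # make sure nothing is left uncommitted
        client.commit()
        commits += 1
    return commits


# Periodic commits: at most 10,000 records are ever uncommitted.
periodic = index_records(RecordingClient(), range(120_000), commit_every=10_000)

# Single commit: cheaper overall, but everything is at risk until the end.
single = index_records(RecordingClient(), range(120_000))
```

With 120k records, the periodic variant pays for twelve commits and the single-commit variant for one; which is better depends on how much a commit costs in your setup and how much work you can afford to redo after a failure.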

Yuval F
@Yuval F If I commit all the records only at the end, will that cost a lot of memory? I don't know the details of how Lucene commits work.
mlzboy
@mlzboy I am not sure that it will cost any extra memory. I believe you already pay the memory price when adding the documents, because they are added to the in-memory index. You probably need to benchmark this and decide.
Yuval F
+1  A: 

Use Solr's default auto-commit values, which I believe are quite reasonable. If not, you can adjust them to suit your needs:

<!-- autocommit pending docs if certain criteria are met.  Future versions may expand the available
 criteria -->
<autoCommit>
  <maxDocs>10000</maxDocs> <!-- maximum uncommitted docs before autocommit triggered -->
  <maxTime>50000</maxTime> <!-- maximum time (in MS) after adding a doc before an autocommit is triggered -->
</autoCommit>

This means that Solr will commit automatically when more than 10,000 documents are waiting to be committed, or when 50 seconds (50,000 ms) have passed since a document was added.
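As a rough model of the two triggers described above (a simplified sketch of the rule as stated here, not Solr's actual implementation; the function name and parameters are made up for illustration):

```python
def autocommit_due(pending_docs, ms_since_add, max_docs=10000, max_time_ms=50000):
    """Return True if either autocommit criterion, as described above, is met.

    pending_docs  -- number of documents waiting to be committed
    ms_since_add  -- milliseconds elapsed since a document was added
    The defaults mirror the <maxDocs> and <maxTime> values in the snippet.
    """
    too_many_docs = pending_docs > max_docs       # "more than 10000 docs"
    too_much_time = ms_since_add >= max_time_ms   # "50s have passed"
    return too_many_docs or too_much_time
```

Under this model, a bulk load of 120k records would trigger commits on its own, so the client would not need to issue explicit intermediate commits.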

dogbane