I have a news site with 150,000 news articles. About 250 new articles are added to the database daily, at intervals of 5-15 minutes. I understand that Solr is optimized for millions of records and my 150K won't be a problem for it. But I am worried that the frequent updates will be a problem, since the cache gets invalidated with every update. On my dev server, a cold load of a page takes 5-7 seconds (since every page runs a few MLT queries).

Will it help if I split my index into two - an archive index and a latest index? The archive index would be updated only once a day.

Can anyone suggest any ways to optimize my installation for a constantly updating index?

Thanks

+1  A: 

My answer is: test it! Don't try to optimize yet if you don't know how it performs. Like you said, 150K is not a lot; it should be quick to build an index of that size for your tests. After that, run a couple of MLT queries from different concurrent threads (to simulate users) while you index more documents to see how it behaves.

One setting that you should keep an eye on is auto-commit. Since you are indexing constantly, you can't commit on every document (you would bring Solr down). The value you choose for this setting lets you tune the latency of the system (how long it takes for new documents to show up in results) while keeping the system responsive.
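As an illustration, a minimal autoCommit block in solrconfig.xml might look like the sketch below; the maxDocs and maxTime values are only example numbers and should be tuned against your own indexing rate and latency requirements.

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- Commit automatically instead of committing after every document -->
      <autoCommit>
        <!-- Commit once this many documents have been added... -->
        <maxDocs>1000</maxDocs>
        <!-- ...or after this many milliseconds, whichever comes first -->
        <maxTime>60000</maxTime>
      </autoCommit>
    </updateHandler>

With settings like these you keep streaming documents in and Solr batches the commits, so new articles become searchable at most one interval after they are indexed.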

Pascal Dimassimo
I like the idea of tuning your COMMIT interval. You should be able to keep adding documents all the time and just COMMIT at regular intervals. Then you are only paying the re-cache cost once per interval.
Aaron D
A: 

Consider using mlt=true in the main query instead of issuing a separate MoreLikeThis query per result. You'll save the round trips, so it will be faster.
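For example, with the MoreLikeThis component enabled on your search handler, a single request along these lines returns similar documents alongside each hit; the field names (title, body) and mlt.count are just placeholders for whatever your schema and page layout actually use:

    http://localhost:8983/solr/select?q=category:sports&mlt=true&mlt.fl=title,body&mlt.count=3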

Mauricio Scheffer