Hi

I have a problem: in my Lucene index, some documents contain very large amounts of text. When I search one of these huge documents, Lucene/Solr does not return any results, even though the search term exists in the document text. I suspect the reason might be the large number of characters in the document text. If so, how can I tell Solr/Lucene how many characters to analyze during search? Please explain.

I am using Solr 1.4.1. Can anyone help?

Thanks, Ahsan

+2  A: 

Lucene can handle huge documents without trouble. It seems unlikely that the document size itself is the problem. Use a tool like Luke to inspect the index and see what terms are associated with some of these large documents.

erickson
I have done the same, but everything looks OK. The problem still persists. I don't know whether I can post my index files here.
Ahsan Iqbal
+1  A: 

Also, have you changed the maxFieldLength setting in solrconfig.xml? I am testing out indexing the Bible, about 25 MB of data, and with the default maxFieldLength of 10,000, only the first 10,000 tokens ever get analyzed, which leads to roughly 2,000 unique terms for my document.
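
For reference, this is roughly how that setting looks in a stock Solr 1.4 solrconfig.xml (it appears under both the indexDefaults and mainIndex sections, if I remember the layout correctly):

    <indexDefaults>
      <!-- Maximum number of tokens indexed per field; tokens beyond
           this limit in a large document are silently dropped. -->
      <maxFieldLength>10000</maxFieldLength>
    </indexDefaults>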

If you are using Lucene directly, then there are a couple of settings for maxFieldLength; you may have it set to "unlimited" and therefore be getting everything. Check the Javadocs for how to set maxFieldLength.
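
As a rough sketch of the Lucene side, written against the 2.9.x API that Solr 1.4 ships with (the index path below is just a placeholder):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MaxFieldLengthExample {
        public static void main(String[] args) throws Exception {
            // Placeholder path; point this at your own index directory.
            Directory dir = FSDirectory.open(new File("/path/to/index"));

            // MaxFieldLength.LIMITED truncates each field at 10,000 tokens
            // (the same default Solr uses); UNLIMITED indexes every token.
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_29),
                    IndexWriter.MaxFieldLength.UNLIMITED);

            // The limit can also be changed on an existing writer.
            writer.setMaxFieldLength(Integer.MAX_VALUE);

            writer.close();
        }
    }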

Eric Pugh
I want to know: does changing maxFieldLength in solrconfig.xml at search time work, or do I need to do that at indexing time too?
Ahsan Iqbal
It is an index-time parameter. If you already have a ginormous document in the index, changing this won't change anything retroactively.
Eric Pugh
Thanks for your answer.
Ahsan Iqbal