Hi all,

I have a number of text files. Each text file contains data like this:

<text> Big data... big data... </text>
<text> another big data </text>
<text> some other data </text>

Now I have to write code with Lucene that retrieves the entire line when a search query matches.

For example, if I search for "some data", the entire third line should be returned:

<text> some other data </text>


I've made some progress with SpanQuery, but it only gives me the matching documents and the word positions. How do I get the "real text" back from the text file?

Please point me to reference material if any is available.

+1  A: 

Please see this question.

Yuval F
+1  A: 

I'm not sure what you mean. If it's always enough for you to retrieve only a single line, then you may want to create one Document per line instead of per file.

Then IndexReader.document will retrieve only the line in question, provided the line's text is stored in the Document (only stored fields come back from the index). Mapping back from lines to files will be more complicated, of course.
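A minimal sketch of the one-Document-per-line idea, written in plain Java so it runs without the Lucene jar. This is a stand-in, not Lucene: each `LineDoc` plays the role of one Document holding the raw line (in recent Lucene versions you would typically add it as a `TextField` with `Field.Store.YES`) plus the file path and line number, and the all-terms match stands in for a query whose terms are all required. The class and method names (`LineIndex`, `LineDoc`, `addFile`, `terms`, `search`) are hypothetical, and the "analyzer" is a crude assumption that simply strips the `<text>` tags and splits on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LineIndex {
    // One entry per line: the analogue of one Lucene Document per line,
    // with the file path and line number kept as extra "fields".
    public static final class LineDoc {
        public final String file;
        public final int lineNo;
        public final String text;

        LineDoc(String file, int lineNo, String text) {
            this.file = file;
            this.lineNo = lineNo;
            this.text = text;
        }
    }

    private final List<LineDoc> docs = new ArrayList<>();

    // Index each non-blank line of a file's contents as its own entry.
    public void addFile(String file, String contents) {
        String[] lines = contents.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            if (!lines[i].trim().isEmpty()) {
                docs.add(new LineDoc(file, i + 1, lines[i]));
            }
        }
    }

    // Crude analyzer (assumption): drop <text> tags, lowercase, split on non-alphanumerics.
    static Set<String> terms(String s) {
        String stripped = s.replaceAll("</?text>", " ").toLowerCase(Locale.ROOT);
        Set<String> out = new HashSet<>();
        for (String t : stripped.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Return every indexed line containing all query terms, with its original text.
    public List<LineDoc> search(String query) {
        Set<String> q = terms(query);
        List<LineDoc> hits = new ArrayList<>();
        if (q.isEmpty()) {
            return hits;
        }
        for (LineDoc d : docs) {
            if (terms(d.text).containsAll(q)) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        LineIndex index = new LineIndex();
        index.addFile("sample.txt",
                "<text> Big data... big data... </text>\n"
                + "<text> another big data </text>\n"
                + "<text> some other data </text>\n");
        // Prints: sample.txt:3 <text> some other data </text>
        for (LineDoc d : index.search("some data")) {
            System.out.println(d.file + ":" + d.lineNo + " " + d.text);
        }
    }
}
```

Because every hit is a whole line, the "real text" is just the stored field of the hit; there is no need to translate word positions back into file offsets.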

larsmans
Awesome idea. Also, I don't think mapping back from lines to files will be difficult: I'll add a field to each Document that points to its file, which keeps it simple. How does this approach perform? Thanks.
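If you would rather not store the line text in the index at all, one option is to store only the file path and the line number in each per-line Document and read the line back from disk for each hit. A minimal plain-Java sketch of that lookup follows; `LineLookup` and `readLine` are hypothetical names, and it assumes the files are UTF-8 and unchanged since indexing. The trade-off is a smaller index in exchange for one file read per hit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class LineLookup {
    // Read line `lineNo` (1-based) from `file`, e.g. using the path and line
    // number stored as fields in the matching per-line Document.
    public static String readLine(Path file, int lineNo) throws IOException {
        if (lineNo < 1) {
            throw new IllegalArgumentException("lineNo must be >= 1");
        }
        // Files.lines reads lazily (UTF-8), so only the lines up to lineNo are consumed.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.skip(lineNo - 1)
                    .findFirst()
                    .orElseThrow(() -> new IllegalArgumentException(
                            file + " has fewer than " + lineNo + " lines"));
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LineLookup <file> <lineNo>
        System.out.println(readLine(Paths.get(args[0]), Integer.parseInt(args[1])));
    }
}
```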
echo
I did something similar where I indexed only email subject headers. For evaluation, I created sixty indexes of a few thousand Documents each, then ran a few hundred queries per index. That took about half a minute *in total*, including loading time for the Java VM and several libraries. Performance depends on a lot of factors, of course, so YMMV.
larsmans