Hi all,
I am testing Lucene.NET for our search requirements, and I have a couple of questions.
We have documents in XML format. Every document contains multi-language text. The number of languages and the languages themselves vary from document to document. See the example below:
<document>This is a sample document, which is describing a <word lang="de">tisch</word>, a <word lang="en">table</word> and a <word lang="en">desk</word>.</document>
The keywords of a document are tagged with a special element and language attribute.
When creating the Lucene index, I extract the text content from the XML together with (language, keyword) pairs (I am not sure whether I have to), like this:
This is a sample document, which is describing a tisch, a table and a desk.
de - tisch
en - table
en - desk
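To make the extraction step concrete, here is a minimal sketch in Python (not Lucene.NET), using only the standard library. The function name `extract` is just an illustrative choice, and the sketch assumes keywords are always direct text inside a `word` element with a `lang` attribute, as in the example above.

```python
import xml.etree.ElementTree as ET

def extract(xml_text):
    """Return (plain_text, [(lang, keyword), ...]) for one document."""
    root = ET.fromstring(xml_text)
    # itertext() walks all text nodes in document order, tags stripped.
    plain_text = "".join(root.itertext())
    # Collect the language attribute and text of every tagged keyword.
    pairs = [(w.get("lang"), w.text) for w in root.iter("word")]
    return plain_text, pairs

sample = (
    '<document>This is a sample document, which is describing a '
    '<word lang="de">tisch</word>, a <word lang="en">table</word> '
    'and a <word lang="en">desk</word>.</document>'
)
text, pairs = extract(sample)
```

Here `text` holds the plain sentence and `pairs` holds the language/keyword list shown above.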
I don't know exactly how to create an index that lets me search for, for example, all the documents that contain the word "tisch" in German (and not the documents that contain the word "tisch" in another language).
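As a toy model of the behaviour I am after (plain Python, not Lucene.NET), the sketch below keeps one keyword field per language. The field naming `keyword_<lang>`, the helper names, and the second sample document (where "tisch" is tagged as `nl`) are assumptions made up for illustration only.

```python
from collections import defaultdict

def build_index(docs):
    """docs: {doc_id: [(lang, keyword), ...]} -> {(field, term): {doc_id, ...}}"""
    index = defaultdict(set)
    for doc_id, pairs in docs.items():
        for lang, keyword in pairs:
            # One field per language, e.g. "keyword_de" (assumed naming).
            index[("keyword_" + lang, keyword.lower())].add(doc_id)
    return index

def search(index, lang, term):
    """Return the sorted ids of documents containing `term` tagged as `lang`."""
    return sorted(index.get(("keyword_" + lang, term.lower()), set()))

docs = {
    1: [("de", "tisch"), ("en", "table"), ("en", "desk")],
    2: [("nl", "tisch")],  # hypothetical document: same word, other language
}
index = build_index(docs)
german_hits = search(index, "de", "tisch")
```

With this layout, a search for "tisch" in German only looks at the `keyword_de` field, so document 2 is not matched.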
I also want to specify the sort order at runtime: results should be sorted by a user-specified language order (depending on the user interface language). For example, if we have two documents:
<document>This is a sample document, which is describing a <word lang="de">tisch</word>.</document>
<document>This is another sample document, which is describing a <word lang="en">table</word>.</document>
and a user on the English interface searches for "tisch OR table", I want the second document to be returned first.
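The ordering I have in mind can be sketched as follows (plain Python, independent of Lucene.NET). It assumes each hit already carries the language of the keyword that matched. The `(doc_id, matched_lang)` tuple shape and the function name are hypothetical.

```python
def sort_by_language(hits, language_order):
    """Sort hits by the position of their matched language in language_order.

    hits: list of (doc_id, matched_lang) tuples.
    language_order: preferred languages, most preferred first, e.g. ["en", "de"].
    """
    rank = {lang: i for i, lang in enumerate(language_order)}
    # Languages missing from the preference list sort after all listed ones.
    return sorted(hits, key=lambda hit: rank.get(hit[1], len(rank)))

hits = [(1, "de"), (2, "en")]  # document 1 matched "tisch", document 2 "table"
ordered = sort_by_language(hits, ["en", "de"])
```

For a user on the English interface the preference list starts with `en`, so document 2 comes first. Because Python's sort is stable, hits in the same language keep their original relative order.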
Any information or advice is appreciated.
Many thanks!