Hello there,

I have a website where documents are saved as XML files, all with the same structure.

I need a search engine that can pick the documents with the highest relevance to the keywords entered by a user.

I thought it could (?) be a good idea to build one with XQuery, rather than storing the information twice (in the XML docs and in a MySQL database) and querying the MySQL database for relevance searches.

Is XQuery any good for this, how would I go about it, and what speed can I expect on 1000+ documents of about 7 KB each?

Thank you for your time.

Kind regards

+1  A: 

If you have 1000+ documents that are searched for every query, scanning them with XQuery or an SQL database is not efficient.

1) A sequential search through each document for every keyword takes at least (# of documents) * (# of words inside each document) * (# of keywords) comparisons.

2) Every search has to scan every document again. If your project involves searching many times, this is not feasible.

3) A sequential search does not by itself give you a way to rank results based on how many words are found, the total number of words in a document, the importance of each word, etc.
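
To make the cost in the points above concrete, here is a minimal sketch of the naive sequential approach in Python. The XML element names and the sample documents are made up for illustration, and `document_words` and `sequential_search` are hypothetical helper names, not part of any library.

```python
import re
import xml.etree.ElementTree as ET

def document_words(xml_text):
    """Parse one XML document and return all of its text as lowercase words."""
    root = ET.fromstring(xml_text)
    text = " ".join(root.itertext())
    return re.findall(r"[a-z0-9]+", text.lower())

def sequential_search(xml_docs, keywords):
    """Scan every document for every keyword; return {doc_id: raw match count}."""
    hits = {}
    for doc_id, xml_text in xml_docs.items():      # every document ...
        words = document_words(xml_text)           # ... is re-parsed on each search
        matches = 0
        for keyword in keywords:                   # every keyword ...
            matches += words.count(keyword.lower())  # ... is compared to every word
        if matches:
            hits[doc_id] = matches
    return hits

# Hypothetical sample data; real documents would be read from disk.
xml_docs = {
    "a.xml": "<doc><title>XQuery</title><body>XML and XQuery</body></doc>",
    "b.xml": "<doc><title>Lucene</title><body>Search</body></doc>",
}
print(sequential_search(xml_docs, ["xquery", "xml"]))
```

The nested loops are exactly the documents * words * keywords product, and the whole scan repeats on every call. The raw match count is also a weak relevance signal, since it ignores document length and how rare each word is.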

A better alternative is to use an Inverted Index data structure to 'index' your documents and words ahead of time.

This way, you'll do some work up front to index each word in each document, but you'll save a lot of time when doing the actual searching (which is what matters).
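
As a rough illustration of the idea, here is a minimal inverted index sketch in Python. It assumes the text has already been extracted from each XML file into a plain string; the sample documents and the function names (`tokenize`, `build_index`, `search`) are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Up-front work: map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)

# Hypothetical sample data standing in for text extracted from the XML files.
docs = {
    "a.xml": "XQuery is a query language for XML",
    "b.xml": "Lucene builds an inverted index for full text search",
    "c.xml": "An XML database can run XQuery over indexed documents",
}
index = build_index(docs)
print(search(index, "xquery xml"))
```

Each search now costs one dictionary lookup per keyword plus a set intersection, instead of a pass over every document. The index has to be rebuilt or updated whenever a document changes.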

Another advantage is that you'll be able to rank documents in a non ad-hoc way. See the Vector Space model.
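
A small sketch of vector space ranking in Python may help here. It uses one common TF-IDF weighting (raw term count times `log(N / df) + 1`) and cosine similarity; that weighting is an assumption, one of several variants, and the sample documents and function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Return (vectors, idf): a sparse TF-IDF vector per document, plus the IDF table."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())   # each document counts once per word
    n = len(docs)
    idf = {word: math.log(n / df) + 1.0 for word, df in doc_freq.items()}
    vectors = {
        doc_id: {word: tf * idf[word] for word, tf in term_counts.items()}
        for doc_id, term_counts in counts.items()
    }
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(docs, query):
    """Return (doc_id, score) pairs with a non-zero score, best match first."""
    vectors, idf = tfidf_vectors(docs)
    query_counts = Counter(w for w in tokenize(query) if w in idf)
    query_vec = {word: tf * idf[word] for word, tf in query_counts.items()}
    scores = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in vectors.items()]
    return sorted((s for s in scores if s[1] > 0.0), key=lambda s: s[1], reverse=True)

# Hypothetical sample data standing in for text extracted from the XML files.
docs = {
    "a.xml": "xquery xquery xml",
    "b.xml": "lucene search index",
    "c.xml": "xml database search",
}
for doc_id, score in rank(docs, "xml search"):
    print(doc_id, round(score, 3))
```

Here the document that contains both query words comes out on top, and rare words weigh more than common ones. A real system would compute the vectors once at indexing time rather than on every query, as this sketch does for brevity.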

aduric
A: 

If you only need search over the XML documents (searching only, not complex document transactions), then I would suggest the Apache Lucene search engine.

The latest Apache Lucene 3.x version comes with decent search features.

On top of that you can use Apache Solr, which uses Lucene as its search engine and adds administrative features, faceted browsing, and payloads. (Note: Lucene implementations are available for .NET, Java, Python, and Ruby.)

If you want a truly XQuery-based, open-source solution, then considering your document volume, try the eXist XML database. Load all your XML documents into the eXist database and then use XQuery. This approach requires you to:

  1. Ingest all your XML documents into the eXist database.
  2. Write XQuery modules that query those documents and return an XML result set.
  3. Call those XQuery modules directly from your app to get the results.
kadalamittai