Dear all,

I am using these technologies: SQL Server 2005, ASP.NET MVC and NHibernate/S#arp Architecture. I would like to mine some text, with the final aim of presenting some web-based stats. I have several million keywords and several million documents, and would like to run queries over these documents indexed by the keywords. I have played a bit with SQL Server's full-text indexing but I am not too impressed, so I am wondering whether Lucene.Net might be an alternative.
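The setup described above (documents indexed by keywords, then queried and counted) is essentially an inverted index, which is the core data structure behind Lucene. The sketch below is only a toy illustration of that idea in plain Java (chosen because Lucene.Net is a port of Java Lucene), using just the standard library. It is not Lucene's API: the class and methods (`InvertedIndex`, `addDocument`, `searchAll`, `documentFrequency`) are hypothetical names, and a real index over millions of documents also needs on-disk storage, proper text analysis and ranking, which Lucene provides.

```java
import java.util.*;

// Toy in-memory inverted index (hypothetical, NOT the Lucene API):
// each keyword maps to the sorted set of ids of documents that contain it.
class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Naive tokenizer: lower-case, split on anything that is not a letter or digit.
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue; // split can yield a leading empty token
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // AND query: ids of documents containing every keyword (intersection of postings).
    public Set<Integer> searchAll(String... keywords) {
        Set<Integer> result = null;
        for (String kw : keywords) {
            Set<Integer> docs = postings.getOrDefault(
                    kw.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
            if (result.isEmpty()) break; // no document can match any more
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    // Document frequency: how many documents contain the keyword (useful for stats).
    public int documentFrequency(String keyword) {
        return postings.getOrDefault(
                keyword.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet()).size();
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addDocument(1, "Lucene is a full text search library");
        idx.addDocument(2, "SQL Server has full text indexing");
        idx.addDocument(3, "Lucene.Net ports Lucene to .NET");
        System.out.println(idx.searchAll("full", "text"));   // [1, 2]
        System.out.println(idx.documentFrequency("lucene")); // 2
    }
}
```

Intersecting per-keyword posting sets is the basic strategy for AND queries; real engines keep postings sorted on disk and skip through them instead of copying sets in memory.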

I have never used Lucene.Net, but I understand that it is a 1:1 port of the Java version. So my first question is whether it is worth studying the book 'Lucene in Action', provided that Lucene is the right technology.

Thanks.

Best wishes,

Christian

Answer:

Well,

FIRST: update SQL Server. You are using a version that is two generations out of date, with the old full-text search implementation and its many (known and since fixed) shortcomings.

Second: Lucene may really be better suited. SQL Server is primarily a database server; its full-text search does a lot of things, but it also has a lot of limitations.

But bringing in Lucene DOES add a significant complication: distributed transactions and backup handling become a lot more complicated because you now have two systems. SQL Server 2008 R2 does a much better job here (the full-text index is stored in the database file).

That said, be careful with performance too. You may need a QUITE HIGH-END SERVER if you want to run a lot of queries in parallel (which can easily happen with a web application). This may require multiple database servers running read-only replicas, something SQL Server makes a lot easier than Lucene does (as in: out of the box).

I suggest you just get Lucene and play with it ;) Not much more is needed.

TomTom
Thanks. I read that SQL Server 2008 is much better, but money is the problem. I could get hold of the Developer Edition, but if things are web-based the license would cost a lot. Do you reckon the things discussed in Lucene in Action (Java) would help me deal with Lucene.Net? Also, this is a research project, so we do not expect thousands of 'customers'; as long as the queries produce results in a reasonable amount of time, I am happy.
csetzkorn
SQL Server 2008 Web Edition costs something like 15 USD per processor per month. Check SPLA licensing: "high price" mostly means "clueless about SPLA" in this regard ;)
TomTom
Sounds good, thanks. Can you answer the book question, please? Is the 'Lucene in Action' book useful, or is there other sufficient documentation? The Lucene.Net site does not seem to contain much documentation...
csetzkorn
No idea. Last time we worked with Lucene.Net we just used the documentation there. It took only two days to get it working, so the book was not deemed necessary. I normally don't like books anyway; most are basically rewritten documentation.
TomTom