We are building websites for our clients and want to adopt a search solution that can be easily reused. Which one should we go with? Should we use the Google Search API, or MS SQL Server Full Text Indexing with the CONTAINS and FREETEXT predicates?

+3  A: 

The good thing about SQL Server full text searching is that the barrier to entry is quite low (assuming you're already using SQL Server). StackOverflow uses it for its search. The downside is that its effectiveness (or lack thereof) is one of the most frequently criticized aspects of SO. So much so that a lot of people (myself included) default to using "site:stackoverflow.com ..." in Google.

Google Custom Search also has a low barrier to entry, but you lose some control over how often your index is updated and how many search results you can return. Google Site Search is a better version that addresses some of these limitations (with on-demand indexing, for example).

At the top end you have the Google Search Appliance, which is really your only Google option if your data isn't public.

Which is appropriate depends on how often your data needs to be re-indexed, how many requests you make, how much bandwidth you want to use getting indexed, whether your data is public and how good you need the search results to be. There is no one answer.

cletus
paragraph 2 of my response is my answer to your first paragraph
Jeff Atwood
+1. especially the site:stackoverflow.com bit ! ;)
Mitch Wheat
I appreciate the limitations of DB searching vs Web indexing. Rightly or wrongly though, people *do* expect Google-like search results. That's just one of the tradeoffs you have to make one way or the other.
cletus
+4  A: 

We use SQL Server full text indexing here on Stack Overflow and it works reasonably well -- but I can only recommend it for 2005 and 2008, the versions we use it on. I heard it's much worse in 2000. There are quirks (stopword lists, etc) but nothing serious. It's fast and does what it says on the tin, mostly.

The problem you run into with contains() and freetext() is that users often expect to search at the "whole page" level, à la Google, where anything that's written to the page / screen is searchable. That's not really how databases work, since only the columns you index are searchable, but users don't care about that. They care about results and have (arguably reasonable) expectations based on years of web searching.

If you expect to need the "whole page" search level, I'd strongly recommend looking at the Google Search API, or Lucene.NET (assuming you're on a Microsoft stack, given that you use SQL Server).
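
As a rough illustration of the contains() / freetext() predicates discussed above, here is a minimal Python sketch that builds a CONTAINS search condition from user keywords. Everything named here is an assumption made for the example: the `Posts` table, its `Id`, `Title` and `Body` columns, the helper `build_contains_condition`, and the `?` placeholder style (as used by ODBC-style drivers). It is not how Stack Overflow builds its queries.

```python
def build_contains_condition(terms, operator="AND"):
    """Build a search condition string for a CONTAINS predicate.

    Each term is treated as an exact word or phrase and wrapped in
    double quotes; terms are then joined with AND or OR.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    # Drop embedded double quotes so each term stays one quoted phrase.
    cleaned = [t.strip().replace('"', "") for t in terms]
    cleaned = [t for t in cleaned if t]
    if not cleaned:
        raise ValueError("at least one non-empty search term is required")
    return f" {operator} ".join(f'"{t}"' for t in cleaned)


# Hypothetical queries against an assumed Posts(Id, Title, Body) table with a
# full-text index on Body. The search string is passed as a parameter rather
# than concatenated into the SQL text.
CONTAINS_SQL = "SELECT Id, Title FROM Posts WHERE CONTAINS(Body, ?)"
FREETEXT_SQL = "SELECT Id, Title FROM Posts WHERE FREETEXT(Body, ?)"

if __name__ == "__main__":
    condition = build_contains_condition(["stack overflow", "full text indexing"])
    print(CONTAINS_SQL)
    print(condition)
```

With CONTAINS you control how the terms are combined (AND vs. OR, exact phrases), whereas FREETEXT takes the raw user text and matches on meaning rather than exact wording. Either way, only the indexed columns are searched, which is the "whole page" gap described above.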

Jeff Atwood
But as much as I love SQL Server, I find the effectiveness of SO's search quite poor. A site-based search à la Google is far more effective...
Mitch Wheat
You just have to know how to use it, and what it's good at vs. what Google is good at. Try [tag] "unique phrase" and I can pretty much guarantee you'll find what you want. For fuzzier "Gee I dunno, sounds something like this" searches, there's no way ANYONE can beat Google... ever.
Jeff Atwood
For example, try searching for this: "stack overflow" "full text indexing".
Jeff Atwood
Why is SO combining the keywords via OR? Very confusing to me... and the ranking of the results is not very useful either, maybe because the SQL Server full text ranking itself is not very useful.
Peter Gfader