Enterprise Search: Has anybody developed on FAST ESP? What did you think about it?

+5 Q:

I work for a Scandinavian yellow pages company, which is looking at moving its bespoke search technology over to FAST ESP.

Like all big, expensive systems with relatively few installations, it is difficult to get feedback on the strengths and weaknesses of the system.

Are there any stackoverflowers who have experience of FAST ESP and want to share?

+11 A:

Hello Fergie. :) I am a search architect who has been developing and integrating search engine technology since 1997, starting in my days as a Lycos software engineer.

We use FAST ESP as the search engine that powers http://thomasnet.com. I've been working with ESP since 2003 (then known as FDS 3.2).

FAST ESP is extremely flexible and can index many document types (HTML, PDF, Word, etc.). It has a very robust crawler for web documents, and you can load custom document formats into the system either through their intermediary FastXML format or through their Content APIs.

One of my favorite parts of the engine is its Document Processing Pipeline which lets you make use of dozens of out-of-the-box processing plugins as well as using a Python API to write your own custom document processing stages. An example of a custom stage we wrote was one that looks at a website URL and tries to identify which company it belongs to so additional metadata can be attached to a web document.
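To give a feel for the logic in a stage like that, here is a minimal standalone sketch in plain Python. It does not use the ESP document processing API at all: the lookup table, the function names, and the dict-based "document" are all assumptions made up for illustration, and a real stage would wrap similar logic in whatever interface the product's stage API expects.

```python
from urllib.parse import urlparse

# Hypothetical lookup table; a real stage would load this from a database or file.
COMPANY_BY_DOMAIN = {
    "example.com": "Example Corp",
    "acme.example.org": "Acme Tools",
}


def company_for_url(url):
    """Return the company that owns the URL's host, or None if unknown."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Try the full host first, then progressively shorter parent domains.
    parts = host.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in COMPANY_BY_DOMAIN:
            return COMPANY_BY_DOMAIN[candidate]
    return None


def process(document):
    """Attach a 'company' field to a document dict when its URL is recognized.

    The dict stands in for a document object; it is not the ESP document type.
    """
    company = company_for_url(document.get("url", ""))
    if company is not None:
        document["company"] = company
    return document
```

Calling process({"url": "http://www.example.com/page"}) adds a "company" field to the dict, while documents with unrecognized URLs pass through unchanged.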

It has a very robust programming/integration SDK in several popular languages (C++/C#/Java) for adding content and performing queries as well as fetching system status and managing cluster services.

ESP has a query language called FAST Query Language (FQL) that is very robust and allows you to do basic Boolean searches (AND, OR, NOT) as well as phrase and term proximity searches. In addition to that, it has something called "scope search" which can be used to search document metadata (XML) that has a format that can vary from document to document.
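As a rough illustration of what those queries look like, here is a small Python sketch that only builds query strings. The helper names are hypothetical, and the operator spellings (and, or, andnot, near with an N parameter, string with mode="phrase") are written from memory of FQL, so check them against the FQL reference for your ESP version before relying on them.

```python
def _quote(term):
    """Wrap a term in double quotes; embedded quotes are rejected, not escaped."""
    if '"' in term:
        raise ValueError("terms containing double quotes are not handled in this sketch")
    return '"' + term + '"'


def _call(operator, terms):
    # Render operator("a", "b", ...) from a sequence of plain terms
    return operator + "(" + ", ".join(_quote(t) for t in terms) + ")"


def fql_and(*terms):
    """All terms must match."""
    return _call("and", terms)


def fql_or(*terms):
    """Any term may match."""
    return _call("or", terms)


def fql_andnot(wanted, unwanted):
    """Match the first term but exclude documents containing the second."""
    return _call("andnot", (wanted, unwanted))


def fql_phrase(text):
    """Match the words as an exact phrase (assumed spelling of the phrase mode)."""
    return "string(" + _quote(text) + ', mode="phrase")'


def fql_near(first, second, distance):
    """Match when the two terms occur within `distance` words of each other."""
    return "near(" + _quote(first) + ", " + _quote(second) + ", N=" + str(int(distance)) + ")"
```

For example, fql_near("pump", "valve", 5) produces near("pump", "valve", N=5), and fql_phrase("ball valve") produces string("ball valve", mode="phrase").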

In terms of performance, it scales fairly linearly. If you benchmark it on one machine and then add a second, you can generally expect roughly double the performance. You can run the system on one machine (only recommended for development) or many (for production). It is fault-tolerant (it can still serve some results if one of your load-balanced indices goes offline) and it has full fail-over support (one or more critical machines could die or be taken offline for maintenance and the system will continue to function properly).
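If you take the "fairly linear" scaling at face value, a back-of-the-envelope sizing estimate can be sketched like this. The function name, the efficiency factor, and the spare-node count are assumptions for illustration; the numbers should come from your own single-node benchmark, not from this sketch.

```python
import math


def nodes_needed(target_qps, single_node_qps, scaling_efficiency=0.9, spare_nodes=1):
    """Estimate how many search nodes are needed for a target query rate.

    scaling_efficiency is an assumed fudge factor for "fairly linear" scaling
    (1.0 would mean perfectly linear); spare_nodes adds fail-over headroom.
    """
    if target_qps <= 0 or single_node_qps <= 0:
        raise ValueError("query rates must be positive")
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    effective_qps_per_node = single_node_qps * scaling_efficiency
    return math.ceil(target_qps / effective_qps_per_node) + spare_nodes
```

With a benchmarked 50 queries per second on one node and a target of 200, nodes_needed(200, 50) suggests five serving nodes plus one spare under these assumptions.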

So, it's very powerful. The documentation nowadays is pretty good. So, you ask, what are the downsides?

Well, if the data you need to make searchable has a format that changes frequently, that might be a pain. ESP has something called an "Index Profile", which is basically a config file it uses to determine which document fields are important and should be used for indexing. Everything fed into ESP is a "document", even if you're loading database table rows into it. Each document has several fields, typical fields being: title, body, keywords, headers, documentvectors, processingtime, etc. You can specify as many of your own custom fields as you wish.

If your content maintains mostly the same format (like web documents), it's not a big issue. But if you have to make big changes to which fields should be indexed and how they should be treated, you probably need to edit the Index Profile. Some changes to the index profile are "Hot Updates", meaning you can make the change without interrupting service. But some of the bigger changes are "Cold Updates", which require a full data refeed and reindexing before the change takes effect. Depending on the size of your dataset and how many machines are in your cluster, this operation could take hours or days. Cold Updates are a pain to schedule unless you have plenty of cash for extra hardware that you can bring online while your production systems are performing a cold update and reloading the data. Having to do that on production clusters more than once or twice a year requires a fair amount of planning to get right with minimal or zero downtime.

For your case, I doubt your data formats will change very frequently. If you need to make minor tweaks to it, you can add additional metadata to scope fields to side-step the need to do any full data reloads.

Most of the trouble you'll probably encounter is the initial learning curve of using the product. Once you get a development cluster (or node) doing what you want, and if you don't have to make significant changes to indexed field configs frequently, it is a very stable and dependable search engine to use. For your application it sounds like a good match. For smaller companies or startups, there are open-source options that are not as expensive up front and should suffice if you don't need as much performance or durability.

I hope you find this assessment helpful. :)

Sincerely, Michael McIntosh Senior Search Architect @ TnR Global

2009-02-10 04:02:42

+1 A:

The FAST ESP technology is solid, but you will want to bear in mind that it is really a search platform (hence "ESP"), not an out-of-the-box search experience. The quality of your results is directly related to the quality of your index, which means you really need to tune your document processing pipeline and index profile for your content.

There are no hard and fast rules for this; you really need to understand the platform and your content. It does take time and a lot of trial and error. Also, it is resource hungry so you cannot skimp on hardware. If you have the time and resources to do it right it will work great, but a halfway job will be no better (and possibly worse) than something out of the box or even Lucene.

AndyM 2009-02-12 19:33:24

I would highly recommend Lucene for any search application. It is powerful, customisable, has an active community, and is in widespread use for projects of all sizes, both commercial and non-commercial. Don't be put off by the lack of documentation; do some digging and you will be rewarded. The project has been going almost 10 years now (I think) and gives any commercial offering more than a run for its money.

Joel 2009-05-14 12:15:27

I'm in the process of implementing FAST ESP for a few corporate intranet sites (large company). I've worked a little with search technology (Verity back in the late 90s).

Luckily, I took the FAST ESP developer courses before we really got started. The courses were really easy and if you're a quick study, you can probably just do the online classes. The biggest benefit in these for me was getting a heads-up on the API before the project started. After a quick look and a few programming labs using the API, I realized there was quite a bit that I would have to code.

I'm mostly disappointed in the API. FAST ESP was purchased by MS less than a year ago, so hopefully they'll get some help in cleaning up the .NET API. The .NET API feels like someone just clicked a button and made a COM wrapper to interface with the native Java servlets. The API naming conventions and methods are easy enough to orient yourself to (as long as you remember that all FAST ESP collections/arrays are 1-based instead of 0-based). However, I believe they could do a lot of work here. The Java API looked pretty much like all of the other Java APIs that I've seen and worked with. The naming conventions and structure look like a standard Java API, probably because FAST ESP is a Java-based search engine and their developers are Java software engineers and not .NET software engineers.
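The 1-based indexing is the kind of thing that bites you in a results pager. Here is a minimal Python sketch of the arithmetic, independent of the ESP API: the function names are hypothetical, and the only thing taken from above is that hit positions start at 1.

```python
def page_bounds(page, per_page, total_hits):
    """Return the (first, last) 1-based hit positions for a results page.

    Pages are numbered from 1. Returns None when the page is past the end.
    """
    if page < 1 or per_page < 1 or total_hits < 0:
        raise ValueError("page and per_page must be >= 1, total_hits >= 0")
    first = (page - 1) * per_page + 1
    if first > total_hits:
        return None
    last = min(page * per_page, total_hits)
    return first, last


def to_zero_based_slice(first, last):
    """Convert inclusive 1-based bounds to a slice for a 0-based sequence."""
    return slice(first - 1, last)
```

With 25 hits and 10 per page, page 3 covers positions 21 through 25, and to_zero_based_slice(21, 25) selects the same five items from an ordinary 0-based list.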

At first, since I was using ASP.NET, I developed a set of web controls that mimic the MS SharePoint web controls functionality. In the classroom and all ASP.NET examples, everything was inline ASP.NET coding with no or very little "code-behind" coding. Yahoo! Developer Network has some nice design patterns for designing search interfaces, results, pagers, etc.

Overall, so far it works pretty well. We're still in the development phase and are going to start beta testing our site within the next few weeks. FQL (FAST Query Language) is a bit overcomplicated; our users will probably complain that the language isn't "Google-like" enough for them. If you search for some FQL PDF files, you'll be able to preview the language. You can also just use simple searches (all terms, any terms, etc.).

If there's anything specific you'd like to know, just ask and I'll try to get the information. We're using FAST ESP in a VM environment, which they say isn't supported, but it's working fine and the benchmark results are okay for us.

2009-03-05 23:58:34

FAST ESP is good, at least when compared to the Google Search Appliance. But then, which enterprise search engine to choose is entirely up to the requirements.

2009-04-30 22:50:12

Does anyone know how to integrate a .NET front end with a FAST ESP backend? I found a couple of good answers at http://fastesphelp.com but am looking for more. Please help.

2009-05-14 12:03:17

@anand, you can use the FAST ESP .NET API. There are PDF documents, sample code, and API reference material with the install.

2009-06-02 20:53:25

@anand: You can choose between the .NET Content API or do everything via HTTP/XML and style the XML as you wish.

danglund 2009-07-08 11:10:57

@Michael McIntosh: To avoid a cold update you could add generic fields to the index. For example you add 5 generic integers, 5 strings and 5 dates. When you need to suddenly introduce a new integer you can use the "padding" you already have, for example igeneric1.

After a while you may want to do a cold update and then you consolidate these fields and give them proper names etc.
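The bookkeeping for this can be kept in a small mapping layer on the feeding side. Below is a minimal Python sketch under stated assumptions: the class and method names are hypothetical, nothing here touches the ESP API, and apart from igeneric1 the generic field names are invented to follow the same pattern.

```python
class GenericFieldPool:
    """Tracks which pre-declared generic index fields are already in use."""

    def __init__(self, slots):
        # slots maps a type name to the spare field names declared for that type
        self._free = {kind: list(names) for kind, names in slots.items()}
        self._assigned = {}

    def assign(self, logical_name, kind):
        """Reserve a spare field of the given type for a new logical field."""
        if logical_name in self._assigned:
            return self._assigned[logical_name]
        free = self._free.get(kind, [])
        if not free:
            raise LookupError("no spare %s field left; a cold update is needed" % kind)
        field = free.pop(0)
        self._assigned[logical_name] = field
        return field

    def to_index_fields(self, record):
        """Rename logical fields to their generic index fields before feeding."""
        return {self._assigned.get(name, name): value for name, value in record.items()}


# Field names other than igeneric1 are assumed for illustration.
pool = GenericFieldPool({
    "int": ["igeneric1", "igeneric2"],
    "string": ["sgeneric1", "sgeneric2"],
})
pool.assign("employee_count", "int")
```

After that, pool.to_index_fields({"title": "Acme", "employee_count": 12}) feeds the new value under igeneric1, and the same mapping tells you which names to consolidate when you eventually schedule the cold update.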

danglund 2009-07-08 11:15:21

+1 A:

I've been supporting FAST ESP for some years now. Overall 4/5.

My experience to date is that the FAST ESP platform is rock solid, but the various connectors have some quirks.

The Lotus Notes connector is especially poor, and periodically breaks when more than 100,000 documents are indexed.

Other quirks can be fairly major, such as the File Traverser not registering a document update when a file's NTFS permissions change. That means everybody can see a document they shouldn't be able to, which is a serious security problem.

I echo the sentiments of others here: FAST ESP is very good, but it's certainly not an 'out-of-box' solution. Expect to invest a good 3 to 12 months implementing, but you'll be rewarded with a very powerful engine.

ben 2009-08-24 03:32:03

+1 A:

We've implemented a large number of FAST ESP applications, and on all occasions ESP has proved to be a very stable, high-performance platform, as long as you invest in the relatively higher implementation costs upfront. With regard to the yellow pages question: we implemented and manage the largest online directories site in the US using ESP, and it handles huge QPS (queries per second). As mentioned by others, the key alternative technologies (Google and Solr/Lucene, for example) are also very capable, and your choice really depends on tech/user requirements and budget.

Gary Holloway 2010-04-01 13:02:56

Hi All,

I am trying to write a Java feeding client using the FAST ESP Java APIs. Can someone explain the approach? I also need a detailed description of the APIs.

Thanks, Manohar Negi

Manohar 2010-07-16 11:46:52

During 2008-2009 I had a job at a Russian yellow pages site (yell.ru) as a "Search Engine Engineer". My primary responsibility was working with the FAST ESP system. I wrote and maintained a custom document processor (custom stage) for our specific data processing, plus some "glue" code for the data-pushing pipeline. As for FAST ESP itself, I have mixed feelings about it. Here are some downsides.

It is an expensive product. Aside from the one-time initial payment, you must pay an annual (and noticeable) license fee or your server will stop working. Our mistake was to acquire a (relatively) low-cost license with a very limited maximum request rate (10 queries per second). While we were told several times it was just a "business limitation", it was actually a hard technical limit on the server's peak throughput. Our performance was ruined by this peak limit, and we switched back to a temporary "evaluation" license that (surprisingly!) had no performance limitation at all (just a time limit).
Documentation is good but not very deep in technical details. It is impossible to do anything really tricky just by reading the documentation; the details simply are not there. Once we were told we needed to contact their "solution department" (and buy a "solution") because it was not meant to be done by customers.
Some parts are surprisingly tricky and buggy. Some examples: when loading custom dictionaries, there were several problems with non-English characters. Sometimes the system became slow and unresponsive if we loaded it with a bunch of phrases with custom boost values.
There are some strange technical limits here and there. For example, we could have only 8 different boost values assigned to searchable fields.

In general, we had a tough time trying to meet our users' needs with FAST ESP as the underlying search engine for our site. Finally the system was replaced with another (open source) solution and I was fired ;-) End of story.

Vladimir Ignatov 2010-09-07 10:09:00

Is there any material on writing advanced document processing plugins, e.g. doing custom information extraction from the content? I've heard it's done in Python, but there seems to be no material out there on how to actually do it.

Ravish Bhagdev 2010-09-21 15:13:12
