The question wasn't clear enough, I think; here's an updated, straight-to-the-point version:

What are the common architectures used to build a meta search engine, and are there any libraries available for building that type of search engine?

I'm looking at building an "enterprise" type of search engine where the indexed data could come from proprietary search engines (like Autonomy or a Google Box) or public ones (like Google Web or Yahoo Web).

+1  A: 

This page seems to list a few:

http://java-source.net/open-source/search-engines

I'd imagine the APIs will all be similar, in that they take a query string and some options and return a collection of results. However, the exact types of the options and results are likely to differ, so I'd have thought that you'd need some sort of Adapter approach (for example) to unify access to the different backends.
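A minimal sketch of that Adapter approach, assuming two made-up backends whose option and result types differ. `VendorClient`, `OtherClient` and both adapters are hypothetical stand-ins, not real search-engine client libraries; the point is only that the meta-search code programs against one `SearchBackend` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Adapter idea; VendorClient and OtherClient stand in for real backends.
class AdapterSketch {

    // The unified result type every backend is adapted to.
    record SearchResult(String title, String url, double score) {}

    // The common interface the meta-search code programs against.
    interface SearchBackend {
        List<SearchResult> search(String query, int maxResults);
    }

    // Stand-in backend #1: string options in, maps out.
    static final class VendorClient {
        List<Map<String, String>> query(String text, Map<String, String> options) {
            int limit = Integer.parseInt(options.getOrDefault("limit", "10"));
            List<Map<String, String>> hits = new ArrayList<>();
            for (int i = 0; i < Math.min(limit, 2); i++) {   // at most two canned hits
                hits.add(Map.of("heading", text + " #" + i,
                                "link", "http://vendor.example/" + i,
                                "relevance", String.valueOf(1.0 / (i + 1))));
            }
            return hits;
        }
    }

    // Stand-in backend #2: rows of {url, title} with no relevance score at all.
    static final class OtherClient {
        String[][] find(String q, int count) {
            return new String[][] {{"http://other.example/a", q + " (other)"}};  // one canned row
        }
    }

    // Adapter for backend #1: builds its option map and maps each hit to a SearchResult.
    static final class VendorAdapter implements SearchBackend {
        private final VendorClient client;
        VendorAdapter(VendorClient client) { this.client = client; }

        @Override
        public List<SearchResult> search(String query, int maxResults) {
            List<SearchResult> results = new ArrayList<>();
            for (Map<String, String> hit : client.query(query, Map.of("limit", String.valueOf(maxResults)))) {
                results.add(new SearchResult(hit.get("heading"), hit.get("link"),
                        Double.parseDouble(hit.get("relevance"))));
            }
            return results;
        }
    }

    // Adapter for backend #2: reorders the columns and derives a score from rank (an assumption).
    static final class OtherAdapter implements SearchBackend {
        private final OtherClient client;
        OtherAdapter(OtherClient client) { this.client = client; }

        @Override
        public List<SearchResult> search(String query, int maxResults) {
            List<SearchResult> results = new ArrayList<>();
            String[][] rows = client.find(query, maxResults);
            for (int i = 0; i < rows.length; i++) {
                results.add(new SearchResult(rows[i][1], rows[i][0], 1.0 / (i + 1)));
            }
            return results;
        }
    }

    public static void main(String[] args) {
        List<SearchBackend> backends = List.of(new VendorAdapter(new VendorClient()),
                                               new OtherAdapter(new OtherClient()));
        for (SearchBackend b : backends) {
            b.search("lucene", 5).forEach(System.out::println);
        }
    }
}
```

Adding another engine then means writing one more adapter; the calling code does not change.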

pdbartlett
+2  A: 

Not exactly what you are looking for, but I'd still suggest checking out Compass; it might give you some ideas. Maybe also Hibernate Search.

Update: To clarify, Compass is not an ORM (nor is Hibernate Search); it's a search-oriented API. Because it tries to abstract the underlying search engine (Lucene), I was suggesting that you look at some of the structures it uses: Analyzers, Analyzer Filters, Query Parser, etc.

Building on top of Lucene, Compass simplifies common usage patterns of Lucene such as google-style search (...)
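To make the analyzer/filter structure concrete, here is a plain-Java sketch of an analyzer as a tokenizer followed by an ordered chain of token filters. It deliberately does not use the Lucene or Compass API; `Analyzer`, `TokenFilter`, `LOWERCASE` and `stopWords` are hypothetical names. In a meta-search engine, a pipeline like this could normalize the user's query once before it is handed to each backend.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Plain-Java sketch of the tokenizer + filter-chain idea; not the Lucene or Compass API.
class AnalyzerSketch {

    // A filter transforms a token stream (modelled here as a list of tokens).
    interface TokenFilter extends UnaryOperator<List<String>> {}

    // Lowercases every token using a locale-independent rule.
    static final TokenFilter LOWERCASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    };

    // Drops tokens that appear in the given stop-word set.
    static TokenFilter stopWords(Set<String> stop) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) if (!stop.contains(t)) out.add(t);
            return out;
        };
    }

    // Analyzer = tokenizer followed by filters applied in order.
    static final class Analyzer {
        private final List<TokenFilter> filters;
        Analyzer(List<TokenFilter> filters) { this.filters = filters; }

        List<String> analyze(String text) {
            // Naive tokenizer: split on anything that is not a letter or digit.
            List<String> tokens = new ArrayList<>();
            for (String t : text.split("[^\\p{L}\\p{N}]+")) if (!t.isEmpty()) tokens.add(t);
            for (TokenFilter f : filters) tokens = f.apply(tokens);
            return tokens;
        }
    }

    public static void main(String[] args) {
        Analyzer analyzer = new Analyzer(List.of(LOWERCASE, stopWords(Set.of("the", "a", "of"))));
        System.out.println(analyzer.analyze("The Architecture of a Meta-Search Engine"));
    }
}
```

Filter order matters: lowercasing runs before stop-word removal, so "The" is caught by the lowercase stop list.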


Pascal Thivent
+3  A: 

Have a look at Lucene.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

bobah
Can Lucene/Solr/Nutch handle meta-searching (or federated searching)?
Loki
Not directly. But Lucene's indexing capabilities are awesome, especially incremental index construction and merging multiple indexes. The feature list is at http://lucene.apache.org/java/docs/features.html
bobah
+4  A: 

If you look at Garlic (pdf), you'll notice that its architecture is generic enough and can be adapted to a meta-search engine.

UPDATE:

The rough architectural sketch is something like this:

   +---------------------------+
   |                           |
   |    Meta-Search Engine     |         +---------------+
   |                           |         |               |
   |   +-------------------+   |---------| Configuration |
   |   | Query Processor   |   |         |               |
   |   |                   |   |         +---------------+
   |   +-------------------+   |
   +-------------+-------------+
                 |
      +----------+---------------+
   +--+----------+-------------+ |
   |             |             | |
   |     +-------+-------+     | |
   |     |    Wrapper    |     | |
   |     |               |     | |
   |     +-------+-------+     | |
   |             |             | |
   |             |             | |
   |     +-------+--------+    | |
   |     |                |    | |
   |     | Search Engine  |    | |
   |     |                |    +-+
   |     +----------------+    |
   +---------------------------+

The parts depicted are:

  • Meta-Search Engine - the engine, orchestrates the whole thing.
  • Query Processor - part of the engine, resolves capabilities, sends requests and aggregates results of specific search engines (through the wrappers).
  • Wrapper - bridges the meta-search engine API to specific search engines. Each wrapper works with one specific search engine, exposes that external engine's capabilities to the meta-search engine, and accepts and responds to search requests.
  • Search engine - the external search engines to query; they're exposed to the meta-search engine through the wrappers.
  • Configuration - data that configures the meta-search engine, e.g., which wrappers to use, where to find more wrappers, etc. Can also configure the wrappers.
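A minimal Java sketch of how the parts in the diagram could fit together, assuming that "capabilities" are simple string tags (e.g. "keyword", "phrase") and that results carry comparable scores. All names here (`Wrapper`, `StaticWrapper`, `QueryProcessor`, `Configuration`) are hypothetical and not taken from Garlic; `StaticWrapper` returns canned results where a real wrapper would call an external engine.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of the components in the diagram; every name here is illustrative.
class MetaSearchSketch {

    record Query(String text, Set<String> requiredCapabilities) {}
    record Result(String source, String title, double score) {}

    // Wrapper: bridges one external search engine and advertises what it supports.
    interface Wrapper {
        String name();
        Set<String> capabilities();
        List<Result> search(Query q) throws Exception;
    }

    // Stand-in wrapper with canned results; a real one would call the external engine.
    record StaticWrapper(String name, Set<String> capabilities, List<Result> canned) implements Wrapper {
        @Override
        public List<Result> search(Query q) { return canned; }
    }

    // Configuration: which wrappers to use and how long to wait for each one.
    record Configuration(List<Wrapper> wrappers, long timeoutMillis) {}

    // Query processor: resolves capabilities, sends requests, aggregates results.
    static final class QueryProcessor {
        private final Configuration config;

        QueryProcessor(Configuration config) { this.config = config; }

        List<Result> process(Query q) {
            // 1. Capability resolution: keep only wrappers that support everything the query needs.
            List<Wrapper> eligible = new ArrayList<>();
            for (Wrapper w : config.wrappers()) {
                if (w.capabilities().containsAll(q.requiredCapabilities())) eligible.add(w);
            }
            List<Result> merged = new ArrayList<>();
            if (eligible.isEmpty()) return merged;

            // 2. Send the request to every eligible wrapper in parallel.
            ExecutorService pool = Executors.newFixedThreadPool(eligible.size());
            try {
                List<Future<List<Result>>> futures = new ArrayList<>();
                for (Wrapper w : eligible) futures.add(pool.submit(() -> w.search(q)));

                // 3. Aggregate, waiting up to timeoutMillis for each engine in turn;
                //    a failed or slow engine is skipped instead of failing the whole query.
                for (Future<List<Result>> f : futures) {
                    try {
                        merged.addAll(f.get(config.timeoutMillis(), TimeUnit.MILLISECONDS));
                    } catch (ExecutionException | TimeoutException e) {
                        f.cancel(true);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // preserve interrupt status, stop waiting
                        break;
                    }
                }
            } finally {
                pool.shutdownNow();
            }
            merged.sort(Comparator.comparingDouble(Result::score).reversed());
            return merged;
        }
    }

    public static void main(String[] args) {
        Wrapper web = new StaticWrapper("web", Set.of("keyword"),
                List.of(new Result("web", "Meta-search overview", 0.7)));
        Wrapper intranet = new StaticWrapper("intranet", Set.of("keyword", "phrase"),
                List.of(new Result("intranet", "Internal search guide", 0.9)));
        QueryProcessor engine = new QueryProcessor(new Configuration(List.of(web, intranet), 2000));

        // Only the intranet wrapper supports phrase queries, so only it is consulted.
        System.out.println(engine.process(new Query("\"search guide\"", Set.of("phrase"))));
        // Both wrappers support keyword queries; their results are merged by score.
        System.out.println(engine.process(new Query("search", Set.of("keyword"))));
    }
}
```

Skipping engines that fail or time out is one design choice; a stricter engine could instead report partial results to the caller.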
Jordão
Eh, be careful when linking to a PDF!
Loki
Thanks, Loki, I added an indication...
Jordão
+1: I am currently working on this, and this is pretty much what I ended up with. I have a Meta-Query, and the Wrapper translates the query into the format of the actual search engine. The wrapper then translates the answer into a Meta-Result, and there you go...
Stephane
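A sketch of that translation step, assuming two imaginary engines: one takes space-separated terms in a `q` parameter and returns "title<TAB>url" lines, the other wants explicit `AND` syntax in a `query` parameter and returns "url|title" lines. The endpoints, parameter names and response formats are invented for illustration; only `URLEncoder.encode(String, Charset)` is a real JDK call (Java 10+).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: endpoints, parameter names and response formats are invented for illustration.
class TranslationSketch {

    record MetaQuery(List<String> terms, int maxResults) {}
    record MetaResult(String title, String url) {}

    interface Wrapper {
        String toNativeQuery(MetaQuery q);            // meta-query -> engine-specific request
        List<MetaResult> toMetaResults(String raw);   // engine-specific response -> meta-results
    }

    // Engine A: terms joined by spaces in "q", count in "num"; responses are "title<TAB>url" lines.
    static final class EngineAWrapper implements Wrapper {
        public String toNativeQuery(MetaQuery q) {
            String terms = URLEncoder.encode(String.join(" ", q.terms()), StandardCharsets.UTF_8);
            return "https://engine-a.example/search?q=" + terms + "&num=" + q.maxResults();
        }

        public List<MetaResult> toMetaResults(String raw) {
            List<MetaResult> out = new ArrayList<>();
            for (String line : raw.split("\n")) {
                String[] parts = line.split("\t");
                if (parts.length == 2) out.add(new MetaResult(parts[0], parts[1]));
            }
            return out;
        }
    }

    // Engine B: explicit AND syntax in "query", count in "limit"; responses are "url|title" lines.
    static final class EngineBWrapper implements Wrapper {
        public String toNativeQuery(MetaQuery q) {
            String terms = URLEncoder.encode(String.join(" AND ", q.terms()), StandardCharsets.UTF_8);
            return "https://engine-b.example/api?query=" + terms + "&limit=" + q.maxResults();
        }

        public List<MetaResult> toMetaResults(String raw) {
            List<MetaResult> out = new ArrayList<>();
            for (String line : raw.split("\n")) {
                int bar = line.indexOf('|');
                if (bar > 0) out.add(new MetaResult(line.substring(bar + 1), line.substring(0, bar)));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        MetaQuery q = new MetaQuery(List.of("meta", "search"), 10);
        System.out.println(new EngineAWrapper().toNativeQuery(q));
        System.out.println(new EngineBWrapper().toNativeQuery(q));
        System.out.println(new EngineBWrapper().toMetaResults("http://b.example/1|Federated search"));
    }
}
```

Both wrappers accept the same `MetaQuery` and produce the same `MetaResult` type, so the engine-specific syntax never leaks past the wrapper.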
+1  A: 

If you can read Objective-C and want to see a working example of something like a "meta-search engine", you might want to take a look at the source code for Google's Vermilion framework. It uses the engine that backs the very popular Google Quick Search Box utility for OS X (which in turn is a lot like QuickSilver).

The framework provides the capability to add plugin backends for the search process and handles merge-sorting the results from a number of sources, etc. I would imagine that a federated search engine of any sort would follow a similar design.
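Merging several already-sorted result lists is a classic k-way merge. The sketch below is a generic Java version using a `PriorityQueue`, not code taken from Vermilion; it assumes each source returns hits sorted by descending score and that scores are comparable across sources. The `Hit` and `merge` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Generic k-way merge of per-source result lists; not taken from the Vermilion source.
class MergeSketch {

    record Hit(String source, String title, double score) {}

    // Cursor into one source's list: the list itself and the index of its next unmerged hit.
    private record Cursor(List<Hit> list, int index) {
        Hit head() { return list.get(index); }
    }

    // Each input list must already be sorted by descending score.
    static List<Hit> merge(List<List<Hit>> sources, int limit) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                Comparator.comparingDouble((Cursor c) -> c.head().score()).reversed());
        for (List<Hit> s : sources) if (!s.isEmpty()) heap.add(new Cursor(s, 0));

        List<Hit> merged = new ArrayList<>();
        while (!heap.isEmpty() && merged.size() < limit) {
            Cursor best = heap.poll();                 // source whose next hit scores highest
            merged.add(best.head());
            if (best.index() + 1 < best.list().size()) {
                heap.add(new Cursor(best.list(), best.index() + 1));  // advance that source
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> web = List.of(new Hit("web", "A", 0.9), new Hit("web", "B", 0.4));
        List<Hit> intranet = List.of(new Hit("intranet", "C", 0.7), new Hit("intranet", "D", 0.1));
        merge(List.of(web, intranet), 3).forEach(System.out::println);
    }
}
```

Each step costs O(log k) for k sources, and the loop stops as soon as `limit` hits are collected. In practice, scores from different engines are rarely on the same scale, so a real system would need to normalize them (or merge by rank) before a merge like this is meaningful.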

jkp