So far I know that Compass can handle this work, but indexing with Compass looks pretty expensive. Are there any lighter alternatives?

+4  A: 

Apache Lucene is the de facto choice for full-text indexing in Java. It looks like Compass Core contains "An implementation of Lucene Directory to store the index within a database (using Jdbc). It is separated from Compass code base and can be used with pure Lucene applications." plus tons of other stuff. You could try to extract just that Lucene Directory component, stripping away several libraries and making things more lightweight. Either that, or ditch Compass altogether and use pure, unadorned Lucene.

Asaph
Yep, I will probably go this way. My concern about using Lucene is that I/O is very costly on App Engine, and I hope that somebody has already produced an optimized version of Lucene or a home-grown library that takes the extremely high I/O cost into account.
+1  A: 

For Google App Engine, the only indexing library I've seen is appengine-search, with a description of how to use it on this page. I haven't tried it out though.

I've used Lucene (which Compass is based on) and found it to work great with comparatively low expense. The indexing is a task that you can schedule at times that work for your app.

Some alternative indexing projects are mentioned in this SO thread, including Xapian and minion. I haven't checked either of these out though, since Lucene did everything I needed it to very well.

Kaleb Brasee
+3  A: 

To be honest, I don't know if Lucene will be lighter than Compass in terms of indexing (why would it be, doesn't Compass use Lucene for that?).

Anyway, because you asked for alternatives, there is GAELucene. I'm quoting its announcement below:

Enlightened by the discussion "Can I run Lucene in google app engine?", I implemented a google datastore based Lucene component, GAELucene, which can help you to run search applications on google app engine.

The main classes of GAELucene include:

  • GAEDirectory - a read-only Directory based on Google datastore.
  • GAEFile - stands for an index file; the file's byte content is split into multiple GAEFileContent entities.
  • GAEFileContent - stands for a segment of an index file.
  • GAECategory - the identifier of different indices.
  • GAEIndexInput - a memory-resident IndexInput implementation like the RAMInputStream.
  • GAEIndexReader - a wrapper for IndexReader that is cached in GAEIndexReaderPool
  • GAEIndexReaderPool - pool for GAEIndexReader
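
To make the GAEFile / GAEFileContent idea more concrete, here is a minimal sketch of splitting an index file's bytes into fixed-size segments and joining them back. This is not GAELucene's code: the class name, the method names and the segment size are all made up for illustration, and a real implementation would store each segment as a datastore entity rather than keep it in a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: these are NOT GAELucene's actual classes.
final class SegmentedFileSketch {

    // Split the content into consecutive segments of at most segmentSize bytes.
    static List<byte[]> split(byte[] content, int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        List<byte[]> segments = new ArrayList<>();
        for (int start = 0; start < content.length; start += segmentSize) {
            // The last segment may be shorter than segmentSize.
            int length = Math.min(segmentSize, content.length - start);
            segments.add(Arrays.copyOfRange(content, start, start + length));
        }
        return segments;
    }

    // Reassemble the original content by concatenating the segments in order.
    static byte[] join(List<byte[]> segments) {
        int total = 0;
        for (byte[] segment : segments) {
            total += segment.length;
        }
        byte[] content = new byte[total];
        int position = 0;
        for (byte[] segment : segments) {
            System.arraycopy(segment, 0, content, position, segment.length);
            position += segment.length;
        }
        return content;
    }

    public static void main(String[] args) {
        byte[] indexFile = new byte[2500];           // stand-in for an index file
        List<byte[]> segments = split(indexFile, 1000); // segment size is an arbitrary assumption
        System.out.println("segments: " + segments.size());
    }
}
```

The point of the split is presumably that a single datastore entity can only hold a limited amount of data, so a large index file has to be spread over several entities and reassembled on read.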

The following code snippet demonstrates the use of GAELucene to do searching:

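// Build the Lucene query from the incoming request.
Query queryObject = parserQuery(request);
// Borrow a cached reader for the demo index from the pool.
GAEIndexReaderPool readerPool = GAEIndexReaderPool.getInstance();
GAEIndexReader indexReader = readerPool.borrowReader(INDEX_CATEGORY_DEMO);
IndexSearcher searcher = new IndexSearcher(indexReader);
Hits hits = searcher.search(queryObject);
// Hand the reader back so other requests can reuse it.
readerPool.returnReader(indexReader);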
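
The borrow/return pattern in that snippet can be sketched with plain Java. Note that the quoted snippet only returns the reader if the search does not throw, so the sketch wraps the work in try/finally. Everything here is hypothetical (the class KeyedPoolSketch, its methods, and the use of a StringBuilder as a stand-in for a reader); it is not how GAEIndexReaderPool is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical keyed borrow/return pool; NOT GAELucene's implementation.
final class KeyedPoolSketch<K, R> {
    // Idle resources, grouped by key (e.g. an index category).
    private final Map<K, Deque<R>> idle = new HashMap<>();
    // Creates a new resource when no idle one exists for a key.
    private final Function<K, R> factory;

    KeyedPoolSketch(Function<K, R> factory) {
        this.factory = factory;
    }

    // Reuse an idle resource for this key if there is one, otherwise create one.
    synchronized R borrow(K key) {
        Deque<R> queue = idle.get(key);
        R pooled = (queue == null) ? null : queue.pollFirst();
        return (pooled != null) ? pooled : factory.apply(key);
    }

    // Put a borrowed resource back so later callers can reuse it.
    synchronized void giveBack(K key, R resource) {
        idle.computeIfAbsent(key, k -> new ArrayDeque<>()).addFirst(resource);
    }

    public static void main(String[] args) {
        // A StringBuilder stands in for an expensive-to-open index reader.
        KeyedPoolSketch<String, StringBuilder> pool =
                new KeyedPoolSketch<>(category -> new StringBuilder(category));
        StringBuilder reader = pool.borrow("demo");
        try {
            System.out.println("searching with reader for: " + reader);
        } finally {
            // Always return the resource, even if the search throws.
            pool.giveBack("demo", reader);
        }
    }
}
```

Pooling makes sense here because opening a reader over a datastore-backed index is likely to be expensive, so reusing one across requests saves I/O.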

I warmly recommend reading the whole discussion on Nabble; it is very informative.

Just in case, regarding Compass, Shay Banon wrote a blog entry detailing how to use Compass in App Engine here: http://www.kimchy.org/searchable-google-appengine-with-compass/

Pascal Thivent