I am wondering if anyone has thoughts on the best way to perform keyword searches on Amazon SimpleDB from an ASP.NET application running on EC2.

The options I am considering are:

1) Add keywords to a multi-valued attribute and search with a query like: select id from keywordTable where keyword = 'firstword' intersection keyword = 'secondword' intersection keyword = 'thirdword'

Amazon Query Example

2) Create a webservice frontend to Katta:

Katta on EC2

3) A queued Lucene.Net update service that periodically pushes the Lucene index to the cloud (to get around the 'locking' issue).

Load balance Lucene (StackOverflow post)

Lucene on S3 (blog post)

+1  A: 

If you are looking for a strictly SimpleDB solution (as per the question as stated), Katta and Lucene won't help you. If you are merely looking for an 'Amazon infrastructure' based solution, then any of the choices will work.

The three options differ in how much setup and management you'll have to do, and which is best depends on your actual requirements.

SimpleDB with a multi-valued attribute named Keyword is your best choice if you need simplicity and minimum administration, and if you don't need to sort by relevance. There is nothing to set up or administer, and you'll only be charged for your actual CPU and bandwidth usage.
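As a rough illustration of the multi-valued attribute approach, the sketch below splits text into keywords to store on an item and builds the matching select expression using intersection. It is written in Python only to keep it short and does not call SimpleDB. The helper names, the max_keywords cutoff and the choice of itemName() as the output are assumptions made for the example, not part of any SDK. Check the current SimpleDB limits on attribute values per item before picking a cutoff.

```python
import re


def extract_keywords(text, max_keywords=200):
    """Lowercase the text, split on non-alphanumerics, drop duplicates.

    max_keywords is a hypothetical cutoff; SimpleDB limits how many
    attribute values one item can hold.
    """
    seen = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word not in seen:
            seen.append(word)
    return seen[:max_keywords]


def quote_value(value):
    """Quote a string literal for a select expression.

    Single quotes inside the value are escaped by doubling them.
    """
    return "'" + value.replace("'", "''") + "'"


def build_keyword_query(domain, keywords, attribute="keyword"):
    """Build a select that matches items carrying every given keyword."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    predicates = [f"{attribute} = {quote_value(k.lower())}" for k in keywords]
    return (f"select itemName() from {domain} where "
            + " intersection ".join(predicates))
```

Each search term becomes its own predicate, so an item matches only if its Keyword attribute holds all of the terms. The result order carries no relevance ranking, which is the limitation mentioned above.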

Lucene is a great choice if you need more than keyword searching, but you'll have to manage updates to the index yourself. You'll also have to manage the load balancing, backups and failover that you would have gotten with SimpleDB. If you don't care about failover and can tolerate downtime while you do a restore in the event of an EC2 crash, then that's one less thing to worry about and one less reason to prefer SimpleDB.
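One way to manage index updates yourself is to funnel them through a queue that a single writer drains in batches, which is the idea behind the queued update service in the question. The sketch below shows only the queueing part, in Python for brevity. It does not use Lucene or Lucene.Net; the class name and the apply_batch callback are hypothetical, and the callback stands in for whatever code actually writes to the index and pushes it out.

```python
import queue


class IndexUpdateQueue:
    """Collects document updates and hands them to one writer in batches."""

    def __init__(self, apply_batch, batch_size=100):
        self._pending = queue.Queue()    # thread-safe FIFO of pending updates
        self._apply_batch = apply_batch  # hypothetical callback that updates the index
        self._batch_size = batch_size

    def enqueue(self, doc_id, text):
        """Called by web requests; never touches the index directly."""
        self._pending.put((doc_id, text))

    def flush(self):
        """Drain up to batch_size updates and pass them to the writer.

        Returns the number of updates handed over.
        """
        batch = []
        while len(batch) < self._batch_size:
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._apply_batch(batch)
        return len(batch)
```

A timer or background worker would call flush() periodically. Because only that worker writes to the index, the web front ends never compete for the index write lock.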

With Katta on EC2 you'd be managing everything yourself. You'd have the most flexibility and the most work to do.

Mocky
Thanks Mocky. I am looking more and more at not implementing a strictly SimpleDB solution; it is too limited. It is surprising that not many full-text search solutions are available on AWS, especially when the website is behind Elastic Load Balancing.
josefresno
+1  A: 

Just to tidy up this question: we wound up using Lightspeed's SimpleDB provider together with Solr and SolrNet, by writing a custom search provider for Lightspeed.

Info on implementing the ISearchEngine interface for Lightspeed: http://www.mindscape.co.nz/blog/index.php/2009/02/25/lightspeed-writing-a-custom-search-engine/

And this is the Solr library we are using: http://code.google.com/p/solrnet/

Since Solr can be easily scaled using EC2 machines, this made the most sense to us.

josefresno
A: 

Simple Savant is an open-source .NET persistence library for SimpleDB that includes integrated support for full-text search using Lucene.NET (I'm the Simple Savant creator).

The full-text indexing approach is described here.

Ashley Tate