I'd like to create a search engine for my photo sharing website. The search engine would just need to return results based on "tag" words. Photos would be sorted by popularity, newness, or a combination of the two.

I was curious whether I could just use the Yahoo BOSS API to accomplish this instead of setting up my own search engine (using Lucene, Solr, etc.).

I've looked through the documentation, but haven't been able to figure out whether the BOSS API would let me import my entire index of results (instead of just searching what is already in the Yahoo index) and then update items in the search index with "tags" as users tag photos on the site.

Any other developers have experience doing something like this with Yahoo BOSS?

+1  A: 

As far as I am aware, BOSS will let you search whatever the Yahoo spider picks up on your site as it crawls. If all your content is browsable (i.e. discoverable by a crawler), this may be sufficient for your purposes. It has the great advantage of requiring very little work.

I don't think you can upload or import content or indexes to BOSS/Yahoo, so if your content cannot be found by crawling, BOSS may not be the solution.

If BOSS will not cut it, you need to implement your own search platform. You have basically two choices:

1) Use an indexing library like Lucene. However, unless you have a LOT of content, option 2) may well be sufficient.

2) Index the appropriate column(s) in your database. If you're using MySQL, take a look at its full-text search.

Option 2 is a lot less work than option 1. Both have the advantage over BOSS that you can restrict your search to the fields you choose. Implementing your own search also means that your results will always be up to date.
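To make option 2 a bit more concrete, here is a rough sketch of what a MySQL full-text query on a tags column could look like when issued from Python. The table and column names (`photos`, `tags`, `popularity`, `created_at`), the index name, and the helper `build_tag_search` are all made up for illustration; only the `FULLTEXT` index and `MATCH ... AGAINST` syntax come from MySQL itself. The helper just builds the SQL and parameters, so it runs without a database.

```python
# Hypothetical schema: photos(id, tags, popularity, created_at).
# One-time setup: add a full-text index on the tags column.
CREATE_INDEX_SQL = "ALTER TABLE photos ADD FULLTEXT INDEX ft_photos_tags (tags)"

# Whitelist of sort options, so user input never reaches ORDER BY directly.
ORDER_BY = {
    "popularity": "popularity DESC",
    "newness": "created_at DESC",
}


def build_tag_search(tag, sort="popularity", limit=20):
    """Return (sql, params) for a parameterized full-text tag search."""
    order = ORDER_BY[sort]  # raises KeyError for an unknown sort option
    sql = (
        "SELECT id FROM photos "
        "WHERE MATCH(tags) AGAINST (%s IN NATURAL LANGUAGE MODE) "
        f"ORDER BY {order} LIMIT %s"
    )
    return sql, (tag, limit)
```

With a DB-API driver that uses `%s` placeholders, this would be run as `cursor.execute(*build_tag_search("sunset"))`. Note that MySQL skips words shorter than its configured minimum token length when building a full-text index, so very short tags may need that setting checked.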

Hope that helps

Richard
There are 300,000 items to search. I imagine that a full-text search of a "tags" field (with comma-delimited tags) would be fairly slow. Correct?
makeee
I don't think 300k is a lot if you're searching just the one tags field/column. Basically, by indexing a column MySQL is doing pretty much what Lucene does, only internally. It should be fairly simple to test though, so it's probably worthwhile doing that before breaking out the big guns.
Richard
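For what it's worth, the idea behind "indexing a column" can be sketched in a few lines of Python. This is only a toy inverted index to show why a tag lookup does not have to scan all 300k rows; it is not how MySQL or Lucene are actually implemented, and the field names (`tags`, `popularity`, `uploaded`) and sample data are invented.

```python
from collections import defaultdict


def build_tag_index(photos):
    """Map each normalized tag to the set of photo ids that carry it."""
    index = defaultdict(set)
    for photo in photos:
        for raw in photo["tags"].split(","):
            tag = raw.strip().lower()
            if tag:
                index[tag].add(photo["id"])
    return index


def search(index, photos_by_id, tag, sort_key="popularity"):
    """Look up one tag and return matching ids, highest sort_key first."""
    ids = index.get(tag.strip().lower(), set())
    return sorted(ids, key=lambda pid: photos_by_id[pid][sort_key], reverse=True)


# Made-up sample data with comma-delimited tags.
photos = [
    {"id": 1, "tags": "sunset, beach", "popularity": 10, "uploaded": 400},
    {"id": 2, "tags": "Beach,dog", "popularity": 50, "uploaded": 300},
    {"id": 3, "tags": "city", "popularity": 99, "uploaded": 200},
]
photos_by_id = {p["id"]: p for p in photos}
index = build_tag_index(photos)
```

Here `search(index, photos_by_id, "beach")` sorts the two beach photos by popularity, and passing `sort_key="uploaded"` sorts them by newness instead. The lookup touches only the ids stored under that tag, which is the same reason an indexed search stays fast as the table grows.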