Hi everyone. This is more of a theory question than a practical one. I'm working on a project which is quite a simple catalog of links. The whole model is similar to the Dmoz or Yahoo catalog, except that each entry has certain additional attributes.
I have a hierarchical taxonomy working on all entries with a many-to-many relationship, all entries are now sorted into these categories, and everything seems to work fine. Now, what use is a catalog if there's no search option?
Here's a little bit more detail about my models: Each entry has a title, description, URL and several social profiles: YouTube, Twitter, Flickr and a couple of others. Each entry could have a logo attached to it, and a hidden field for tags. Also, the title and description are stored in three different languages. So basically I'd like the search results to be:
- Relevant (including taxonomy)
- Possibly ones with logos
- Possibly ones with 100% filled out profiles
I've tried Sphinx and am currently working with Lucene, but it seems that I'm not getting the theory of search right. I hope it makes sense that fully filled-out entries should appear higher than the others, but I can't really figure out how to set up the scores. I wouldn't want irrelevant entries to appear on top just because a single word matches somewhere in the description, since titles are more relevant.
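As a rough illustration of the ranking described above, here is a minimal sketch in plain Python. It is not Sphinx or Lucene code, and every name and weight in it (`Entry`, `TITLE_WEIGHT`, `LOGO_BOOST` and so on) is a hypothetical placeholder. The idea is that a title match counts for more than a description match, and that a logo or a complete profile only multiplies the score of an entry that already matches the query.

```python
from dataclasses import dataclass, field

# Hypothetical weights, chosen only for illustration; they would need tuning.
TITLE_WEIGHT = 3.0         # a match in the title counts the most
TAG_WEIGHT = 2.0           # hidden tags count a bit less
DESCRIPTION_WEIGHT = 1.0   # a match in the description counts the least
LOGO_BOOST = 0.2           # extra multiplier share for entries with a logo
COMPLETENESS_BOOST = 0.3   # extra multiplier share for a 100% filled profile


@dataclass
class Entry:
    title: str
    description: str
    tags: list = field(default_factory=list)
    has_logo: bool = False
    profiles_filled: int = 0   # how many social profiles are filled in
    profiles_total: int = 5    # assumed number of profile fields


def text_score(entry, query):
    """Sum the field weights for every query term found in each field."""
    title_words = entry.title.lower().split()
    desc_words = entry.description.lower().split()
    tags = [t.lower() for t in entry.tags]
    score = 0.0
    for term in query.lower().split():
        if term in title_words:
            score += TITLE_WEIGHT
        if term in tags:
            score += TAG_WEIGHT
        if term in desc_words:
            score += DESCRIPTION_WEIGHT
    return score


def final_score(entry, query):
    """Multiply text relevance by a quality boost; non-matches stay at zero."""
    relevance = text_score(entry, query)
    if relevance == 0:
        return 0.0
    completeness = 0.0
    if entry.profiles_total:
        completeness = entry.profiles_filled / entry.profiles_total
    boost = 1.0 + COMPLETENESS_BOOST * completeness
    if entry.has_logo:
        boost += LOGO_BOOST
    return relevance * boost


def search(entries, query):
    """Return matching entries, best score first."""
    scored = [(final_score(e, query), e) for e in entries]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored]
```

Because the boost is a multiplier rather than an added constant, a complete entry with a logo that matches only in its description still ranks below a bare entry that matches in its title, as long as the boost stays smaller than the ratio between the title and description weights.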
So my question is: are there any books, techniques or even other search engines (if Sphinx and Lucene are not good enough) that you would recommend for this? Not only would I like to get full control over the search results and their ranking, but I would also like to give my visitors correct and relevant information.
Links to good articles are appreciated too!
And no, I'm not trying to rebuild Google :)
Thanks :)