tf-idf: Does using it help to weight documents that share the search terms higher than a document that doesn't?

Hi.

I'm working on a customized search feature for a website, and I was curious whether using only tf-idf to rank the documents in my corpus would also weight documents that contain multiple search terms higher than documents that contain only one search term.

Example: Search = "poland spring water". Theoretically, using traditional tf-idf, would this query rank a document higher if it contained "poland" 100 times and "water" zero times, or would it rank a document higher if it contained "poland" 10 times and "water" 10 times?

I'm aware that it all depends on the tf-idf values of "poland" and "water", but theoretically, on an even playing field, would the algorithm push documents that contain several of the search terms higher in the results, or is the score really computed independently for each term?
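
To make the comparison concrete, here is a minimal sketch of what I mean by "traditional" tf-idf, assuming the score of a document is simply the sum of tf * idf over the query terms. The corpus size and document frequencies are made-up numbers chosen so that every term has the same idf (the "even playing field"), and the `sublinear` flag is a hypothetical switch for the common log-scaled tf variant, not part of any particular library.

```python
import math

def tfidf_score(query_terms, doc_counts, doc_freq, n_docs, sublinear=False):
    """Sum tf * idf over the query terms for a single document.

    doc_counts: term -> raw count in the document
    doc_freq:   term -> number of corpus documents containing the term
    sublinear:  if True, use 1 + ln(tf) instead of the raw count
    """
    score = 0.0
    for term in query_terms:
        tf = doc_counts.get(term, 0)
        if tf == 0:
            continue  # an absent term contributes nothing
        if sublinear:
            tf = 1 + math.log(tf)
        idf = math.log(n_docs / doc_freq[term])
        score += tf * idf
    return score

# Made-up corpus statistics: every query term is equally rare.
n_docs = 1000
doc_freq = {"poland": 100, "spring": 100, "water": 100}
query = ["poland", "spring", "water"]

doc_a = {"poland": 100}               # one term, many occurrences
doc_b = {"poland": 10, "water": 10}   # two terms, fewer occurrences

raw_a = tfidf_score(query, doc_a, doc_freq, n_docs)
raw_b = tfidf_score(query, doc_b, doc_freq, n_docs)
log_a = tfidf_score(query, doc_a, doc_freq, n_docs, sublinear=True)
log_b = tfidf_score(query, doc_b, doc_freq, n_docs, sublinear=True)

print("raw tf:       A > B is", raw_a > raw_b)
print("sublinear tf: B > A is", log_b > log_a)
```

Under these assumptions the raw-count version is purely additive per term, so the document with "poland" 100 times comes out ahead, while the log-scaled variant favors the document that matches two of the terms. I'm not sure which of these behaviors counts as the standard one, which is really what I'm asking.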
