
Hello!

I would like to calculate tf-idf weights for the words in a document. I've drafted the following formula, which should give the tf-idf value on the left-hand side. Is it correct?

Tf-idf for DOCUMENT:

tf-idf(WORD) = occurrences(WORD,DOCUMENT) / number-of-words(DOCUMENT) * log10 ( documents(ALL) / ( 1 + documents(WORD, ALL) ) )

  • occurrences(WORD,DOCUMENT): number of occurrences of WORD in DOCUMENT
  • number-of-words(DOCUMENT): number of words in DOCUMENT
  • documents(ALL): number of documents in the database
  • documents(WORD, ALL): number of documents in the database which contain WORD
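To check the formula, here is a minimal Python sketch of it. It assumes each document is a list of words split on whitespace (no real tokenizer or lowercasing), and the helper `tf_idf` and the toy `corpus` are hypothetical, made up for illustration:

```python
import math

def tf_idf(word, document, corpus):
    # Hypothetical helper: `document` is a list of words,
    # `corpus` (the database) is a list of such lists.
    # occurrences(WORD, DOCUMENT) / number-of-words(DOCUMENT)
    tf = document.count(word) / len(document)
    # documents(ALL)
    n_docs = len(corpus)
    # documents(WORD, ALL): documents containing the word at least once
    n_containing = sum(1 for doc in corpus if word in doc)
    # log10(documents(ALL) / (1 + documents(WORD, ALL)))
    idf = math.log10(n_docs / (1 + n_containing))
    return tf * idf

# Toy database; splitting on whitespace is an assumption
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
    "a bird flew".split(),
    "fish swim".split(),
]
cat_score = tf_idf("cat", corpus[0], corpus)  # (1/6) * log10(4/2) ≈ 0.0502
the_score = tf_idf("the", corpus[0], corpus)  # (2/6) * log10(4/3) ≈ 0.0416
print(cat_score, the_score)
```

Although "the" occurs twice in the first document, "cat" gets the higher weight because it appears in fewer documents of the database.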

It would be great if you could help me. Thank you very much in advance!

A:

According to the Wikipedia article, it is correct. As the article suggests, you might want to use 1 + documents(WORD, ALL) instead of just documents(WORD, ALL) in the denominator; this avoids a division by zero when WORD appears in none of the documents.

TF-IDF on Wikipedia
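The +1 only matters when documents(WORD, ALL) can be 0, for example when scoring a query term or a document that is not itself part of the database. A small Python sketch comparing the two variants of the idf factor; the `idf` helper and its `smooth` flag are hypothetical names chosen for illustration:

```python
import math

def idf(n_docs, n_containing, smooth=True):
    # smooth=True:  log10(N / (1 + n)), as in the formula in the question
    # smooth=False: log10(N / n), which divides by zero when n == 0
    denominator = 1 + n_containing if smooth else n_containing
    return math.log10(n_docs / denominator)

n_docs = 4
print(idf(n_docs, 0))  # word in no document: log10(4/1) ≈ 0.602
try:
    idf(n_docs, 0, smooth=False)
except ZeroDivisionError:
    print("unsmoothed idf is undefined when no document contains the word")
print(idf(n_docs, 4))  # word in every document: log10(4/5), slightly negative
```

One side effect of the +1: a word that appears in every document gets a slightly negative weight, because log10(N / (N + 1)) < 0.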

Tomh
Thank you! Now it should be completely correct, right? I had read the German Wikipedia article, in which the +1 wasn't mentioned. So thank you for the good tip.