Hi all, I am using TF/IDF to calculate similarity. For example, suppose I have the following two documents:
Doc A => cat dog
Doc B => dog sparrow
I would expect their similarity to be 50%, but when I calculate their TF/IDF values I get the following:
TF values for Doc A:
dog tf = 0.5
cat tf = 0.5
TF values for Doc B:
dog tf = 0.5
sparrow tf = 0.5
IDF values for Doc A:
dog idf = -0.4055
cat idf = 0
IDF values for Doc B:
dog idf = -0.4055 (0.6931 without the +1 in the formula)
sparrow idf = 0
TF/IDF value for Doc A: 0.5 x -0.4055 + 0.5 x 0 = -0.20275
TF/IDF value for Doc B: 0.5 x -0.4055 + 0.5 x 0 = -0.20275
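In case it helps to check my numbers, here is a minimal Python sketch of the calculation above. It assumes the natural logarithm and the IDF variant with +1 in the denominator, idf = ln(N / (1 + df)), which is what gives -0.4055 for dog. The function names are only for illustration.

```python
import math

# Toy corpus from the example above.
docs = {"A": ["cat", "dog"], "B": ["dog", "sparrow"]}
n_docs = len(docs)

def tf(term, tokens):
    # Term frequency: occurrences of the term divided by document length.
    return tokens.count(term) / len(tokens)

def idf(term):
    # Assumed variant: natural log with +1 added to the document frequency.
    df = sum(term in tokens for tokens in docs.values())
    return math.log(n_docs / (1 + df))

# Sum of tf * idf over the distinct terms of each document,
# which is the single number per document computed above.
totals = {
    name: sum(tf(t, tokens) * idf(t) for t in set(tokens))
    for name, tokens in docs.items()
}
print(totals)
```

Both documents end up with the same total of about -0.2027, matching the hand calculation up to rounding.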
Now it looks like the similarity is -0.20275. Is that right, or am I missing something? Is there a further step after this? If so, please tell me so I can calculate that too.
I used the TF/IDF formula given on Wikipedia.
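One possible next step, as far as I understand it, is not to sum the tf x idf values per document but to keep one weight per term and compare the two documents' weight vectors with cosine similarity. Below is a rough sketch of that idea. It assumes a smoothed IDF of ln((1 + N) / (1 + df)) + 1 (my choice here, so that a term appearing in every document does not get a zero or negative weight), and the function names are illustrative only.

```python
import math

# Toy corpus from the example above.
docs = {"A": ["cat", "dog"], "B": ["dog", "sparrow"]}
n_docs = len(docs)

# Fixed term order so both vectors share the same dimensions.
vocab = sorted({t for tokens in docs.values() for t in tokens})

def idf(term):
    # Assumed smoothed variant: ln((1 + N) / (1 + df)) + 1, always positive.
    df = sum(term in tokens for tokens in docs.values())
    return math.log((1 + n_docs) / (1 + df)) + 1

def tfidf_vector(tokens):
    # One tf * idf weight per vocabulary term (0 if the term is absent).
    return [tokens.count(t) / len(tokens) * idf(t) for t in vocab]

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

similarity = cosine(tfidf_vector(docs["A"]), tfidf_vector(docs["B"]))
print(similarity)
```

With this weighting the shared term (dog) counts for less than the terms unique to each document, so the result is a positive value below the 50% I expected from plain word overlap, and a different IDF variant would give a different number.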