Q: 

Problem with LSI

I am using latent semantic analysis (LSA, also known as latent semantic indexing, LSI) for text similarity. I have two questions.

  1. How do I select the value of K for dimensionality reduction?

  2. I have read in many places that LSI works for words with similar meanings, for example "car" and "automobile". How is that possible? What is the magic step I am missing here?

A: 
  1. Try several different values from [1..n] and see which works best for the task you are trying to accomplish.

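  2. Make a word-word co-occurrence matrix [i.e. cell(i,j) holds the number of docs in which words i and j co-occur] and apply something like PCA to it. Words such as "car" and "automobile" rarely appear in the same doc, but they co-occur with the same other words, so they end up close together in the reduced space.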
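
The two suggestions above can be sketched roughly as follows. This is only an illustration under assumptions: the toy corpus, the helper names `embed` and `cosine`, and the choice of NumPy's SVD to carry out the PCA are all made up for the example. Note that textbook LSI applies a truncated SVD to the term-document matrix; the variant here follows the co-occurrence-plus-PCA idea from the answer.

```python
import numpy as np

# Hypothetical toy corpus: "car" and "automobile" never share a document,
# but both appear next to "engine", "wheel", "road" and "driver".
docs = [
    "car engine wheel road",
    "automobile engine wheel road",
    "car driver road",
    "automobile driver road",
    "banana fruit apple",
    "apple fruit juice",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Binary term-document matrix: X[i, j] = 1 if word i occurs in doc j.
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in set(d.split()):
        X[index[w], j] = 1.0

# Word-word co-occurrence: C[i, j] = number of docs containing both words.
# (The diagonal holds each word's document frequency.)
C = X @ X.T

def embed(C, k):
    """PCA via SVD: center the columns, keep the top-k component scores."""
    centered = C - C.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * S[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print("raw co-occurrence(car, automobile):",
      C[index["car"], index["automobile"]])

# Point 1: try a few values of k and compare on the task at hand.
for k in (2, 3, 4):
    E = embed(C, k)
    sim = cosine(E[index["car"]], E[index["automobile"]])
    print(f"k={k}: cosine(car, automobile) = {sim:.3f}")
```

In a real setting the loop over `k` would score each value against whatever you actually care about (retrieval quality, a labelled similarity set, and so on) rather than a single word pair.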

adi92