We have a set of Documents , each has a set of Features. Given feature A, we need to know what is the probability of having feature B in the same document.
I thought of building a probability matrix , s.t: M(i,j) = Probability of having feature B in a document , given that feature A is there.
However , we have an additional requirement: Given feature A is in the document , what are all the features that have a probability > P of being in the same document.
In the mean while all I could think off is a sparse matrix for the Probability matrix , and after it's computed , for each feature run over all the column , sort it by P , and keep it in a linked list somewhere. (So now , we have for each feature , a list of corresponding features
This space complexity is quite big (worst case: N^2, and N is large!) , and the time complexity for each search is O(N).
Any better idea?