I am working on a small project that involves dictionary-based text searching within a collection of documents. My dictionary contains positive signal words (a.k.a. good words), but in the document collection simply finding one of these words does not guarantee a positive result, because negative words or phrases (for example "not" or "not significant") may appear near the positive words. I want to construct a matrix that contains the document number, the positive word, and its proximity to the negative words.
Can anyone suggest a way to do that? My project is at a very early stage, so here is a basic example of my text:
No significant drug interactions have been reported in studies of candesartan cilexetil given with other drugs such as glyburide, nifedipine, digoxin, warfarin, hydrochlorothiazide.
This is my example document, in which candesartan cilexetil, glyburide, nifedipine, digoxin, warfarin, and hydrochlorothiazide are my positive words and "no significant" is my negative phrase. I want to do a word-based proximity mapping between my positive and negative words.
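One possible starting point, sketched in Python under a few assumptions: text is lowercased and split into word tokens with a simple regex (so punctuation is dropped and hyphenated words would be split), multi-word terms are matched as exact token sequences, and "proximity" is taken to be the number of tokens between the start of a positive term and the start of the nearest negative phrase. The function names `tokenize`, `find_phrase`, and `proximity_matrix` are hypothetical, chosen only for illustration. Each output row is `(document number, positive word, nearest negative phrase, distance)`, or `None` for the last two fields when the document has no negative phrase.

```python
import re
from typing import List, Optional, Tuple

Row = Tuple[int, str, Optional[str], Optional[int]]


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; punctuation is discarded (simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def find_phrase(tokens: List[str], phrase: str) -> List[int]:
    # Start index of every occurrence of a (possibly multi-word) phrase.
    words = phrase.lower().split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def proximity_matrix(docs: List[str], positives: List[str],
                     negatives: List[str]) -> List[Row]:
    rows: List[Row] = []
    for doc_id, text in enumerate(docs, start=1):
        tokens = tokenize(text)
        # All (negative phrase, start position) hits in this document.
        neg_hits = [(neg, q) for neg in negatives for q in find_phrase(tokens, neg)]
        for pos in positives:
            for p in find_phrase(tokens, pos):
                if neg_hits:
                    # Nearest negative phrase, measured start-to-start in tokens.
                    neg, dist = min(((neg, abs(p - q)) for neg, q in neg_hits),
                                    key=lambda t: t[1])
                    rows.append((doc_id, pos, neg, dist))
                else:
                    rows.append((doc_id, pos, None, None))
    return rows


docs = [
    "No significant drug interactions have been reported in studies of "
    "candesartan cilexetil given with other drugs such as glyburide, "
    "nifedipine, digoxin, warfarin, hydrochlorothiazide."
]
positives = ["candesartan cilexetil", "glyburide", "nifedipine",
             "digoxin", "warfarin", "hydrochlorothiazide"]
negatives = ["no significant", "not significant", "not"]

for row in proximity_matrix(docs, positives, negatives):
    print(row)  # first row: (1, 'candesartan cilexetil', 'no significant', 10)
```

Since each row already has the document number, positive word, and distance, the list can be loaded into a table or spreadsheet as the requested matrix. A fixed window (for example, treating a positive word as negated when the distance is below some threshold you choose) could then be applied on top; the right window size would have to be tuned on your own documents.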
Can anyone give some helpful pointers?