Does anyone know of a good way to calculate the "semantic distance" between two words?
Immediately an algorithm that counts the steps between words in a thesaurus springs to mind.
Does anyone know of a good way to calculate the "semantic distance" between two words?
Immediately an algorithm that counts the steps between words in a thesaurus springs to mind.
OK, looks like a similar question has already been answered: http://stackoverflow.com/questions/62328/.
Possible hack: send the two words to Google search, and return the # of pages found.
The thesaurus idea has some merit. One idea would be to create a graph based on a thesaurus with the nodes being the words and an edge indicating that there they are listed as synonyms in the thesaurus. You could then use a shortest path algorithm to give you the distance between the nodes as a measure of their similarity.
One difficulty here is that some words have different meanings in different contexts. Your algorithm may need to take this into account and use directed links with the weight of the outgoing link dependent on the incoming link being followed (or ignore some outgoing links based on the incoming link).