+7  A: 

You might be interested in WordNet. It takes a bit of linguistic knowledge to understand the API, but basically the system is a database of meaning-based links between English words, which is more or less what you're searching for. I'm sure I can dig up more information if you want it.

David Zaslavsky
WordNet is good, but it will miss out on proper names too:
$ wn baseball -over
Overview of noun baseball
The noun baseball has 2 senses (first 2 from tagged texts)
1. (21) baseball, baseball game -- (a ball game played with a bat and ...
Hemal Pandya
drfloob
Glad it works for you ;-) Many notable (famous) proper names are in WordNet - and for those that are not, I'm sure the database will expand to include more of them over time. (you could probably even contribute to it)
David Zaslavsky
+3  A: 
Imran
+5  A: 

Peter Norvig (director of research at Google) spoke about how they do this at Google (specifically mentioning Google Sets) in a recent Facebook Tech Talk. The idea is that a relatively simple algorithm on a huge dataset (e.g. the entire web) is much better than a complicated algorithm on a small dataset.

You could look at Google's n-gram collection as a starting point. You'd start to see what concepts are grouped together. Norvig hinted that internally Google has up to 7-grams for use in things like Google Translate.
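To make "seeing what concepts are grouped together" a bit more concrete, here is a rough sketch of one simple way to do it: treat two words as related when they show up between the same neighbouring words. This is only an illustration under my own assumptions, not how Google Sets or Google's n-gram data actually work; the function names (`context_index`, `expand`) and the tiny corpus are made up for the example.

```python
from collections import defaultdict
import re

def context_index(sentences):
    """Map each word to the set of (previous word, next word) slots it fills."""
    contexts = defaultdict(set)
    for sentence in sentences:
        # <s> and </s> are hypothetical sentence-boundary markers.
        tokens = ["<s>"] + re.findall(r"[a-z']+", sentence.lower()) + ["</s>"]
        for i in range(1, len(tokens) - 1):
            contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    return contexts

def expand(seeds, sentences, top=3):
    """Rank non-seed words by how many context slots they share with the seeds."""
    contexts = context_index(sentences)
    seed_slots = set()
    for seed in seeds:
        seed_slots |= contexts.get(seed, set())
    scores = {
        word: len(slots & seed_slots)
        for word, slots in contexts.items()
        if word not in seeds
    }
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, score in ranked[:top] if score > 0]

# Toy corpus, invented for illustration only.
corpus = [
    "the yankees won the game",
    "the mets won the game",
    "the dodgers won the game",
    "the yankees lost the series",
    "the mets lost the series",
    "the weather ruined the game",
]
print(expand({"yankees", "mets"}, corpus))
```

On a six-sentence corpus this is obviously a toy, but the same counting over a web-scale n-gram collection is the "simple algorithm, lots of data" idea in miniature.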

If you're more ambitious, you could download all of Wikipedia's articles in the language you desire and create your own n-gram database.
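Building the n-gram table itself is the easy part. A minimal sketch, assuming you have already extracted plain text from the articles (parsing the Wikipedia dump is not shown) and that a crude regex tokenizer is good enough; `tokenize` and `ngram_counts` are names I made up for this example:

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngram_counts(text, n):
    """Count every run of n consecutive tokens in the text.

    Sentence boundaries are ignored, so n-grams may span two sentences.
    """
    tokens = tokenize(text)
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)

# Invented sample text standing in for real article content.
sample = (
    "The Red Sox play baseball. The Yankees play baseball. "
    "The Celtics play basketball."
)
bigrams = ngram_counts(sample, 2)
print(bigrams[("play", "baseball")])
print(bigrams.most_common(3))
```

For a full Wikipedia dump the counts will not fit comfortably in one in-memory `Counter`, so you would likely process the text in chunks and merge the counts into some on-disk store.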

The problem is even more complicated if you just have a single word; check out this recent thesis for more details on word sense disambiguation.

It's not an easy problem, but it is useful, as you mentioned. In the end, I think you'll find that a really successful implementation will have a relatively simple algorithm and a whole lot of data.

Good luck!

Jeff Moser