I'm in need of some inspiration. For a hobby project I am playing with content analysis. I am basically trying to analyze input to match it to a topic map.
For example:
- "The war on Iraq" > History, Middle East
- "Halloumi" > Food, Middle East
- "BMW" > Germany, Cars
- "Obama" > USA
- "Impala" > USA, Cars
- "The Berlin Wall" > History, Germany
- "Bratwurst" > Food, Germany
- "Cheeseburger" > Food, USA
- ...
I've been reading a lot about taxonomy, and whatever I read ends up concluding that people all tag differently and that the system is therefore bound to fail.
I thought about tokenizing the input and using stop word lists, but those are of course a lot of work to come up with and build. Building the relevant links between words and topics seems exhausting and never-ending, because whatever language you deal with is very rich, and most languages also rely heavily on context. Maintaining all of that would be even more work.
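To make that idea concrete, here is a minimal sketch of the tokenize-and-look-up approach. The stop word list and the keyword map are made-up placeholders, and the function names are only illustrative; a real version would need far larger lists, which is exactly the part that worries me.

```python
import re

# Hypothetical, hand-maintained data; a real list would be far larger.
STOP_WORDS = {"the", "on", "a", "an", "of", "in"}
KEYWORD_TOPICS = {
    "iraq": {"History", "Middle East"},
    "halloumi": {"Food", "Middle East"},
    "bmw": {"Germany", "Cars"},
    "berlin": {"History", "Germany"},
    "bratwurst": {"Food", "Germany"},
}


def tokenize(text):
    """Lowercase the input and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def guess_topics(text):
    """Return the union of topics for every known, non-stop-word token."""
    topics = set()
    for token in tokenize(text):
        if token in STOP_WORDS:
            continue
        topics |= KEYWORD_TOPICS.get(token, set())
    return topics
```

With this, `guess_topics("The Berlin Wall")` finds the topics stored for "berlin", while anything not in the map (say "Obama") yields an empty set, which shows how quickly the manual map becomes the bottleneck.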
I guess I need to come up with something smart and train it with topics I want it to be able to guess. Kind of like an Eliza bot.
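As a rough sketch of what "training" could look like, the snippet below learns word-to-topic links from labeled examples instead of a hand-written map. This is plain co-occurrence counting, not a proper statistical classifier, and the class name `TopicGuesser` and the `min_votes` parameter are my own inventions for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the input and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TopicGuesser:
    """Learns which topics each word was seen with (hypothetical helper)."""

    def __init__(self):
        # word -> Counter of topics that word appeared with during training
        self.word_topics = defaultdict(Counter)

    def train(self, text, topics):
        """Record every token of `text` as evidence for each given topic."""
        for token in tokenize(text):
            self.word_topics[token].update(topics)

    def guess(self, text, min_votes=1):
        """Sum topic votes over all tokens; keep topics with enough votes."""
        votes = Counter()
        for token in tokenize(text):
            votes.update(self.word_topics.get(token, Counter()))
        return {topic for topic, count in votes.items() if count >= min_votes}


guesser = TopicGuesser()
guesser.train("Bratwurst", ["Food", "Germany"])
guesser.train("BMW", ["Germany", "Cars"])
print(guesser.guess("bratwurst and BMW"))
```

It only recognizes words it has already seen, so it does not solve the context problem, but it at least moves the effort from maintaining a map to collecting labeled examples.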
Anyway, I don't believe there is anything that does this out of the box, but does anyone have any leads or examples of technology I could use to analyze input and extract its meaning?