Hi folks,

I have a list of articles, each made up of a title, a subtitle and a body.

Now I need to parse all these articles and group them under context categories or subcategories, based on the keywords they are likely to contain.

E.g. if an article is likely to be about sports cars, it would be associated with the car and/or vehicle context.


I understand that this is a vast ocean, which is exactly why I am asking: the ocean of solutions might be too big for me, and I would most likely get lost and adopt some poorly thought-out solution.

There are probably some popular, standardized ways of doing this that I do not know about, and it would be very useful if someone pointed me in the right direction.

Help would be great. =)

+1  A: 

Try the Natural Language Toolkit (NLTK), but don't expect to find a magic bullet in there that will keep you from having to learn a fair bit about linguistics, as the problem you describe cannot be solved wholly mechanically.
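
To make the "wholly mechanical" baseline concrete, here is a minimal sketch in plain Python (standard library only, not NLTK) that tags an article with every category whose keywords show up often enough. The category names, the keyword lists, the `categorize` function and the `min_hits` threshold are all made up for illustration; they are assumptions, not anything from the question. A real vocabulary would have to be far larger, and this approach knows nothing about synonyms, word forms or ambiguity, which is where the linguistics comes in.

```python
import re
from collections import Counter

# Hypothetical keyword lists, invented for this sketch.
# One keyword may belong to several categories (e.g. "car" and "vehicle").
CATEGORY_KEYWORDS = {
    "car": {"car", "cars", "engine", "horsepower", "coupe"},
    "vehicle": {"car", "cars", "truck", "vehicle", "motorcycle", "engine"},
    "cooking": {"recipe", "oven", "bake", "flour"},
}


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(title, subtitle, body, min_hits=2):
    """Return the matching categories, best match first.

    A category matches when its keywords occur at least `min_hits`
    times in total across the title, subtitle and body.
    """
    words = Counter(tokenize(" ".join([title, subtitle, body])))
    scores = {}
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(words[keyword] for keyword in keywords)
        if hits >= min_hits:
            scores[category] = hits
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(categorize(
        "Fast sports cars",
        "A new coupe",
        "The engine makes 500 horsepower.",
    ))
```

An article can land in several categories at once, which matches the "car and/or vehicle" requirement. Weighting title hits more heavily than body hits would be a natural next step, but that is a design choice beyond this sketch.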

msw
+1  A: 

http://en.wikipedia.org/wiki/Category:Library_of_Congress_Classification

Joni
Counter-argument: the library problem http://www.zackgrossbart.com/hackito/the-library-problem/
msw