views: 163
answers: 4

Hi,

I'm not sure what's the best algorithm to use for classifying relationships between words. For example, in a sentence such as "The yellow sun" there is a relationship between yellow and sun. The machine learning techniques I have considered so far are Bayesian statistics, rough sets, fuzzy logic, hidden Markov models, and artificial neural networks.

Any suggestions please?

thank you :)

+1  A: 

Well, no one knows the best algorithm for language processing, because the problem hasn't been solved. Fully understanding a human language would amount to creating a complete AI.

However, there have, of course, been attempts to process natural languages, and these might be good starting points for this sort of thing:

X-Bar Theory

Phrase Structure Rules

Noam Chomsky did a lot of work on natural language processing, so I'd recommend looking up some of his work.

Peter Alexander
Well, Chomsky has done important work in the theory of computation, especially in classifying types of languages (regular, context-free, etc.), and he revolutionized NLP back in the '60s, but I wouldn't really recommend him as a starting point for this task: mainly because it's overkill, but also because it's extremely difficult to translate Chomsky's transformational grammars into an automatic parser. Dependency parsers are probably the way to go here.
ealdent
+4  A: 

It kind of sounds like you're looking for a dependency parser. Such a parser will give you the relationship between any word in a sentence and its semantic or syntactic head.

The MSTParser uses an online max-margin technique known as MIRA to classify the relationships between words. The MaltParser package does the same but uses SVMs to make parsing decisions. Both systems are trainable and provide similar classification and attachment performance; see table 1 here.
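Both parsers read and write the tab-separated CoNLL dependency format, so whichever one you pick, the classified relationships come back as head/label columns per token. As a rough sketch (with the lemma and feature columns elided as underscores), the parsed output for "The yellow sun" would look something like:

    1   The     _   DT   DT   _   3   det
    2   yellow  _   JJ   JJ   _   3   amod
    3   sun     _   NN   NN   _   0   root

The seventh column is the index of the head word and the eighth is the relation label, so the second row says that yellow modifies sun (amod) — exactly the kind of relationship you're asking about.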

dmcer
I'd add the Stanford Parser, which does normal phrase-structure parsing but will also output dependency parses. Easy to use, open source, Java, fast, etc.
ealdent
@ealdent: A big advantage of the Stanford Dependencies is that they focus on directly capturing semantically meaningful relationships. But if speed is a big concern, the Stanford Parser is actually not that fast. The abstract I linked to above (i.e., see table 1 -here-) discusses faster ways of generating the Stanford Dependencies using other parsers like MST, Malt, and Charniak.
dmcer
+1  A: 

As user dmcer pointed out, dependency parsers will help you. There is a ton of literature on dependency parsing you can read. This book and these lecture notes are good starting points that introduce the conventional methods.

The Link Grammar Parser, which is similar in spirit to a dependency parser, uses Sleator and Temperley's Link Grammar syntax to produce word-word linkages. You can find more information on the original Link Grammar page and on the more recent AbiWord page (AbiWord maintains the implementation now).

For an unconventional approach to dependency parsing, you can read this paper, which models word-word relationships analogously to subatomic particle interactions in chemistry/physics.

hashable
+1  A: 

The Stanford Parser does exactly what you want. There's even an online demo. Here are the results for your example.

Your sentence
The yellow sun.

Tagging
The/DT yellow/JJ sun/NN ./.

Parse
(ROOT
  (NP (DT The) (JJ yellow) (NN sun) (. .)))

Typed dependencies
det(sun-3, The-1)
amod(sun-3, yellow-2)

Typed dependencies, collapsed
det(sun-3, The-1)
amod(sun-3, yellow-2)

From your question it sounds like you're interested in the typed dependencies.
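If you want those typed dependencies programmatically rather than through the demo page, here is a minimal Java sketch against the parser's API. A couple of caveats: it assumes the stanford-parser jar and its companion models jar are on the classpath, and the model path and class names below are the standard ones in recent releases but can shift between versions, so check against the javadoc for the version you download.

    import java.io.StringReader;
    import java.util.List;

    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.process.CoreLabelTokenFactory;
    import edu.stanford.nlp.process.PTBTokenizer;
    import edu.stanford.nlp.trees.*;

    public class DependencyDemo {
        public static void main(String[] args) {
            // Load the English PCFG model bundled in the models jar.
            LexicalizedParser lp = LexicalizedParser.loadModel(
                "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

            // Tokenize the sentence the same way the online demo does.
            PTBTokenizer<CoreLabel> tokenizer = new PTBTokenizer<CoreLabel>(
                new StringReader("The yellow sun."),
                new CoreLabelTokenFactory(), "");
            List<CoreLabel> tokens = tokenizer.tokenize();

            // Phrase-structure parse first, then convert the tree
            // to typed dependencies.
            Tree parse = lp.apply(tokens);
            TreebankLanguagePack tlp = new PennTreebankLanguagePack();
            GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
            GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);

            // Prints det(sun-3, The-1) and amod(sun-3, yellow-2),
            // matching the demo output above.
            for (TypedDependency td : gs.typedDependenciesCCprocessed()) {
                System.out.println(td);
            }
        }
    }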

Tristan