ansaurus

Question

Hierarchy of meaning

Answer 1

+1 A:

The opinion-mining and sentiment-analysis folks may be working on something related: deciding which words represent features of products without knowing anything about the products in advance.

A quick sketch of an idea for how you might do this, which I've totally made up on the spot: parse a bunch of sentences in the relevant domain and find the noun phrases and adjectives. Figure out which noun phrases are associated with which adjectives. Then cluster the noun phrases based on the set of adjectives used to describe them. Animals will tend to cluster together because they're going to be described by adjectives like "furry" or "cute", etc. (In particular, hierarchical clustering would probably be most appropriate.)
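A minimal sketch of the clustering step, assuming the parsing is already done and you have a list of (noun phrase, adjective) pairs (the `pairs` data below is made up for illustration). It uses cosine similarity between adjective-count profiles and plain average-linkage agglomerative clustering from the standard library; the threshold value is an arbitrary assumption you would need to tune.

```python
from collections import Counter, defaultdict
from math import sqrt


def adjective_profiles(pairs):
    """Map each noun phrase to a Counter of the adjectives that modify it."""
    profiles = defaultdict(Counter)
    for noun, adj in pairs:
        profiles[noun][adj] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def average_link_clusters(profiles, threshold):
    """Bottom-up (agglomerative) clustering with average linkage: keep merging
    the most similar pair of clusters until no pair reaches `threshold`."""
    clusters = [[noun] for noun in sorted(profiles)]

    def link(c1, c2):
        # Average pairwise similarity between members of the two clusters.
        return sum(cosine(profiles[x], profiles[y]) for x in c1 for y in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]),
        )
        if link(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j)
        del clusters[j]
    return clusters


# Hypothetical parser output: (noun phrase, adjective) pairs.
pairs = [
    ("cat", "furry"), ("cat", "cute"),
    ("dog", "furry"), ("dog", "loyal"), ("dog", "cute"),
    ("car", "fast"), ("car", "red"),
    ("truck", "fast"), ("truck", "heavy"),
]

if __name__ == "__main__":
    print(average_link_clusters(adjective_profiles(pairs), threshold=0.3))
    # -> [['car', 'truck'], ['cat', 'dog']]
```

Recording the order of the merges instead of stopping at a threshold would give you the full hierarchy (a dendrogram) rather than one flat cut.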

If you try this, and it works, let me know. :)

Jay Kominek 2010-03-24 17:37:42

The OP said that the sets already exist and that the task is to find the most representative element of a set. What you suggest is not an answer to the question. That being said, adjectives alone will not help in clustering semantically similar nouns because most are just too widely applicable, e.g. 'cute' could be applied to girls, pieces of music, movies, social situations, etc. You need a lot more context such as typical nouns and verbs to make your idea work with at least some accuracy.

ferdystschenko 2010-03-25 08:10:55

It certainly does appear that I misread the question. That said, I don't think that broadly applicable adjectives would be a very big deal. If everything is close in the 'cute' dimension, then 'cute' will just end up not having much effect on the clusters.
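One standard way to get the effect described here, so that an adjective used with nearly everything stops mattering, is to weight adjective counts by inverse document frequency (IDF), treating each noun's adjective profile as a "document". This is a sketch of that weighting under that assumption, not something either poster proposed; the profiles are invented.

```python
from collections import Counter
from math import log


def idf_weights(profiles):
    """IDF of each adjective, treating each noun's adjective profile as one
    'document'. An adjective used with every noun gets log(n/n) = 0."""
    n = len(profiles)
    doc_freq = Counter(adj for profile in profiles.values() for adj in set(profile))
    return {adj: log(n / df) for adj, df in doc_freq.items()}


def tfidf_profiles(profiles):
    """Scale each raw adjective count by that adjective's IDF."""
    idf = idf_weights(profiles)
    return {
        noun: {adj: count * idf[adj] for adj, count in profile.items()}
        for noun, profile in profiles.items()
    }


# Hypothetical adjective counts; 'cute' is attached to every noun.
profiles = {
    "kitten": Counter(cute=3, furry=2),
    "puppy": Counter(cute=5, furry=1),
    "song": Counter(cute=1, catchy=4),
}
weighted = tfidf_profiles(profiles)

if __name__ == "__main__":
    print(weighted["kitten"])  # 'cute' is weighted 0.0; 'furry' keeps a positive weight
```

With this weighting, 'cute' contributes nothing to any similarity computed on the weighted profiles, which is the "not much effect" outcome; adjectives shared by only some nouns still carry weight.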

Jay Kominek 2010-03-25 14:39:54

What I was trying to say was that I doubt there are discriminative adjectives for most kinds of concept clusters. Plus I don't see a reason why you should limit your features to adjectives when other word classes are potentially even more descriptive. E.g. animals may co-occur with nouns and verbs like 'forest', 'zoo', 'prey', 'hunt', etc. For a start, I wouldn't even parse the sentences but would use a simple n-gram (perhaps even unigram) approach.
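A minimal sketch of the unigram co-occurrence idea: no parsing, just count which words appear within a fixed window around each target word and compare the resulting context vectors with cosine similarity. The window size, the tiny stopword list, and the example sentences are all illustrative assumptions.

```python
import re
from collections import Counter, defaultdict
from math import sqrt

# Illustrative stopword list (an assumption; use a proper one in practice).
STOPWORDS = {"a", "an", "the", "in", "its", "was", "will", "of", "and", "to"}


def context_vectors(sentences, targets, window=3):
    """For each target word, count the non-stopword unigrams appearing
    within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]
        for i, token in enumerate(tokens):
            if token not in targets:
                continue
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[token][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


sentences = [
    "The fox hunts its prey in the forest",
    "A wolf will hunt prey in the forest",
    "The sedan was parked in the garage",
]
vecs = context_vectors(sentences, {"fox", "wolf", "sedan"})

if __name__ == "__main__":
    print(cosine(vecs["fox"], vecs["wolf"]))   # shares 'prey' and 'forest'
    print(cosine(vecs["fox"], vecs["sedan"]))  # no shared context words
```

Note that 'hunts' and 'hunt' count as different features here; stemming or lemmatizing would be an easy improvement.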

ferdystschenko 2010-03-26 13:41:28

Answer 2

+3 A:

It looks like you want to use something like the hypernym/hyponym relationships in WordNet, but without actually using WordNet because of language- and domain-specific coverage issues? That is, if you had the domain-specific hypernym relationships, you could get the "super" representation by just looking for the nearest parent that subsumes all of the words in the list, or the nearest node that is equal to one of the list words and subsumes all of the others.
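A sketch of the "nearest node that subsumes all of the words" lookup, assuming the domain hierarchy is a simple tree stored as a child-to-parent dict (the `PARENT` data is hypothetical; WordNet itself allows multiple parents, which this ignores). Because each word's ancestor chain starts with the word itself, the same loop also returns a list word when that word already subsumes all the others.

```python
def ancestors(word, parent):
    """Return [word, parent, grandparent, ..., root]. Assumes `parent` is acyclic."""
    chain = [word]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def common_subsumer(words, parent):
    """Lowest node (possibly one of `words`) that is an ancestor-or-self of every word."""
    chains = [ancestors(w, parent) for w in words]
    others = [set(chain) for chain in chains[1:]]
    # Walking up the first word's chain, the first node shared by all chains is the lowest.
    for node in chains[0]:
        if all(node in s for s in others):
            return node
    return None  # words live in disconnected trees


# Hypothetical domain-specific hierarchy (child -> parent).
PARENT = {
    "cat": "feline", "lion": "feline", "feline": "mammal",
    "dog": "canine", "canine": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal",
}

if __name__ == "__main__":
    print(common_subsumer(["cat", "lion"], PARENT))            # feline
    print(common_subsumer(["cat", "dog", "sparrow"], PARENT))  # animal
    print(common_subsumer(["mammal", "cat", "dog"], PARENT))   # mammal (a list word)
```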

To start, I would point out that WordNets are actually available for many of the world's major languages; see the list at Global WordNet.

To get domain-specific hypernym relationships, you could use the technique presented in Snow et al.'s "Learning Syntactic Patterns for Automatic Hypernym Discovery". That is, you could start off with a small list of seed hypernym pairs, and then use them to train a classifier to detect hypernyms in a corpus. You would then run this classifier over data from your domain in order to build a list of domain-specific hypernym pairs.
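Snow et al. learn their patterns automatically (from dependency paths) and feed them to a classifier; the sketch below does not do that. It is a much simpler stand-in that hard-codes a few classic Hearst-style surface patterns ("X such as Y", "X including Y", "Y and other X", "Y or other X"), which could be one cheap way to collect seed or candidate (hyponym, hypernym) pairs from domain text. Single-word matching and the example text are simplifying assumptions.

```python
import re
from collections import Counter

# (regex, which captured group is the hypernym). Single words only, for simplicity.
HEARST_PATTERNS = [
    (re.compile(r"\b(\w+) such as (\w+)"), 1),
    (re.compile(r"\b(\w+) including (\w+)"), 1),
    (re.compile(r"\b(\w+) and other (\w+)"), 2),
    (re.compile(r"\b(\w+) or other (\w+)"), 2),
]


def extract_pairs(text):
    """Count (hyponym, hypernym) pairs matched by the surface patterns above."""
    pairs = Counter()
    lowered = text.lower()
    for regex, hyper_group in HEARST_PATTERNS:
        for match in regex.finditer(lowered):
            hyper = match.group(hyper_group)
            hypo = match.group(3 - hyper_group)  # the other captured group
            pairs[(hypo, hyper)] += 1
    return pairs


text = (
    "Predators such as wolves roam here. We saw foxes and other predators. "
    "Tools including hammers were sold."
)

if __name__ == "__main__":
    for (hypo, hyper), count in extract_pairs(text).items():
        print(f"{hypo} -> {hyper} ({count})")
```

Pairs harvested this way are noisy, which is part of why a learned classifier over many pattern features is the more robust route.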

dmcer 2010-03-24 18:14:28
