Does anyone know? Is this a place to ask computer science questions, or just programming questions?

+3  A: 

Questions related to software development (including algorithms and techniques) are fine here. For "text chunking" in natural language processing, see here (you probably want all the lectures in this series as a kind of "NLP 101"...): it covers a range of tasks such as finding noun groups, finding verb groups, and completely partitioning a sentence into chunks of several types. The lecture I linked goes into more detail!

Alex Martelli
+2  A: 

Chunking is also called shallow parsing, and it's basically the identification of parts of speech and short phrases (like noun phrases). Part-of-speech tagging tells you whether words are nouns, verbs, adjectives, etc., but it doesn't give you any clue about the structure of the sentence or the phrases in it. Sometimes it's useful to have more information than just the parts of speech of the words, but you don't need the full parse tree that you would get from full parsing.
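To make the difference from plain tagging concrete, here is a minimal sketch of a noun-phrase chunker in Python. It assumes the tokens already carry Penn Treebank-style tags (assigned by hand below, purely for illustration), and its rule "optional determiner, any adjectives, one or more nouns" is a deliberate simplification, not a complete grammar. The function name `chunk_noun_phrases` is hypothetical.

```python
def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into flat chunks.

    Simplified rule (an assumption, not a full grammar):
    optional determiner (DT), any adjectives (JJ*), then one or
    more nouns (NN*) form an "NP" chunk. Any other token is
    emitted on its own, labeled with its part-of-speech tag.
    """
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                 # optional determiner
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):  # adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # nouns
            k += 1
        if k > j:
            # At least one noun was found: emit the whole span as an NP.
            chunks.append(("NP", [word for word, _ in tagged[i:k]]))
            i = k
        else:
            # No noun followed: emit the single token at i unchunked.
            chunks.append((tagged[i][1], [tagged[i][0]]))
            i += 1
    return chunks


# Hand-tagged input; a real pipeline would get these tags from a tagger.
tagged = [
    ("the", "DT"), ("quick", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]
chunks = chunk_noun_phrases(tagged)
print(chunks)
```

The result is flat: two noun-phrase chunks with the verb and preposition left between them, and no tree relating the chunks to each other. That is the sense in which chunking gives more than tags but less than a parse.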

An example of when chunking might be preferable is Named Entity Recognition (NER). In NER, your goal is to find named entities, which tend to be noun phrases (though they aren't always), so you would want to pick out President Barack Obama as a single phrase in the following sentence:

President Barack Obama criticized insurance companies and banks as he urged supporters to pressure Congress to back his moves to revamp the health-care system and overhaul financial regulations. (source)

But you wouldn't necessarily care that he is the subject of the sentence.
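As a rough illustration of that idea, the sketch below collects runs of consecutive proper nouns as candidate entities. It assumes the opening clause of the sentence has been tagged by hand with Penn Treebank-style tags; the helper name `proper_noun_runs` is hypothetical, and real NER systems use much more than this heuristic.

```python
from itertools import groupby


def proper_noun_runs(tagged):
    """Return each maximal run of consecutive proper nouns as one string."""
    runs = []
    # groupby splits the sequence into maximal runs sharing the same key,
    # here "is this token tagged as a proper noun (NNP/NNPS)?"
    for is_proper, group in groupby(tagged, key=lambda wt: wt[1] in ("NNP", "NNPS")):
        if is_proper:
            runs.append(" ".join(word for word, _ in group))
    return runs


# Hand-assigned tags for the opening clause (an assumption for illustration).
clause = [
    ("President", "NNP"), ("Barack", "NNP"), ("Obama", "NNP"),
    ("criticized", "VBD"), ("insurance", "NN"), ("companies", "NNS"),
    ("and", "CC"), ("banks", "NNS"),
]
candidates = proper_noun_runs(clause)
print(candidates)
```

Nothing here says whether the phrase is the subject or the object of the verb; it only marks where the phrase starts and ends.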

Chunking is also fairly commonly used as a preprocessing step for other tasks, such as example-based machine translation, natural language understanding, and speech generation.

ealdent