ansaurus

Question

Is there a free library for morphological analysis of the German language?

Answer 1

+1 A:

I don't think that this can be done without a dictionary.

Rule-based approaches will invariably trip over cases like:

gegessen -> essen
gegangen -> angen

(Note to people who don't speak German: the correct result in the second case is "gehen".)
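
To make the failure concrete, here is a minimal sketch. It assumes a hypothetical rule that strips a leading "geg" from a participle, which happens to reproduce both results above. The function names and the tiny exception table are made up for illustration and are not taken from any library.

```python
# Hypothetical rule: treat a leading "geg" as a participle marker and strip it.
def naive_participle_to_infinitive(word):
    if word.startswith("geg") and word.endswith("en"):
        return word[len("geg"):]
    return word

# Tiny hand-made exception dictionary (illustrative only, far from complete).
IRREGULAR = {
    "gegangen": "gehen",
    "gegessen": "essen",
}

def lookup_infinitive(word):
    # Consult the dictionary first, fall back to the naive rule otherwise.
    return IRREGULAR.get(word, naive_participle_to_infinitive(word))

print(naive_participle_to_infinitive("gegessen"))  # essen (correct by luck)
print(naive_participle_to_infinitive("gegangen"))  # angen (wrong)
print(lookup_infinitive("gegangen"))               # gehen (from the dictionary)
```

The rule alone cannot know that "gegangen" belongs to "gehen"; only the dictionary entry fixes it.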

Svante 2009-03-25 10:05:40

You are partially right, I updated my question.

DR 2009-03-25 10:29:25

Answer 2

+1 A:

Have a look at Leo. They offer the data you are after; maybe it will give you some ideas.

weismat 2009-03-25 10:19:57

Answer 3

+5 A:

I think you are looking for a "stemming algorithm".

Martin Porter's approach is well known among linguists. The Porter stemmer is basically an affix-stripping algorithm, combined with a few substitution rules for special cases.

Most stemmers deliver stems that are linguistically "incorrect". For example: both "beautiful" and "beauty" can result in the stem "beauti", which, of course, is not a real word. This doesn't matter, though, if you're using those stems to improve search results in information retrieval systems. Lucene comes with support for the Porter stemmer, for instance.
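
As a rough illustration of affix stripping plus a substitution rule, here is a toy sketch. It is not the Porter algorithm: the suffix list, the minimum stem length, and the name `toy_stem` are assumptions chosen only so that the "beauti" example works.

```python
# Toy suffix list (hypothetical, far shorter than any real stemmer's rules).
SUFFIXES = ("ful", "ness", "ing", "ly", "s")

def toy_stem(word):
    word = word.lower()
    # Step 1: strip the first matching suffix, keeping at least 3 characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    # Step 2: substitution rule, a final "y" becomes "i".
    if word.endswith("y") and len(word) > 2:
        word = word[:-1] + "i"
    return word

print(toy_stem("beautiful"))  # beauti
print(toy_stem("beauty"))     # beauti
```

Both words end up with the same stem, which is all a search index needs, even though "beauti" is not a word.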

Porter also devised a simple programming language for developing stemmers, called Snowball.

Stemmers for German are available in Snowball as well. A C version, generated from the Snowball source, can be found on the website, along with a plain-text explanation of the algorithm.

Here's the German stemmer in Snowball: http://snowball.tartarus.org/algorithms/german/stemmer.html

If you're looking for the form of a word as you would find it in a dictionary (its lemma), along with information on the part of speech, you should search for "lemmatization".

gclj5 2009-03-25 11:11:19
