I'm looking for a library which can perform a morphological analysis on German words, i.e. it converts any word into its root form and providing meta information about the analysed word.
For example:
gegessen -> essen
wurde [...] gefasst -> fassen
Häuser -> Haus
Hunde -> Hund
My wishlist:
- It has to work with both nouns and verbs.
- I'm aware that this is a very hard task given the complexity of the German language, so I'm also looking for libaries which provide only approximations or may only be 80% accurate.
- I'd prefer libraries which don't work with dictionaries, but again I'm open to compromise given the cirumstances.
- I'd also prefer C/C++/Delphi Windows libraries, because that would make them easier to integrate but .NET, Java, ... will also do.
- It has to be a free library. (L)GPL, MPL, ...
EDIT: I'm aware that there is no way to perform a morphological analysis without any dictionary at all, because of the irregular words. When I say, I prefer a library without a dictionary I mean those full blown dictionaries which map each and every word:
arbeite -> arbeiten
arbeitest -> arbeiten
arbeitet -> arbeiten
arbeitete -> arbeiten
arbeitetest -> arbeiten
arbeiteten -> arbeiten
arbeitetet -> arbeiten
gearbeitet -> arbeiten
arbeite -> arbeiten
...
Those dictionaries have several drawbacks, including the huge size and the inability to process unknown words.
Of course all exceptions can only be handled with a dictionary:
esse -> essen
isst -> essen
eßt -> essen
aß -> essen
aßt -> essen
aßen -> essen
...
(My mind is spinning right now :) )