Many languages in Europe are inflectional. This means that one word can be written in multiple forms in text. For example, word 'computer' in polish "komputer" has multiple forms: "komputera", "komputerowi", "komputerem", "komputery" , etc..
How should I use django+haystack+whoosh properly to deal with language inflection?
Whenever I search for any form of "komputer", "komputera", "komputerowi" I mean this same thing ->"komputer".
In NLP there is a basic approach based either on stemming words (cutting suffixes) either on converting a form to the base form ("komputerowi" => "komputer"). There are some libraries that can help with that.
My first thought was to prepare some special template filter that will convert every recognized word in a given variable to the text with base forms rather then forms. Then I could use it in search index templates in django+haystack. If search query will be also converted before evaluate in whoosh engine this should work great. See example:
haystack search index template:
{{some_indexed_text|convert_to_base_form_filter}}
text to index: "Nie ma komputera" => "Nie ma komputer" <- this is really indexed
search query: "komputery" => "komputer" <-- this will match
But I don't think that this is "elegant" solution of this problem, also some other features won't work - like suggesting misspelling suggestions.
So - how should I solve this issue? Maybe I should use other search engine than whoosh?