stemming

Singular/plural searches and stemming

I'm looking for a simple solution for singular/plural keyword searches. I've heard about stemming, but I don't want to use all its features, only the plural/singular transformation. The language is Dutch. I have looked at http://www.snowball.tartarus.org before. Does anyone know a simple solution for singular/plural relevant searches? Thanks in...

Stemming algorithm that produces real words

I need to take a paragraph of text and extract from it a list of "tags". Most of this is quite straightforward. However, I need some help stemming the resulting word list to avoid duplicates. Example: Community / Communities. I've used an implementation of the Porter Stemmer algorithm (I'm writing in PHP, by the way): http://tartarus.or...
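One way to avoid duplicates while still showing real words is to use the stem only as a grouping key and keep one of the original words as the visible tag. The sketch below assumes this approach; `toy_stem` and `dedupe_tags` are hypothetical names, and `toy_stem` is a deliberately tiny stand-in that would be replaced by a real stemmer such as Porter.

```python
from collections import defaultdict


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer; handles only two plural endings."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"          # communities -> community
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]                # tags -> tag
    return w


def dedupe_tags(words, stem=toy_stem):
    """Group words by stem and return one original word per group."""
    groups = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    # The stem is only a key; the shortest original form is what gets displayed.
    return [min(forms, key=len) for forms in groups.values()]


print(dedupe_tags(["Community", "Communities", "tags", "tag"]))
```

Because the stem never leaves the function, it does not matter whether the stemmer produces a real word; only the grouping has to be consistent.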

What is the best "turnkey" stemming algorithm?

I need a good stemming algorithm for a project I'm working on. It was suggested that I look at the Porter Stemmer. When I checked out the page on the Porter stemmer I found that it is deprecated now in favor of the "Snowball" stemmer. I need a good stemmer, but I can't really spend significant time implementing (or optimizing) my own. W...

Stemming - code examples or open source projects?

Stemming is something that's needed in tagging systems. I use delicious, and I don't have time to manage and prune my tags. I'm a bit more careful with my blog, but it isn't perfect. I write software for embedded systems that would be much more functional (helpful to the user) if they included stemming. For instance: Parse Parser Par...

How do I do word Stemming or Lemmatization?

I've tried PorterStemmer and Snowball, but neither works on all words; both miss some very common ones. My test words are: "cats running ran cactus cactuses community communities", and both get less than half right. Ideally the class/function would be in PHP, but I can port it if it's in another language. See also: Stemming algorith...

Lucene Hebrew analyzer

Does anybody know whether one exists? I've been googling this for months... Thanks ...

Why does the Porter Stemmer yield a string which can be stemmed again?

stem('apples') = 'apple'; stem('apple') = 'appl'; stem('appl') = 'appl'. Isn't this a flaw in the stemming algorithm? (This is using the Porter Stemming Algorithm.) ...

Can you programmatically detect pluralizations of English words, and derive the singular form?

The title says it all: given some (English) word that we shall assume is a plural, is it possible to derive the singular form? I'd like to avoid lookup/dictionary tables if possible. Some examples: Examples -> Example (a simple 's' suffix); Glitches -> Glitch ('es' suffix, as opposed to the above); Countries -> Country ('ies' suffix)....
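The three suffix cases in the examples can be sketched as a small rule-based function. This is a minimal illustration under the question's own constraint of no dictionary table, not a complete solution: `singularize` is a hypothetical name, and the rule list is an assumption that covers only regular plurals.

```python
def singularize(word: str) -> str:
    """Derive a singular form using suffix rules only (no lookup table)."""
    lower = word.lower()
    # 'ies' suffix: Countries -> Country
    if lower.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    # 'es' after a sibilant ending: Glitches -> Glitch, boxes -> box
    if lower.endswith(("ches", "shes", "sses", "xes", "zes")):
        return word[:-2]
    # plain 's' suffix: Examples -> Example (skip words like 'glass', 'cactus', 'basis')
    if lower.endswith("s") and not lower.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


for plural in ("Examples", "Glitches", "Countries"):
    print(plural, "->", singularize(plural))
```

Rules like these fail on irregular plurals (mice, children) and on words such as "movies" or "buses", so in practice a short exception list tends to be needed alongside them.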

Ruby Lingua::Stem alternative

Hi. Is there a free alternative to the Perl Lingua::Stem module that can handle the Russian language? Thanks ...

Searching in Lucene .Net

I have used Lucene .Net for indexing, with StandardAnalyzer at index time. Now I want to search for, say, 'attach'. The document contains 'attached'. How do I get a successful hit for the word 'attach'? Please help me as soon as possible. ...

Tokenizer, Stop Word Removal, Stemming in Java

Hi there, I am looking for a class or method that takes a long string of many hundreds of words, tokenizes it, removes the stop words, and stems the rest for use in an IR system. For example: "The big fat cat, said 'your funniest guy i know' to the kangaroo..." The tokenizer would remove the punctuation and return an ArrayList of words; the stop wo...

Extract key sentences from a text

Hi, do you know of an effective method for extracting key sentences from a text, along with their frequency parameters, etc., that can also do "stemming" (i.e. also search for similar sentences)? I also wonder whether there is a software implementation. Thanks a lot ...

MySQL fulltext with stems

I am building a little search function for my site. I am taking my user's query, stemming the keywords and then running a fulltext MySQL search against the stemmed keywords. The problem is that MySQL is treating the stems as literal. Here is the process that is happening: user searches for a word like "baseballs" my stemming algorithm...
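One possible approach, assuming the goal is for each stem to match every indexed word that begins with it, is MySQL's boolean full-text mode, where a trailing `*` acts as a truncation (prefix) operator. The sketch below only builds the search string; the helper name `boolean_prefix_query` and the table and column names in `SQL` are hypothetical.

```python
import re


def boolean_prefix_query(stems):
    """Turn stems into a boolean-mode search string such as '+baseball* +bat*'."""
    terms = []
    for stem in stems:
        # Drop anything that boolean mode could read as an operator (+ - " * ( ) etc.).
        cleaned = re.sub(r"[^\w]", "", stem)
        if cleaned:
            terms.append(f"+{cleaned}*")   # '+' requires the term, '*' matches any ending
    return " ".join(terms)


# Hypothetical table and columns; the search string is passed as a bound parameter.
SQL = (
    "SELECT id, title FROM articles "
    "WHERE MATCH (title, body) AGAINST (%s IN BOOLEAN MODE)"
)

print(boolean_prefix_query(["baseball", "bat"]))
```

This only helps when the stem is a literal prefix of the indexed words; a stem like "communiti" would not match "community", and very short stems may fall under the server's minimum indexed word length.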

Google GSA Stems for Scandinavian languages

I have installed Scandinavia-2.1-1 language bundle to our GSA. After that I expected to find those languages available in Query Expansion, but nope nothing new there. Am I missing something? How are you other Scandinavians handling stems for your language? ...

Can Solr return the actual final query that was used when synonyms and stemming are used?

Hello, I would like to be able to show in my UI which query terms Solr used to run the final query. For example, I might type the query "run", but behind the scenes Solr will use stemming to also query "ran" and "running"; I may also have a synonym defined, such as "run = sprint". I would like to show the user that although...

Search Engine for stemming (.Net)

Hi! Is there any search engine (ideally for .Net) that can return stemmed words after indexing text? I need to get all the words from a text in their stem form. Please help me ;) Thank you for any advice! ...

Stop-word elimination and stemmer in Python

Hi, I have a somewhat large document and want to do stop-word elimination and stemming on the words of this document with Python. Does anyone know an off-the-shelf package for this? If not, code that is fast enough for large documents is also welcome. Thanks ...
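As a rough illustration of the two steps, the sketch below streams a document line by line, so the whole text never has to sit in memory. Everything here is an assumption for demonstration: `STOP_WORDS` is a tiny sample list, and `strip_plural` is a hypothetical placeholder that a real stemmer (Porter, Snowball) would replace.

```python
import re
from typing import Callable, Iterable, Iterator

# Tiny sample list; a real stop-word list is much longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it"})


def strip_plural(token: str) -> str:
    """Hypothetical placeholder stemmer: only removes a trailing 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def process(lines: Iterable[str],
            stem: Callable[[str], str] = strip_plural) -> Iterator[str]:
    """Lowercase, tokenize, drop stop words, and stem, one line at a time."""
    for line in lines:
        for token in re.findall(r"[a-z]+", line.lower()):
            if token not in STOP_WORDS:
                yield stem(token)


print(list(process(["The cats sat in the gardens."])))
```

An open file object can be passed as `lines`, since iterating over a file yields one line at a time; the set lookup for stop words keeps the per-token cost constant.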

Schinke Latin stemming algorithm in PHP

This website offers the "Schinke Latin stemming algorithm" for download to use it in the Snowball stemming system. I want to use this algorithm, but I don't want to use Snowball. The good thing: There's some pseudocode on that page which you could translate to a PHP function. This is what I've tried: <?php function stemLatin($word) { ...

How to use a stemmer in the Xapian Java bindings

Hello, is there any documentation or sample program showing how to use stemming in the Xapian Java bindings? ...

Is there any stemmer available for Indian languages?

Hello, are there any implementations of stemmers available for Indian languages (such as Hindi or Telugu)? ...