lemmatization

How do I do word Stemming or Lemmatization?

I've tried PorterStemmer and Snowball, but neither works on all words; both miss some very common ones. My test words are: "cats running ran cactus cactuses community communities", and both get less than half right. Ideally the class/function would be in PHP, but I can port it if it's in another language. See also: Stemming algorith...

How to turn plural words singular?

I'm preparing some table names for an ORM, and I want to turn plural table names into singular entity names. My only problem is finding an algorithm that does it reliably. Here's what I'm doing right now: if a word ends with -ies, I replace the ending with -y; if a word ends with -es, I remove this ending. This doesn't always work, however...
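A minimal Python sketch of the two rules described in this question, transcribed literally, makes the failure cases easy to see. The function name `naive_singular` and the sample table names are illustrative assumptions, not part of the original question.

```python
def naive_singular(word: str) -> str:
    """Apply the two suffix rules from the question, in the order given."""
    if word.endswith("ies"):
        return word[:-3] + "y"   # "categories" -> "category"
    if word.endswith("es"):
        return word[:-2]         # "boxes" -> "box"
    return word                  # anything else is left untouched


# Hypothetical table names chosen to show where the rules hold and where they break.
for name in ["categories", "boxes", "houses", "users", "movies"]:
    print(name, "->", naive_singular(name))
```

With these rules, `houses` becomes `hous` and `movies` becomes `movy`, while a plain -s plural such as `users` is not handled at all.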

How is Morpha Lemmatizer Used?

I intend to use the SQL version of WordNet, and I have a problem finding a way to lemmatize words in order to find them in the DB; I can't use the WordNet lemmatizer itself because it is applied to the textual version of WordNet. I've read here that there is a good lemmatizer that returns real words, and that's exactly what I need. I...

Can you programmatically detect pluralizations of English words, and derive the singular form?

The title says it all: given some (English) word that we shall assume is a plural, is it possible to derive the singular form? I'd like to avoid lookup/dictionary tables if possible. Some examples: Examples -> Example (a simple 's' suffix); Glitches -> Glitch ('es' suffix, as opposed to above); Countries -> Country ('ies' suffix)....
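For regular plurals like the three examples in this question, an ordered list of suffix rules is one possible table-free approach. The sketch below is only an illustration under that assumption: the `RULES` list and the `singularize` name are made up here, and irregular plurals (men, mice, movies) would still need some kind of lookup.

```python
import re

# Ordered (pattern, replacement) pairs; the first match wins.
# These rules are illustrative assumptions and cover only regular plurals.
RULES = [
    (re.compile(r"ies$"), "y"),                 # countries -> country
    (re.compile(r"(ch|sh|ss|x|z)es$"), r"\1"),  # glitches -> glitch
    (re.compile(r"(?<!s)s$"), ""),              # examples -> example
]


def singularize(word: str) -> str:
    """Return a guessed singular for a word assumed to be a regular English plural."""
    lower = word.lower()
    for pattern, replacement in RULES:
        if pattern.search(lower):
            return pattern.sub(replacement, lower)
    return lower  # no rule applied; return the lowercased input


for plural in ["Examples", "Glitches", "Countries"]:
    print(plural, "->", singularize(plural))
```

Rule order matters: the `ies` rule has to run before the generic `s` rule, otherwise `countries` would come out as `countrie`.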

Inflectional forms of verbs using DBSight Lucene?

I know DBSight allows synonyms and stop words for searching, but does it take care of inflectional forms of a verb too? E.g. for 'swim' it should find swim, swims, swimming, swam, and swum. Link on DBSight Wiki: http://wiki.dbsight.com/index.php?title=User%5Fdictionary ...

What is the true difference between lemmatization and stemming?

When do I use each? Also, is the NLTK lemmatization dependent upon parts of speech? Wouldn't it be more accurate if it was? ...
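As a rough illustration of the distinction asked about here: a stemmer strips suffixes by rule and may return a non-word, while a lemmatizer maps a word to its dictionary form, which can depend on the part of speech. The toy below is not NLTK; the suffix list, the lemma table, and both function names are assumptions invented for this sketch.

```python
# Toy suffix list and lemma table; both are made up for illustration only.
SUFFIXES = ("ing", "ed", "s")

LEMMAS = {
    ("ran", "v"): "run",
    ("running", "v"): "run",
    ("running", "n"): "running",
    ("meeting", "v"): "meet",
    ("meeting", "n"): "meeting",
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str, pos: str) -> str:
    """Look up the dictionary form for a (word, part-of-speech) pair."""
    return LEMMAS.get((word, pos), word)


print(toy_stem("running"), toy_lemmatize("running", "v"))
print(toy_stem("ran"), toy_lemmatize("ran", "v"))
print(toy_lemmatize("meeting", "n"), toy_lemmatize("meeting", "v"))
```

In this toy, the stemmer leaves `ran` unchanged and turns `running` into `runn`, whereas the lookup maps both to `run` when tagged as verbs, and `meeting` resolves differently as a noun and as a verb.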

Using a lemmatizer in Ruby

I have tried using a stemmer, but the words it produces are just not up to the mark. It would be great if you could let me know of any lemmatizer script that exists for Ruby, or a lemmatizer gem, or an SQL query that returns the lemma of a word from the WordNet database. Cheers! ...

SQL word root matching

Hi all, I'm wondering whether major SQL engines out there (MS SQL, Oracle, MySQL) have the ability to understand that two words are related because they share the same root. We know it's easy to match "networking" when searching for "network" because the latter is a substring of the former. But do SQL engines have functions that can mat...
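The substring case described in this question can be sketched with a plain `LIKE` query. SQLite is used here only because it ships with Python; it is not one of the engines the question names, and the `docs` table, its contents, and the `substring_matches` helper are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical single-column table of words.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (word TEXT)")
conn.executemany(
    "INSERT INTO docs (word) VALUES (?)",
    [("network",), ("networking",), ("networks",), ("ran",), ("run",)],
)


def substring_matches(term: str) -> list:
    """Return stored words that contain `term` as a substring."""
    rows = conn.execute(
        "SELECT word FROM docs WHERE word LIKE '%' || ? || '%' ORDER BY word",
        (term,),
    )
    return [row[0] for row in rows]


print(substring_matches("network"))
print(substring_matches("run"))
```

Searching for `network` finds `networking` and `networks`, but searching for `run` does not find `ran`, which is the kind of root relationship that substring matching alone cannot capture.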