The title says it all: Given some (English) word that we shall assume is a plural, is it possible to derive the singular form? I'd like to avoid lookup/dictionary tables if possible.

Some examples:

Examples  -> Example    a simple 's' suffix
Glitches  -> Glitch     'es' suffix, as opposed to above
Countries -> Country    'ies' suffix.
Sheep     -> Sheep      no change: possible fallback for indeterminate values

Or, this seems to be a fairly exhaustive list.

Suggestions of libraries in language x are fine, as long as they are open-source (i.e., so that someone can examine them to determine how to do it in language y).

+1  A: 

No - English isn't a language which sticks to many rules.

I think your best bet is either:

  • use a dictionary of common words and their plurals (or group them by their plural rule, e.g., words where you just add an S, words where you add ES, words where you drop a Y and add IES...)
  • rethink your application
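
As a rough illustration of the "group them by their plural rule" idea, here is a minimal Python sketch that runs the rules in reverse. The tables `IRREGULAR`, `UNCHANGED`, and `SUFFIX_RULES` and the function `singularize` are hypothetical names invented for this example, and the tables are deliberately tiny; it is not meant as a complete solution.

```python
# Hypothetical, deliberately tiny exception tables (assumptions for illustration).
IRREGULAR = {"geese": "goose", "mice": "mouse", "children": "child"}
UNCHANGED = {"sheep", "series", "species"}

# (plural suffix, singular replacement) pairs, tried in order; first match wins.
SUFFIX_RULES = [
    ("ies", "y"),    # countries -> country
    ("ches", "ch"),  # glitches -> glitch
    ("shes", "sh"),  # dishes -> dish
    ("xes", "x"),    # boxes -> box
    ("sses", "ss"),  # classes -> class
    ("s", ""),       # examples -> example
]


def singularize(plural: str) -> str:
    """Guess the singular of a word that is assumed to be a plural."""
    word = plural.lower()
    if word in UNCHANGED:
        return word
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in SUFFIX_RULES:
        # Require at least one character before the suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word  # fallback: leave indeterminate words unchanged


for w in ["Examples", "Glitches", "Countries", "Sheep"]:
    print(w, "->", singularize(w))
```

The rules are easy to fool: "movies" comes back as "movy", which is why the exception tables tend to keep growing.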
nickf
Yea, after discovering that list I linked, my hopes plummeted, but I was still curious.
Matthew Scharley
English plurals are actually pretty regular. Far more so than, say, German or French.
cletus
A: 

It is not possible, as nickf has already said. It would be simple for the classes of words you have described, but what about all the words that end with s naturally? My name, Marius, for example, is not the plural of Mariu. The same goes for "bus", I guess. Pluralization of words in English is a one-way function (a hash function), and you usually need the rest of the sentence or paragraph for context.

Marius
For my intended purpose, I can (relatively) safely assume that the word I am looking at is a plural, i.e., in the context it wouldn't make sense otherwise.
Matthew Scharley
+7  A: 

It really depends on what you mean by 'programmatically'. Part of English works on easy to understand rules, and part doesn't. It has to do mainly with frequency. For a brief overview, you can read Pinker's "Words and Rules", but do yourself a favor and don't take the whole generative theory of linguistics entirely to heart. There's a lot more empiricism there than that school of thought really lends to the pursuit.

A lot of English can be statistically lemmatized. By the way, stemming or lemmatization is the term you're looking for. One of the most effective lemmatizers which work off of statistical rules bootstrapped with frequency-based exceptions is the Morpha Lemmatizer. You can give this a shot if you have a project that requires this type of simplification of strings which represent specific terms in English.

There are even more naive approaches that accomplish much with respect to normalizing related terms. Take a look at the Porter Stemmer, which is effective enough to cluster together most terms in English.
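
To give a feel for how naive such a stemmer is, here is a sketch of only the first step (step 1a) of the original Porter algorithm, which is the part that deals with plural-like endings. The function name `porter_step_1a` is made up for this example, and the later steps of the algorithm are omitted entirely.

```python
def porter_step_1a(word: str) -> str:
    """Step 1a of the original Porter stemmer: strip plural-like suffixes."""
    if word.endswith("sses"):
        return word[:-2]  # SSES -> SS, e.g. caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # IES -> I, e.g. ponies -> poni
    if word.endswith("ss"):
        return word       # SS -> SS, e.g. caress -> caress
    if word.endswith("s"):
        return word[:-1]  # S -> (nothing), e.g. cats -> cat
    return word


for w in ["examples", "glitches", "countries", "sheep"]:
    print(w, "->", porter_step_1a(w))
```

Note that the output is a stem rather than a dictionary word: "countries" becomes "countri", and this step alone leaves "glitches" as "glitche". That is fine for clustering related terms, but not if you need the actual singular.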

Robert Elwell
+1 for Morpha. I had the same problem, and Morpha did a really good job of solving it.
adi92
Another stemmer option is the UEA stemmer, for which there are Ruby, Java, Perl, and Scala implementations.
http://github.com/ealdent/uea-stemmer/tree/master
http://www.uea.ac.uk/cmp/research/graphicsvisionspeech/speech/WordStemming
http://github.com/DRMacIver/uea-stemmer-scala/tree/master
ealdent
A: 
cletus
You could have the de-pluralizer guess which one is right, then check a dictionary to see if the word it guesses exists, but even this is going to get it wrong sometimes.
Chris Lutz
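
The guess-then-check idea from the comment above could be sketched roughly as follows. The names `candidate_singulars`, `verified_singulars`, and `KNOWN_WORDS` are hypothetical, and the tiny word set merely stands in for whatever dictionary is available.

```python
# Hypothetical stand-in for a real dictionary of singular words.
KNOWN_WORDS = {"ax", "axe", "axis", "country", "glitch", "example", "sheep", "bus"}


def candidate_singulars(plural: str) -> list[str]:
    """Generate plausible singular forms for a word assumed to be a plural."""
    word = plural.lower()
    candidates = []
    if word.endswith("ies"):
        candidates.append(word[:-3] + "y")   # countries -> country
    if word.endswith("es"):
        candidates.append(word[:-2])         # glitches -> glitch
        candidates.append(word[:-2] + "is")  # axes -> axis
    if word.endswith("s"):
        candidates.append(word[:-1])         # examples -> example
    candidates.append(word)                  # sheep -> sheep (no change)
    return candidates


def verified_singulars(plural: str, known_words: set[str]) -> list[str]:
    """Keep only the guesses that actually appear in the dictionary."""
    return [c for c in candidate_singulars(plural) if c in known_words]


print(verified_singulars("countries", KNOWN_WORDS))  # ['country']
print(verified_singulars("axes", KNOWN_WORDS))       # ['ax', 'axis', 'axe']
```

When more than one guess survives the check, as with "axes", the code has no basis for choosing between them, so it can still get the answer wrong.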
If you're using a dictionary anyway, you have access to all the plurals so there's no need for an algorithm.
cletus
In English: box -> boxes (not boxs), dish -> dishes (not dishs), etc.
Robert L
+1  A: 

Is "axes" the plural of "ax" or of "axis"? Even a human cannot tell without context.

Robert L
[AmE "ax" = BrE "axe".] Similarly, is "ellipses" the plural of an "ellipse" (an oval shape) or an "ellipsis" (…)? Is "bases" the plural of a "base" or a "basis"? Is "taxes" the plural of a "tax", or a "taxis" (as in biology)? [Other examples, anyone?]
ShreevatsaR
How about which of indexes or indices is the plural of "index"?
JB King
+1  A: 

You can take a look at Inflector.net - my port of Rails' inflection class.

Andrew Peters
A: 

Probably not, seeing as English uses pluralization rules from multiple languages. In addition, no rule will ever let you know that "goose" is the singular form of "geese" or that "octopus" is the singular form of "octopi".

Goose...Geese
Mouse...Mice
Octopus...Octopi

gshauger
Actually, octopi is incorrect. The classically proper form is octopodes, but the accepted form is octopuses.
WCWedin
Actually it's not... "There are three forms of the plural of octopus; namely, octopuses, octopi, and octopodes. Currently, octopuses is the most common form in the US as well as the UK; octopodes is rare, and octopi is often objectionable."
gshauger