I have a list of strings that are all early modern English words ending with 'th.' These include hath, appointeth, demandeth, etc. -- they are all conjugated for the third person singular.

As part of a much larger project (using my computer to convert the Gutenberg etext of Gargantua and Pantagruel into something more like 20th-century English, so that I'll be able to read it more easily), I want to remove the last two or three characters from all of those words and replace them with an 's', then use a slightly modified function on the words that still weren't modernized. Both functions are included below.

My main problem is that I just never manage to get my typing right in Python. I find that part of the language really confusing at this point.

Here's the function that removes th's:

from __future__ import division
import nltk, re, pprint

def ethrema(word):
    if word.endswith('th'):
        return word[:-2] + 's'

Here's the function that removes extraneous e's:

def ethremb(word):
    if word.endswith('es'):
        return word[:-2] + 's'

Hence the words 'abateth' and 'accuseth' should pass through ethrema only, not through ethremb(ethrema(word)), while the word 'abhorreth' would need to pass through both.

If anyone can think of a more efficient way to do this, I'm all ears.

Here's the result of my very amateurish attempt to use these functions on a tokenized list of words that need modernizing:

>>> eth1 = [w.ethrema() for w in text]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'ethrema'

So, yeah, it's really an issue of typing. These are the first functions I've ever written in Python, and I have no idea how to apply them to actual objects.

+4  A: 

ethrema() is a plain function, not a method of the str type, so you have to pass each word to it as an argument:

eth1 = [ethrema(w) for w in text]
#AND
eth2 = [ethremb(w) for w in text]
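To make the difference concrete, here is a minimal sketch using your original ethrema() and a made-up sample list (the list contents are an assumption for illustration, not your real data). endswith() is called with a dot because it is built into str, while ethrema() is your own function and takes the word as its argument.

```python
def ethrema(word):
    # endswith() is a str method, so it is called on the word itself
    if word.endswith('th'):
        return word[:-2] + 's'

# Hypothetical sample of tokenized words, all ending in 'th'
text = ['hath', 'appointeth', 'demandeth']

# ethrema is a plain function: call ethrema(w), not w.ethrema()
eth1 = [ethrema(w) for w in text]
print(eth1)  # ['has', 'appointes', 'demandes']
```

Words like 'appointes' still carry the extra 'e', which is what ethremb() is meant to handle.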

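EDIT (to answer the comment):

ethremb(ethrema(word)) won't work until you make a small change to your functions. As written, they return None when the word doesn't end with the suffix, so each one needs to return the word unchanged in that case: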

def ethrema(word):
    if word.endswith('th'):
        return word[:-2] + 's'
    else:
        return word

def ethremb(word):
    if word.endswith('es'):
        return word[:-2] + 's'
    else:
        return word

#OR

# Note: this version applies only one rule per word, so it is not
# equivalent to ethremb(ethrema(word)): 'abhorreth' becomes 'abhorres'.
def ethrema(word):
    if word.endswith('th'):
        return word[:-2] + 's'
    elif word.endswith('es'):
        return word[:-2] + 's'
    else:
        return word
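As a rough sketch of how the chained call behaves once both functions always return a string (the sample words here are assumptions chosen for illustration):

```python
def ethrema(word):
    # Replace a trailing 'th' with 's'; otherwise return the word unchanged
    if word.endswith('th'):
        return word[:-2] + 's'
    return word

def ethremb(word):
    # Replace a trailing 'es' with 's'; otherwise return the word unchanged
    if word.endswith('es'):
        return word[:-2] + 's'
    return word

# Hypothetical sample input
text = ['abhorreth', 'hath', 'the']

# Each word goes through ethrema first, then ethremb
modern = [ethremb(ethrema(w)) for w in text]
print(modern)  # ['abhorrs', 'has', 'the']
```

Keep in mind that chaining runs ethremb() on every word, so 'abateth' would come out as 'abats'. If only some words need the second step, as you describe, you would still have to apply ethremb() to just that subset.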
Studer
Great. What about doing something like ethremb(ethrema(word))?
old Ixfoxleigh