ansaurus

Question

Soundex algorithm in Python (homework help request)

Answer 1

+1 A:

I would suggest you try the following.

Store a CurrentCoded and LastCoded variable to work with before appended to your output
Break down the system into useful functions, such as
1. Boolean IsVowel(Char)
2. Int Coded(Char)
3. Boolean IsRule1(Char, Char)

Once you break it down nicely it should become easier to manage.

Robin Day 2009-10-26 17:58:15

Answer 2

A:

This is hardly perfect (for instance, it produces the wrong result if the input doesn't start with a letter), and it doesn't implement the rules as independently-testable functions, so it's not really going to serve as an answer to the homework question. But this is how I'd implement it:

>>> def soundex_prepare(s):
        """Prepare string for Soundex encoding.

        Remove non-alpha characters (and the not-of-interest W/H/Y), 
        convert to upper case, and remove all runs of repeated letters."""
        p = re.compile("[^a-gi-vxz]", re.IGNORECASE)
        s = re.sub(p, "", s).upper()
        for c in set(s):
            s = re.sub(c + "{2,}", c, s)
        return s

>>> def soundex_encode(s):
        """Encode a name string using the Soundex algorithm."""
        result = s[0].upper()
        s = soundex_prepare(s[1:])
        letters = 'ABCDEFGIJKLMNOPQRSTUVXZ'
        codes   = '.123.12.22455.12623.122'
        d = dict(zip(letters, codes))
        prev_code=""
        for c in s:
            code = d[c]
            if code != "." and code != prev_code:
                result += code
         if len(result) >= 4: break
            prev_code = code
        return (result + "0000")[:4]

Robert Rossney 2009-10-26 20:16:47

ansaurus

tags:

views:

answers:

Soundex algorithm in Python (homework help request)

related questions