The US census bureau uses a special encoding called “soundex” to locate information about a person. The soundex is an encoding of surnames (last names) based on the way a surname sounds rather than the way it is spelled. Surnames that sound the same, but are spelled differently, like SMITH and SMYTH, have the same code and are filed together. The soundex coding system was developed so that you can find a surname even though it may have been recorded under various spellings.
In this lab you will design, code, and document a program that produces the soundex code when input with a surname. A user will be prompted for a surname, and the program should output the corresponding code.
Basic Soundex Coding Rules
Every soundex encoding of a surname consists of a letter and three numbers. The letter used is always the first letter of the surname. The numbers are assigned to the remaining letters of the surname according to the soundex guide shown below. Zeroes are added at the end if necessary to always produce a four-character code. Additional letters are disregarded.
Soundex Coding Guide
Soundex assigns a number for various consonants. Consonants that sound alike are assigned the same number:
Number Consonants
1 B, F, P, V 2 C, G, J, K, Q, S, X, Z 3 D, T 4 L 5 M, N 6 R
Soundex disregards the letters A, E, I, O, U, H, W, and Y.
There are 3 additional Soundex Coding Rules that are followed. A good program design would implement these each as one or more separate functions.
Rule 1. Names With Double Letters
If the surname has any double letters, they should be treated as one letter. For example:
Gutierrez is coded G362 (G, 3 for the T, 6 for the first R, second R ignored, 2 for the Z). Rule 2. Names with Letters Side-by-Side that have the Same Soundex Code Number
If the surname has different letters side-by-side that have the same number in the soundex coding guide, they should be treated as one letter. Examples:
Pfister is coded as P236 (P, F ignored since it is considered same as P, 2 for the S, 3 for the T, 6 for the R).
Jackson is coded as J250 (J, 2 for the C, K ignored same as C, S ignored same as C, 5 for the N, 0 added).
Rule 3. Consonant Separators
3.a. If a vowel (A, E, I, O, U) separates two consonants that have the same soundex code, the consonant to the right of the vowel is coded. Example:
Tymczak is coded as T-522 (T, 5 for the M, 2 for the C, Z ignored (see "Side-by-Side" rule above), 2 for the K). Since the vowel "A" separates the Z and K, the K is coded. 3.b. If "H" or "W" separate two consonants that have the same soundex code, the consonant to the right is not coded. Example:
*Ashcraft is coded A261 (A, 2 for the S, C ignored since same as S with H in between, 6 for the R, 1 for the F). It is not coded A226.
So far this is my code:
surname = raw_input("Please enter surname:")
outstring = ""
outstring = outstring + surname[0]
for i in range (1, len(surname)):
nextletter = surname[i]
if nextletter in ['B','F','P','V']:
outstring = outstring + '1'
elif nextletter in ['C','G','J','K','Q','S','X','Z']:
outstring = outstring + '2'
elif nextletter in ['D','T']:
outstring = outstring + '3'
elif nextletter in ['L']:
outstring = outstring + '4'
elif nextletter in ['M','N']:
outstring = outstring + '5'
elif nextletter in ['R']:
outstring = outstring + '6'
print outstring
sufficiently does what it is asked to, I am just not sure how to code the three rules. That is where I need help. So, any help is appreciated.