Put simply, a Soundex algorithm converts a series of characters into a code. Words that produce the same Soundex code are said to sound the same.
- The code is 4 characters wide
- The first character of the code is always the first character of the word
Each letter of the alphabet belongs to a particular group (at least in this example and in the code that follows, this is the rule I'll be sticking with):
- b, p, v, f = 1
- c, g, j, k, q, s, x, z = 2
- d, t = 3
- l = 4
- m, n = 5
- r = 6
- Every other letter in the alphabet belongs to group 0.
Other notable rules include:
- All letters that belong to group 0 are ignored UNLESS you have run out of letters in the provided word, in which case the rest of the code is filled with 0's.
- The same number cannot appear twice or more in a row, so a character that would repeat the previous number is ignored. The only exception is the rule above, which can produce multiple 0's.
For example, the word "Ray" produces the Soundex code R000: R is the first character of the word, a is part of group 0 so it's ignored, y is part of group 0 so it's ignored, and since there are no more characters, the 3 remaining characters of the code are 0.
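The rules and the "Ray" example above can be sketched roughly as follows. This is only an illustration of the rules as listed in this post, not the function in question: the names `groupOf` and `soundexSketch` are hypothetical, and the use of `std::string` and case-insensitive matching are assumptions of the sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the group of a letter as a character
// ('1'..'6'), or '0' for every letter outside the listed groups.
char groupOf(char c) {
    switch (std::tolower(static_cast<unsigned char>(c))) {
        case 'b': case 'p': case 'v': case 'f': return '1';
        case 'c': case 'g': case 'j': case 'k':
        case 'q': case 's': case 'x': case 'z': return '2';
        case 'd': case 't': return '3';
        case 'l': return '4';
        case 'm': case 'n': return '5';
        case 'r': return '6';
        default: return '0';
    }
}

// Hypothetical function: applies the rules exactly as listed in this post.
std::string soundexSketch(const std::string& word) {
    if (word.empty()) return "";  // assumption: empty input gives an empty code

    // The first character of the code is the first character of the word.
    std::string code(1, static_cast<char>(
        std::toupper(static_cast<unsigned char>(word[0]))));

    for (std::size_t i = 1; i < word.size() && code.size() < 4; ++i) {
        char g = groupOf(word[i]);
        if (g == '0') continue;          // group 0 letters are ignored
        if (g == code.back()) continue;  // same number twice in a row is ignored
        code += g;
    }

    code.append(4 - code.size(), '0');   // out of letters: fill the rest with '0'
    return code;
}
```

With these rules, `soundexSketch("Ray")` gives `R000`, matching the walkthrough above, and `soundexSketch("Robert")` gives `R163`. Note that the digits are stored as the characters `'1'` to `'6'` and `'0'`, not as the integers 1 to 6 and 0.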
I've created a function that is passed 1) a 128-character array, which is used to create the Soundex code, and 2) an empty 5-character array, which stores the Soundex code when the function completes (and is passed back by reference, as arrays effectively are, for use in my program).
My problem, however, is with the conversion process. The logic I've described above isn't quite working in my code, and I don't know why.
// CREATE A SOUNDEX CODE
// * Parameter list includes the string of characters that are to be converted to code and a variable to save the code respectively.
void SoundsAlike(const char input[], char scode[])
{
    scode[0] = toupper(input[0]); // First character of the string is added to the code
    int matchCount = 1;
    int codeCount = 1;
    while((matchCount < strlen(input)) && (codeCount < 4))
    {
        if(((input[matchCount] == 'b') || (input[matchCount] == 'p') || (input[matchCount] == 'v') || (input[matchCount] == 'f')) && (scode[codeCount-1] != 1))
        {
            scode[codeCount] = 1;
            codeCount++;
        }
        else if(((input[matchCount] == 'c') || (input[matchCount] == 'g') || (input[matchCount] == 'j') || (input[matchCount] == 'k') || (input[matchCount] == 'q') || (input[matchCount] == 's') || (input[matchCount] == 'x') || (input[matchCount] == 'z')) && (scode[codeCount-1] != 2))
        {
            scode[codeCount] = 2;
            codeCount++;
        }
        else if(((input[matchCount] == 'd') || (input[matchCount] == 't')) && (scode[codeCount-1] != 3))
        {
            scode[codeCount] = 3;
            codeCount++;
        }
        else if((input[matchCount] == 'l') && (scode[codeCount-1] != 4))
        {
            scode[codeCount] = 4;
            codeCount++;
        }
        else if(((input[matchCount] == 'm') || (input[matchCount] == 'n')) && (scode[codeCount-1] != 5))
        {
            scode[codeCount] = 5;
            codeCount++;
        }
        else if((input[matchCount] == 'r') && (scode[codeCount-1] != 6))
        {
            scode[codeCount] = 6;
            codeCount++;
        }
        matchCount++;
    }
    while(codeCount < 4)
    {
        scode[codeCount] = 0;
        codeCount++;
    }
    scode[4] = '\0';
    cout << scode << endl;
}
I'm not sure if it's because of my overuse of strlen, but for some reason, while the program is running within the first while loop, none of the characters appear to be converted to code (i.e. none of the if statements seem to run).
So what am I doing wrong? Any help would be greatly appreciated.
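One thing worth checking, offered as a likely cause rather than a confirmed diagnosis since the calling code isn't shown: the function stores the integers 1 to 6 and 0 in `scode`, not the characters `'1'` to `'6'` and `'0'`. A `char` holding the integer 0 is the string terminator, and the values 1 to 6 are non-printing control characters, so the if branches may well be running while the output still looks empty. The comparisons also only match lowercase letters. The sketch below isolates the padding case; `padWith` is a hypothetical helper written for this illustration only.

```cpp
#include <string>

// Hypothetical helper for illustration: fills a 5-character buffer the way
// the padding loop in the post does, using whatever filler value is passed in.
std::string padWith(char first, char filler) {
    char buf[5];
    buf[0] = first;
    for (int i = 1; i < 4; ++i) {
        buf[i] = filler;      // integer 0 here is a NUL terminator, '0' is a digit
    }
    buf[4] = '\0';
    return std::string(buf);  // like cout, this reads only up to the first NUL
}
```

Here `padWith('R', 0)` yields the one-character string `R`, while `padWith('R', '0')` yields `R000`. Applying the same idea to the function would mean assigning and comparing against `'1'`, `'2'`, and so on, and padding with `'0'`.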