tags:

views:

259

answers:

3

Here's the deal: I'm trying, as a learning experience, to convert a C program to C++. This program takes a text file and applies modifications to it according to user-inputted rules. Specifically, it applies sounds changes to a set of words, using rules formatted like "s1/s2/env". s1 represents the characters to be changed, s2 represents what to change it into, and env is the context in which the change should be applied.

I'm sorry that I don't describe this in more depth, but the question would be too long, and the author's site already explains it.

The function I'm having trouble is TryRule. I understand that it's supposed to see if a given rule applies to a given string, but I'm having trouble understanding exactly how it does it. The poor explanation of the parameters confuses me: for example, I don't understand why the strings "s1" and "s2" have to be passed back, or what does "i" represent.

This is the code:

/*
**  TryRule
**
**  See if a rule s1->s2/env applies at position i in the given word.
**
**  If it does, we pass back the index where s1 was found in the
**  word, as well as s1 and s2, and return TRUE.
**
**  Otherwise, we return FALSE, and pass garbage in the output variables.
*/
int TryRule( char *word, int i, char *Rule, int *n, char **s1, char **s2, char *varRep )
    {
        int j, m, cont = 0;
        int catLoc;
        char *env;
        int  optional = FALSE;
        *varRep = '\0';

        if (!Divide( Rule, s1, s2, &env ) || !strchr( env, '_' ))
            return(FALSE);

        for (j = 0, cont = TRUE; cont && j < strlen(env); j++)
        {
            switch( env[j] )
            {
                case '(':
                    optional = TRUE;
                    break;

                case ')':
                    optional = FALSE;
                    break;

                case '#':
                    cont = j ? (i == strlen(word)) : (i == 0); 
                    break;

                case '_':
                    cont = !strncmp( &word[i], *s1, strlen(*s1) );
                    if (cont)
                    {
                        *n = i;
                        i += strlen(*s1);
                    }
                    else
                    {
                        cont = TryCat( *s1, &word[i], &m, &catLoc );
                        if (cont && m)
                        {
                            int c;
                            *n = i;
                            i += m;

                            for (c = 0; c < nCat; c++)
                                if ((*s2)[0] == Cat[c][0] && catLoc < strlen(Cat[c]))
                                    *varRep = Cat[c][catLoc];
                        }
                        else if (cont)
                            cont = FALSE;
                    }
                    break;

                default:
                    cont = TryCat( &env[j], &word[i], &m, &catLoc );
                    if (cont && !m)
                    {
                        /* no category applied */
                        cont = i < strlen(word) && word[i] == env[j];
                        m = 1;
                    }
                    if (cont)
                        i += m;
                    if (!cont && optional)
                        cont = TRUE;
            }
        }
        if (cont && printRules)
            printf( "   %s->%s /%s applies to %s at %i\n", 
            *s1, *s2, env, word, *n );

    return(cont);
}
A: 

Given that you are converting from C to C++ you should be refactoring the code to become more readable as well.

One major problem with this code is that the variables have terrible names and I'd wager even the original writer of the routine would need to spend some time analysing it.

Just renaming the variables to be more precise would give you a greater hand in understanding what the code does.

Take a look at some questions tagged under refactoring for some help. There is also Refactoring by Martin Fowler

Simon Hartcher
Well, trouble is I don't get what the variables are for, so I can't rename them.
Javier Badia
@reyjavikvi As I said you may want to look into some refactoring techniques to help you on your way. As wmeyer said you may need to study the whole code to understand that routine. That's nowhere near ideal but it may be the case.
Simon Hartcher
A: 

I think you need the whole code to understand this fragment.

It looks like "word", "i" and "Rule" are input variables, the rest are pure output variables.

"i" is the current index within "word", i.e. TryRule only looks at "word" starting at "word[i]".

In "s1" the functions returns the left side of the rule that was applied. In "s2" the right side of that rule.

In "n" the function returns the position within "word" where the rule applies.

No idea what "varRep" is.

wmeyer
So what would be the difference between n and i?
Javier Badia
"i" is an input and tells TryRule to start at word[i]."n" is an output and tells the calling function that "s1" (the left side of the rule) was found at word[n].Invariant: n >= i
wmeyer
+2  A: 

This code is... tough to read. I looked at the original file, and it could really use some better variable names. I especially love this part from one of the function comments:

/*
** (Stuff I removed)
**
** Warning: For now, we don't have a way to handle digraphs. 
**
** We also return TRUE if (<- It really just stops here!)
*/

I can see the challenge. I agree with wmeyer about the variables. I think I understand things, so I'm going to attempt to translate the function into pseudo code.

Word: The string we are looking at
i: The index in the string we're looking at
Rule: The text of the rule (i.e. "v/b/_")
n: A variable to return the index into the string we found the match for the _, I think
s1: Returns the first part of the rule, decoded out of Rule
s2: Returns the second part of the rule, decoded out of Rule
varRep: Returns the character matched in the category, if a category matched, I think

int TryRule( char *word, int i, char *Rule,
                int *n, char **s1, char **s2, char *varRep ) {
        Prepare a bunch of variables we''ll use later
        Mark that we''re not working on an optional term
        Set varRep''s first char to null, so it''s an empty string

        if (We can parse the rule into it''s parts
              OR there is no _ in the environment (which is required))
            return FALSE // Error, we can't run, the rule is screwy

        for (each character, j, in env (the third part of the rule)) {
            if (cont is TRUE) {
                switch (the character we''re looking at, j) {
                    if the character is opening paren:
                        set optional to TRUE, marking it''s an optional character
                    if the character is closing paren:
                        set optional to FALSE, since we''re done with optional stuff
                    if the character is a hash mark (#):
                        // This is rather complicated looking, but it's not bad
                        // This uses a ? b : c, which means IF a THEN b ELSE c
                        // Remember i is the position in the word we are looking at
                        // Hash marks match the start or end of a word
                        // J is the character in the word

                        if (j >= 0) {
                            // We're not working on the first character in the rule
                            // so the # mark we found is to find the end of a word

                            if (i == the length of the word we''re looking at) {
                                // We've found the end of the word, so the rule matches

                                continue = true;   // Keep going
                            } else {
                                // We're not at the end of a word, but we found a hash
                                // Rule doesn't match, so break out of the main loop by setting
                                //     continue to false

                                continue = false;
                            }
                        } else {
                            // OK, the hash mark is the first part of env,
                            // so it signifies the start of a word

                            continue = (i == 0);   // Continue holds if we
                                                   // are matching the first
                                                   // character in *word or not
                        }
                    if the character is an _ (the match character):
                        // This gets complicated

                        continue = if word starting at character i ISN''T s1, the search string;

                        if (continue == TRUE) {
                            // There was no match, so we'll go look at the next word
                            n = the index of the word start that didn''t match   // Not sure why
                            i = i (start index to look) + length of s1 (word we just matched)
                            // This means i now holds the index of the start of the next word
                        } else {
                            // TryCat sees if the character we're trying to match is a category

                            continue = s1 is a category in the program
                                          && the category contains the character at word[i]

                            // If continue holds false, s1 was a category and we found no match
                            // If continue holds true, s1 either wasn't a category (so m = 0)
                            //     or s1 WAS a category, m contains 1, and catLoc holds which
                            //     character in the category definition was matched

                            if (we found a match of some sort
                                   && s1 was a category (indicated by m == 1)) {
                                n = index of the character in the word we found a match
                                i = the index of the next character (m is always 1, so this is ugly)

                                for (each category defined) {
                                    if (first character of s2
                                           == the category''s name
                                        && where in the category definition we matched
                                              is less than the length of the category we''re on) {
                                           varRep = the character matched in the category
                                        }
                                }

                                // Now the above seems EXACTLY like the TryCat function. You'd
                                // think varRep would always hold the same value as catLoc. I
                                // believe this loop is so that later rules also get applied?
                            } else {
                                continue = FALSE; // Because we didn't match a letter or category
                            }
                        }
                    Any other character:
                        continue = the character we''re looking at is a category in the program
                                      && the category contains the character at word[i]

                        if (there was a match AND it wasn''t a category (m == 0, just a letter)) {
                            m = 1;
                            continue if and only if there are characters left in the word
                                 (i < strlen()) && the current character is at word[i]
                                 (we matched a literal character, instead of a category)
                        }

                        if (continue)
                            i = i + m // Remember, M is always 1 or 0
                                      // So this is basically IF continue THEN i++ END IF
                        if ((continue == FALSE) && (optional == TRUE))
                            // We didn't find a match, but we're working on an optional part
                            // So continue anyway
                            continue = TRUE;
                end switch
             end if continue == true
        }
    }

    if (continue && printRules)
        print out a little debug statement showing what we matched

    return continue;   // At this point, if continue is false we can't keep matching
}

I hope this helps. You may need to read it a few times. It took me over 45 minutes to write this, almost entirely because of trying to decipher exactly what's going on in some of the cases around TryCat. Add in about 5 minutes for constantly trying to hit the Tab key and getting my cursor send to the next field (stupid HTML text box).

Sorry this is so big, you'll probably have to do a bunch of horizontal scrolling.

MBCook