views:

944

answers:

10

I'm working on the K&R book. I've read farther ahead than I've done exercises, mostly for lack of time. I'm catching up, and have done almost all the exercises from chapter 1, which is the tutorial.

My issue was exercise 1-18. The exercise is to:

Write a program to remove trailing blanks and tabs from line of input, and to delete entirely blank lines

My code (below) does that, and works. My problem with it is the trim method I implemented. It feels ... wrong ... somehow. Like if I saw similar code in C# in a code review, I'd probably go nuts. (C# being one of my specialties.)

Can anyone offer some advice on cleaning this up -- with the catch that said advice has to only use knowledge from Chapter 1 of K & R. (I know there are a zillion ways to clean this up using the full C library; we're just talking Chapter 1 and basic stdio.h here.) Also, when giving the advice, can you explain why it will help? (I am, after all, trying to learn! And who better to learn from than the experts here?)

#include <stdio.h>

#define MAXLINE 1000

int getline(char line[], int max);
void trim(char line[], char ret[]);

int main()
{
    char line[MAXLINE];
    char out[MAXLINE];
    int length;

    while ((length = getline(line, MAXLINE)) > 0)
    {
        trim(line, out);
        printf("%s", out);
    }

    return 0;
}

int getline(char line[], int max)
{
    int c, i;

    for (i = 0; i < max - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
        line[i] = c;

    if (c == '\n')
    {
        line[i] = c;
        ++i;
    }

    line[i] = '\0'; 
    return i;
}

void trim(char line[], char ret[])
{
    int i = 0;

    while ((ret[i] = line[i]) != '\0')
        ++i;

    if (i == 1)
    {
        // Special case to remove entirely blank line
        ret[0] = '\0';
        return;
    }

    for (  ; i >= 0; --i)
    {
        if (ret[i] == ' ' || ret[i] == '\t')
            ret[i] = '\0';
        else if (ret[i] != '\0' && ret[i] != '\r' && ret[i] != '\n')
            break;
    }

    for (i = 0; i < MAXLINE; ++i)
    {
        if (ret[i] == '\n')
        {
            break;
        }
        else if (ret[i] == '\0')
        {
            ret[i] = '\n';
            ret[i + 1] = '\0';
            break;
        }
    }
}

EDIT: I appreciate all the helpful tips I'm seeing here. I would like to remind folks that I'm still a n00b with C, and specifically haven't gotten up to pointers yet. (Remember the bit about Ch.1 of K&R -- Ch.1 doesn't do pointers.) I "kinda" get some of those solutions, but they're still a touch advanced for where I'm at ...

And most of what I'm looking for is the trim method itself -- specifically the fact that I'm looping through 3 times (which feels so dirty). I feel like if I were just a touch more clever (even without the advanced knowledge of C), this could have been cleaner.

+7  A: 

If you are sticking with chapter 1, that looks pretty good to me. Here's what I would recommend from a code-review standpoint:

When checking equality in C, always put the constant first

if (1 == myvar)

That way you will never accidentally do something like this:

if (myvar = 1)

You can't get away with that in C#, but it compiles fine in C and can be a real devil to debug.

Eric Z Beard
Whoa, lots of downvotes... haven't had much coffee yet, what's wrong with that?
Eric Z Beard
no reason to downvote perfectly valid advice, i personally do not bother to do sow but any way perfectly valid voting up.
Ilya
I guess some people don't like the (1 == x) syntax. I'm one of them, but I don't feel that it's worth a downvote.
aib
That syntax is part of the MISRA standard. I like it but I know that the guy I sit next to hates it!
Klelky
Really? What's wrong with that syntax? As soon as I adopted it I stopped wasting so much time tracking down the missing = sign.
Eric Z Beard
It makes code less readable. A better solution is to set the compiler warning level high enough to flag it as a warning.
Ferruccio
I agree it is less readable, less natural when you say "if 1 equals myvar". I didn't realize there was a compiler flag, I'd be happier with that.
Eric Z Beard
I am going to downvote it again: 1==var it has nothing to do with the "uglines" that the question refers to. I do hate this convention - it make me do the work for compiler, but I am downvoting it as irrelevant and off-topic.
Arkadiy
+8/-5 so far. Wow, I didn't think this would be so polarizing.
Eric Z Beard
+1 relevant, useful, on-topic...unless you don't like writing maintainable C.
sixlettervariables
+3  A: 

There is no reason to have two buffers, you can trim the input line in place

int trim(char line[])
{
    int len = 0;
    for (len = 0; line[len] != 0; ++len)
     ;

    while (len > 0 &&
           line[len-1] == ' ' && line[len-1] == '\t' && line[len-1] == '\n')
     line[--len] = 0;

    return len;
}

By returning the line length, you can eliminate blank lines by testing for non-zero length lines

if (trim(line) != 0)
    printf("%s\n", line);

EDIT: You can make the while loop even simpler, assuming ASCII encoding.

while (len > 0 && line[len-1] <= ' ')
    line[--len] = 0;
Ferruccio
This is the kind of idea I'm looking for ... But I plugged it in and played with it a bit, and instead of trimming end spaces and blank lines, it's actually inserting extra blank lines. :)
John Rudy
That's what happens when you type in code first thing in the morning without actually checking it :-)
Ferruccio
I wrote != instead of == in the original while loop.
Ferruccio
:) I'm working with this now, on my lunch break ... I'm still having some glitches, but I think I can work those out on my own ... I think this may well be the answer I accept!
John Rudy
+1  A: 

Personally for while constructs:

I prefer the following:

while( (ret[i] = line[i]) )
        i++;

to:

while ((ret[i] = line[i]) != '\0')
        ++i;

They both check against != 0 but the first looks a little cleaner. If the char is anything other thah 0, then the loop body will execute else it will break out of the loop.

Also for 'for' statements, whilst being syntatically valid, I find that the following:

for (  ; i >= 0; --i)

just looks 'odd' to me and indeed is a potential nightmare solution for potential bugs. If I was reviewing this code, it would be like a glowing red warning like. Typically you want to use for loops for iterating a known number of times, otherwise cosider a while loop. (as always there are exceptions to the rule but Ive found that this generally holds true). The above for statement could become:

while (i)
{
        if (ret[i] == ' ' || ret[i] == '\t')
        {
            ret[i--] = '\0';
        }
        else if (ret[i] != '\0' && ret[i] != '\r' && ret[i] != '\n')
        {
            break;
        }
}
TK
Erm, you mean "while (ret[i] = line[i])"..
aib
Thanks for the advice on replacing the for loop with a while construct.
John Rudy
A: 

First of all:

int main(void)

You know the parameters to main(). They're nothing. (Or argc&argv, but I don't think that's Chapter 1 material.)

Stylewise, you might want to try K&R-style brackets. They're much easier on the vertical space:

void trim(char line[], char ret[])
{
    int i = 0;

    while ((ret[i] = line[i]) != '\0')
        ++i;

    if (i == 1) { // Special case to remove entirely blank line
        ret[0] = '\0';
        return;
    }

    for (; i>=0; --i) { //continue backwards from the end of the line
        if ((ret[i] == ' ') || (ret[i] == '\t')) //remove trailing whitespace
            ret[i] = '\0';

        else if ((ret[i] != '\0') && (ret[i] != '\r') && (ret[i] != '\n')) //...until we hit a word character
            break;
    }

    for (i=0; i<MAXLINE-1; ++i) { //-1 because we might need to add a character to the line
        if (ret[i] == '\n') //break on newline
            break;

        if (ret[i] == '\0') { //line doesn't have a \n -- add it
            ret[i] = '\n';
            ret[i+1] = '\0';
            break;
        }
    }
}

(Also added comments and fixed one bug.)

A big issue is the usage of the MAXLINE constant -- main() exclusively uses it for the line and out variables; trim(), which is only working on them doesn't need to use the constant. You should pass the size(s) as a parameter just like you did in getline().

aib
+1  A: 

trim() is too big.

What I think you need is a strlen-ish function (go ahead and write it int stringlength(const char *s)).

Then you need a function called int scanback(const char *s, const char *matches, int start) which starts at start, goes down to z as long as the character being scanned at s id contained in matches, return the last index where a match is found.

Then you need a function called int scanfront(const char *s, const char *matches) which starts at 0 and scans forward as long as the character being scanned at s is contained in matches, returning the last index where a match is found.

Then you need a function called int charinstring(char c, const char *s) which returns non-zero if c is contained in s, 0 otherwise.

You should be able to write trim in terms of these.

plinth
A: 

Here's my stab at the exercise without knowing what is in Chapter 1 or K & R. I assume pointers?

#include "stdio.h"

size_t StrLen(const char* s)
{
    // this will crash if you pass NULL
    size_t l = 0;
    const char* p = s;
    while(*p)
    {
        l++;
        ++p;
    }
    return l;
}

const char* Trim(char* s)
{
    size_t l = StrLen(s);
    if(l < 1)
        return 0;

    char* end = s + l -1;
    while(s < end && (*end == ' ' || *end == '\t'))
    {
        *end = 0;
        --end;
    }

    return s;
}

int Getline(char* out, size_t max)
{
    size_t l = 0;
    char c;
    while(c = getchar())
    {
        ++l;

        if(c == EOF) return 0;
        if(c == '\n') break;

        if(l < max-1)
        {
            out[l-1] = c;
            out[l] = 0;
        }
    }

    return l;
}

#define MAXLINE 1024

int main (int argc, char * const argv[]) 
{
    char line[MAXLINE];
    while (Getline(line, MAXLINE) > 0)
    {
        const char* trimmed = Trim(line);
        if(trimmed)
            printf("|%s|\n", trimmed);

        line[0] = 0;
    }

    return 0;
}
orj
uh, that looks dangerous. What happens if some one calls Trim(" "); You are going to read memory that is outside of the string. And with some bad luck you are going to write on that memory, too.
quinmars
There may be bugs in that code. I didn't test it super thoroughly. You are right. The while loop condition in Trim() should also be testing that end is greater than s. Assuming strings grow up in memory addresses.
orj
A: 

Personally I'd put code like this:

ret[i] != '\0' && ret[i] != '\r' && ret[i] != '\n'

into a separate function (or even a define macro)

ilitirit
A: 
  1. trim should indeed use 1 buffer only (as @Ferruccio says).
  2. trim needs to be broken up, as @plinth says
  3. trim needs not return any value (if you want to check for empty string, test line[0] == 0)
  4. for extra C flavor, use pointers rather than indexes

-go to end of line (terminating 0; -while not at the start of line and current character is space, replace it with 0. -back off one char

char *findEndOfString(char *string) {
  while (*string) ++string;
  return string; // string is now pointing to the terminating 0
}

void trim(char *line) {
  char *end = findEndOfString(line);
   // note that we start at the first real character, not at terminating 0
  for (end = end-1; end >= line; end--) {
      if (isWhitespace(*end)) *end = 0;
      else return;
  }
}
Arkadiy
+2  A: 

First off, thanks to everyone for the advice and help. Ferruccio got me closest, and here's how it ended up:

#include <stdio.h>

#define MAXLINE 1000

int getline(char line[], int max);
int trim(char line[]);

int main(void)
{
    char line[MAXLINE];
    int length;

    while ((length = getline(line, MAXLINE)) > 0)
    {
        if (trim(line) != 0)
            printf("%s\n", line);
    }

    return 0;
}

int getline(char line[], int max)
{
    int c, i;

    for (i = 0; i < max - 1 && (c = getchar()) != EOF && c != '\n'; ++i)
        line[i] = c;

    if (c == '\n')
    {
        line[i] = c;
        ++i;
    }

    line[i] = '\0'; 
    return i;
}

int trim(char line[])
{
    int len = 0;
    for (len = 0; line[len] != '\0'; ++len)
        ;

    while (len > 0 && (line[len-1] == ' ' || line[len-1] == '\t' || line[len-1] == '\n'))
    {
        line[--len] = '\0';
    }

    return len;
}
John Rudy
A: 

Another example of doing the same thing. Did some minor violation by using C99-specific stuff. that will not be found in K&R. also used the assert() function which is part of the starndard library, but is probably not covered in chapter one of K&R.

#include <stdbool.h> /* needed when using bool, false and true. C99 specific. */
#include <assert.h> /* needed for calling assert() */

typedef enum {
  TAB = '\t',
  BLANK = ' '
} WhiteSpace_e;

typedef enum {
  ENDOFLINE = '\n',
  ENDOFSTRING = '\0'
} EndofLine_e;

bool isWhiteSpace(
  char character
) {
  if ( (BLANK == character) || (TAB == character ) ) {
    return true;
  } else {
    return false;
  }
}

bool isEndOfLine( 
  char character
) {
 if ( (ENDOFLINE == character) || (ENDOFSTRING == character ) ) {
    return true;
  } else {
    return false;
  }
}   

/* remove blanks and tabs (i.e. whitespace) from line-string */
void removeWhiteSpace(
  char string[]
) {
  int i;
  int indexOutput;

  /* copy all non-whitespace character in sequential order from the first to the last.
    whitespace characters are not copied */
  i = 0;
  indexOutput = 0;
  while ( false == isEndOfLine( string[i] ) ) {
    if ( false == isWhiteSpace( string[i] ) ) {
      assert ( indexOutput <= i );
      string[ indexOutput ] = string[ i ];
      indexOutput++;
    }
    i++; /* proceed to next character in the input string */
  }

  assert( isEndOfLine( string[ i ] ) );
  string[ indexOutput ] = ENDOFSTRING;

}