tags:

views:

112

answers:

1

I have this program in C I wrote that reads a file line by line (each line has just one word), sorts the alphabets, and then displays the sorted word and the original word in each line.

#include<stdio.h>

int main()
{
  char line[128];
  int i=0;
  int j;
  int length;

  while(fgets(line,sizeof line,stdin) != NULL)
  {
    char word[128];

    for (i=0; line[i] != '\0'; i++)
    {
      word[i]=line[i];
    }

    while (line[i] != '\0')
      i++;

    length=i;

 for (i=length-1; i >=0; i--)
    {
      for (j=0; j<i; j++)
      {
        if (line[j] > line[i])
        {
          char temp;
          temp = line[j];
          line[j] = line[i];
          line[i]=temp;
        }
      }
    }
    printf("%s %s",line,word);

  }
  return 0;
}

I'm compiling and running it using the following bash commands.

gcc -o sign sign.c
./sign < sample_file | sort > output

The original file (sample_file) looks like this:

computer
test
file
stack
overflow

The output file is this:

ackst stack
cemoprtu computer
efil file
efloorvw overflow
er
estt test
ter
ter

I'm having two issues:

  1. The output file has a bunch of newline characters at the beginning (ie. about 5-7 blank lines before the actual text begins)
  2. Why does it print 'ter' twice at the end?

PS - I know these are very elementary questions, but I only just started working with C / bash for a class and I'm not sure where I am going wrong.

+1  A: 

Problem 1

After this code, the variable line contains a line of text, including the newline character from the end of the string

while(fgets(line,sizeof line,stdin) != NULL)
{

This is why you are getting the "extra" newlines. The ASCII value for newline is less than the ASCII value for 'A'. That is why the newlines end up at the beginning of each string, once you've sorted the characters. E.g. "computer\n" becomes "\ncemoprtu".

To solve this, you can strip the newlines off the end of your strings, after the for-loop

if(i > 0 && word[i-1] == '\n')
{
  word[i-1] = '\0';
  line[i-1] = '\0';
  --i;
}

...

printf("%s %s\n",line,word); /* notice the addition of the newline at the end */

This happens to solve problem 2, as well, but please read on, to see what was wrong.

Problem 2

After the loop

for (i=0; line[i] != '\0'; i++) { /* */ }

The string word will not be null-terminated (except by blind luck, since it is ready random uninitialized memory). This is why you get the "ter", because that is part of the data you left behind when you copied the word "computer" to word.

Problem 3

After the loop

for (i=0; line[i] != '\0'; i++) { /* */ }

The value of line[i] != '\0' will always be false. This means that this code will do nothing

while (line[i] != '\0')
  i++;

It might make the problem more obvious if I replace the for-loop and the while-loop with basically identical code, using goto:

i=0;
begin_for_loop:
if(line[i] != '\0')
{
  {
    word[i]=line[i];
  }
  i++;
  goto begin_for_loop;
}

begin_while_loop:
if(line[i] != '\0')
{
  i++;
  goto begin_while_loop;
}

(btw, most professional programmers will do anything from laugh to yell at you if you mention using goto :) I only am using it here to illustrate the point)

A tip I find handy is to draw out my arrays, variables etc on a piece of paper, then trace through each line of my code (again, on paper) to debug how it works.

Merlyn Morgan-Graham
Wouldn't line (since it is an array) always be null-terminated? In that case, why would line[i] != '\0' always be false?
xbonez
`fgets` will null-terminate whatever buffer it writes into. Because of this, `line[i] == '\0'` will be true. Think about it another way: the for-loop exits only when `line[i] != '\0'`. Then you immediately do the same check again. Of course you're going to get the same answer.
Merlyn Morgan-Graham
But to answer the *exact* thing you asked: "Wouldn't line (since it is an array) always be null-terminated?". No. Arrays can have any values in them. They are never guaranteed to be null-terminated unless you manually do it, or the function you call guarantees to do it. `fgets` makes that guarantee, but `word[i]=line[i];` does not. You'll have to null-terminate `word` yourself (just add: `word[i] = '\0';` after the for-loop).
Merlyn Morgan-Graham
You say the for loop exits only when line[i] != '\0'. Isn't it that the for loop runs long as line[i] != '\0'? Or am I missing something veyr basic here?
xbonez
@xbonez: Oops! Yeah, I forgot to say "evaluates to false" :) I should have said "the for-loop exits only when `line[i] != '\0'` evaluates to false. Then you immediately do the same check again. Of course you're going to get the same answer."
Merlyn Morgan-Graham
The code sorts the characters in the words ASCIIbetically. I think that is what's intended by "sorts the alphabets". It's an inefficient variant of selection sort, but I think it's correct.
Matthew Flaschen
@Matthew Flaschen: Good catch. I will remove that "problem" from my answer, since it seems it's doing the sort correctly.
Merlyn Morgan-Graham
Thanks for all the help. I know my final product is probably quite inefficient but it gets the job done, and they don't grade on efficiency at school. Thank god for that
xbonez
@xbonez: Efficiency of implementation trumps efficiency of CPU time in this case :) Since you capped your length, you get a very small worse-case. 128 + 127 + ... + 1 is still miniscule (8256 compares/swaps). That's probably going to measure in miliseconds or even microseconds.
Merlyn Morgan-Graham