tags:

views:

134

answers:

5

This is a K&R exercise (1-13)...

"Write a program to print a histogram of the length of words in its input. It is easy to draw the histogram with bars horizontal; a vertical orientation is more challenging."

The section was about arrays, and to be honest, I'm not sure I fully understood it. Everything up to this point was fairly easy to grasp, this was not.

Anyway I'm trying to do a histogram with horizontal bars first. Once I got that down I'll try vertical, but right now I'm not even sure where to begin with the easy version. (I slept on it, woke up, and still couldn't get it.)

I drew an example of what the program would output:

----------------------------------------------------------------
001|XX
002|XXXX
003|X
004|XXXXXXXXXX
005|XXXXXXXXXXXXXXXXXXXXXXXXX
006|XXXX
007|X
008|
009|XXXXXXXXX
010|XXX
>10|XXXX
----------------------------------------------------------------

And tried to break it (the program) down in sections. This is what I came up with:

  1. PRINT TOP BORDER
  2. PRINT CATEGORY, PRINT X EACH TIME CONDITION IS TRUE, PRINT NEWLINE, REPEAT.
  3. PRINT BOTTOM BORDER

But the more I think about it the less I think that's how it would work (because getchar() goes through one character at a time, and it wouldn't be able to go back up to put a X in the right category.) Or...

... I'm just really confused as to how I would solve this problem. Here's as far as I've been able to get code wise:

#include <stdio.h>

#define MAXWORDLENGTH 10

// print a histogram of the length of words in input. horizontal bar version

int main(void)
{
  int c;
  while ((c = getchar()) != EOF) {

  }

  return 0;
}

Could someone help enlighten me? Not necessarily with the code, maybe just pseudo code, or with some "words from the wise" as to what I need to do, or think, or something. This has just been a really big stone in the road and I'd like to get past it :/.

(I'll check back in 30 minutes)

+1  A: 

To histogram the word lengths, you are going to need to know the word lengths.

  • How do you define a word?
  • How can you measure the length of a word? Can you do it one character at a time as you read the stream, or should you buffer the input an use strtok or something similar?

You will need to accumulate data on how many occurrences of each length occur.

  • How will you store this data?

You will need to output the results in a pleasing form. This is fiddly but not hard.

dmckee
+3  A: 

I suggest you simplify the problem by solving it for the case of one word per line, so you can use fgets. Here's how to "eat up" lines that are too long.

Then, as often, the central data structure is the key to solving the problem. The data structure you need is an array used as frequency table:

int freq[11];

In freq[1], store the number of words/lines of length 1, in freq[2] those of length 2, etc., and in freq[0] those of length >10. You don't need to store the words since the rest of the program only needs their length. Writing out the histogram should be easy now.

I hope this isn't too much of a spoiler.

larsmans
+2  A: 

I loved the pseudo-code! Some good thinking there, but you're still not ordering your program right.

As you said yourself, you can't read the text, go back and print an X in a particular row. If we establish that it can't be done, then there's no choice but to know all the values of the histogram beforehand.

So you should think your program as having two parts (and you'll make this kind of division in practically every program you write): first, a part that will make calculations; and then a part that will output them in a certain format (the histogram).

This tip should get you started! If you need further help, comment below.

Santiago Lezica
+1 This is the best place to start. Sub-divide the task into its logical parts, and tackle them one at a time.
bta
I agree bta. I must admit I wasn't expecting so much information from everybody (thank you!), but it's a little overwhelming so I'm going to stick with this one for now (I'll read everyone's eventually though :-)) @Santiago Lezica - I've semi completed the code now with your advice, how does it look? http://fpaste.org/7ELL/ - am I on the right track?
Matt2012
Yes, you are! That looks good, good structure. Two things: first, think about what you have to do when STATE == IN, and what you have to when state changes from IN to OUT. Second, take into account that your text might not finish with a blank (space, tab, newline) and that you have to consider the last word too when you find the EOF.
Santiago Lezica
Ah, just noticed you replied :). Here's where I'm at with it now: http://fpaste.org/NRbh/ -- I'm going to take a break (have to replace some windows) but currently I'm using integers and just printing the tally for testing purposes before the actual histogram. I've noticed one bug (I think) with it not counting the tally correctly (it's omitting some words I believe) that I'll try and solve later. Just wanted to show you my progress!
Matt2012
That's looking good! In your final version, I guess you'll want to use an array of MAXWORDLENGTH integers, plus another integer for words that exceed that length. Then you can do words[length]++ instead of one, two, etc. Also, check for the 2nd loop inside the 1st: do you think it's necessary? You can do everything with a single loop, switching between your two states (IN/OUT).
Santiago Lezica
Done! Still don't completely understand arrays, I was mostly doing trial and error, but sure enough at 2:30AM I managed to compile something worthy ;-). Here it is: http://fpaste.org/5fHT/ -- Thanks for your help! I'll look into that second loop tomorrow, I'm tired as hell.
Matt2012
+1  A: 

I will link the answer below but since you asked for details the key seems to be this

Use an ARRAY of lengths i.e have an array with each element initialised to zero assume MAX wordlength to be about 30...

*have a flag while in the word and increment a counter every time a whitespace is NOT encountered

*once out of the word flag is set to "out" and the corresponding word length index item in the array is incremented i.e if word length counter is w_ctr use

array[w_ctr]++

*use the array as a table of reference for each line in a loop to print each line in the histogram so you can use the array and will now be able to determine weather the 'X' in the histogram is to be inserted or not

EDIT: sorry i didn't read the question right but the idea is simpler for Vertical histograms and the same thing can be used.

after the last step just print the horizontal histogram until counter exceeds current wordlength being printed

for(ctr=0;ctr<array[current_wordlength];ctr++)
     printf('X');    

End


the original is here http://users.powernet.co.uk/eton/kandr2/krx113.html

CLC-wiki is also a place see the comments for details.

Siamore
That site is outdated/unmaintained, see the same page on the [CLC-wiki](http://clc-wiki.net/wiki/K%26R2_solutions:Chapter_1:Exercise_13).
schot
@schot Yes it is so i added your link i was just giving cred for the source from where i learned this some time ago.:)
Siamore
A: 

You should separate your 2 problems in functions, like:

void gethist(char *s, int *hist, int len)
{ /* words here breaks on spaces (' ') */
  char *t;
  for( t=strtok(s," ");t;t=strtok(0," ") )
    if(*t)
      hist[ strlen(t)>len-1?len-1:strlen(t)-1 ]++;
}

void outhist(int *hist, int len)
{
  int i;
  for( i=1; i<=len; ++i )
  {
    char *s = calloc(1,5+hist[i-1]);
    sprintf(s,"%03d|", i);
    memset( s+4, 'X', hist[i-1]);
    puts(s);
    free(s);
  }
}

then its easy in your main:

int main(void)
{
  int c, hist[11] = {};

  char *s = calloc(1,1);
  while ((c = getchar()) != EOF) {
    s = realloc( s, 2+strlen(s) );
    s[ strlen(s)+1 ] = 0;
    s[ strlen(s) ] = c;
  }

  gethist(s,hist,11); free(s);
  outhist(hist,11);

  return 0;
}