ansaurus

Question

Print a Histogram based on word lengths (C)

Answer 1

+1 A:

To histogram the word lengths, you are going to need to know the word lengths.

How do you define a word?
How can you measure the length of a word? Can you do it one character at a time as you read the stream, or should you buffer the input an use strtok or something similar?

You will need to accumulate data on how many occurrences of each length occur.

How will you store this data?

You will need to output the results in a pleasing form. This is fiddly but not hard.

dmckee 2010-10-21 16:49:52

Answer 2

+3 A:

I suggest you simplify the problem by solving it for the case of one word per line, so you can use fgets. Here's how to "eat up" lines that are too long.

Then, as often, the central data structure is the key to solving the problem. The data structure you need is an array used as frequency table:

int freq[11];

In freq[1], store the number of words/lines of length 1, in freq[2] those of length 2, etc., and in freq[0] those of length >10. You don't need to store the words since the rest of the program only needs their length. Writing out the histogram should be easy now.

I hope this isn't too much of a spoiler.

larsmans 2010-10-21 16:50:59

Answer 3

+2 A:

I loved the pseudo-code! Some good thinking there, but you're still not ordering your program right.

As you said yourself, you can't read the text, go back and print an X in a particular row. If we establish that it can't be done, then there's no choice but to know all the values of the histogram beforehand.

So you should think your program as having two parts (and you'll make this kind of division in practically every program you write): first, a part that will make calculations; and then a part that will output them in a certain format (the histogram).

This tip should get you started! If you need further help, comment below.

Santiago Lezica 2010-10-21 16:54:30

+1 This is the best place to start. Sub-divide the task into its logical parts, and tackle them one at a time.

bta 2010-10-21 17:13:26

I agree bta. I must admit I wasn't expecting so much information from everybody (thank you!), but it's a little overwhelming so I'm going to stick with this one for now (I'll read everyone's eventually though :-)) @Santiago Lezica - I've semi completed the code now with your advice, how does it look? http://fpaste.org/7ELL/ - am I on the right track?

Matt2012 2010-10-21 18:13:55

Yes, you are! That looks good, good structure. Two things: first, think about what you have to do when STATE == IN, and what you have to when state changes from IN to OUT. Second, take into account that your text might not finish with a blank (space, tab, newline) and that you have to consider the last word too when you find the EOF.

Santiago Lezica 2010-10-21 20:41:12

Ah, just noticed you replied :). Here's where I'm at with it now: http://fpaste.org/NRbh/ -- I'm going to take a break (have to replace some windows) but currently I'm using integers and just printing the tally for testing purposes before the actual histogram. I've noticed one bug (I think) with it not counting the tally correctly (it's omitting some words I believe) that I'll try and solve later. Just wanted to show you my progress!

Matt2012 2010-10-21 21:04:29

That's looking good! In your final version, I guess you'll want to use an array of MAXWORDLENGTH integers, plus another integer for words that exceed that length. Then you can do words[length]++ instead of one, two, etc. Also, check for the 2nd loop inside the 1st: do you think it's necessary? You can do everything with a single loop, switching between your two states (IN/OUT).

Santiago Lezica 2010-10-22 00:53:37

Done! Still don't completely understand arrays, I was mostly doing trial and error, but sure enough at 2:30AM I managed to compile something worthy ;-). Here it is: http://fpaste.org/5fHT/ -- Thanks for your help! I'll look into that second loop tomorrow, I'm tired as hell.

Matt2012 2010-10-22 06:41:17

Answer 4

+1 A:

I will link the answer below but since you asked for details the key seems to be this

Use an ARRAY of lengths i.e have an array with each element initialised to zero assume MAX wordlength to be about 30...

*have a flag while in the word and increment a counter every time a whitespace is NOT encountered

*once out of the word flag is set to "out" and the corresponding word length index item in the array is incremented i.e if word length counter is w_ctr use

array[w_ctr]++

*use the array as a table of reference for each line in a loop to print each line in the histogram so you can use the array and will now be able to determine weather the 'X' in the histogram is to be inserted or not

EDIT: sorry i didn't read the question right but the idea is simpler for Vertical histograms and the same thing can be used.

after the last step just print the horizontal histogram until counter exceeds current wordlength being printed

for(ctr=0;ctr<array[current_wordlength];ctr++)
     printf('X');

End

the original is here http://users.powernet.co.uk/eton/kandr2/krx113.html

CLC-wiki is also a place see the comments for details.

Siamore 2010-10-21 17:09:27

That site is outdated/unmaintained, see the same page on the [CLC-wiki](http://clc-wiki.net/wiki/K%26R2_solutions:Chapter_1:Exercise_13).

schot 2010-10-22 06:01:39

@schot Yes it is so i added your link i was just giving cred for the source from where i learned this some time ago.:)

Siamore 2010-10-22 16:07:34

Answer 5

A:

You should separate your 2 problems in functions, like:

void gethist(char *s, int *hist, int len)
{ /* words here breaks on spaces (' ') */
  char *t;
  for( t=strtok(s," ");t;t=strtok(0," ") )
    if(*t)
      hist[ strlen(t)>len-1?len-1:strlen(t)-1 ]++;
}

void outhist(int *hist, int len)
{
  int i;
  for( i=1; i<=len; ++i )
  {
    char *s = calloc(1,5+hist[i-1]);
    sprintf(s,"%03d|", i);
    memset( s+4, 'X', hist[i-1]);
    puts(s);
    free(s);
  }
}

then its easy in your main:

int main(void)
{
  int c, hist[11] = {};

  char *s = calloc(1,1);
  while ((c = getchar()) != EOF) {
    s = realloc( s, 2+strlen(s) );
    s[ strlen(s)+1 ] = 0;
    s[ strlen(s) ] = c;
  }

  gethist(s,hist,11); free(s);
  outhist(hist,11);

  return 0;
}

2010-10-21 17:41:19

ansaurus

tags:

views:

answers:

Print a Histogram based on word lengths (C)

related questions