I know in C you can declare a string and the number of characters like below,

char mystring[50];

with '50' being the number of characters.

However, what is the proper procedure if the user is going to be inputting the contents of the string (via scanf("%s", mystring);)? Do I leave it as,

char mystring[0];

leaving it as '0' since I have no clue how many characters the user will input?

Or do I do,

char mystring[400];

giving up to 400 characters for the user to input?

+2  A: 

The user will always be able to enter more characters than your buffer can hold, thereby overflowing it (a common source of security vulnerabilities). You can, however, specify a maximum "field width" to scanf, like so:

scanf("%50s", mystring);

In this case your buffer must hold at least 51 characters: the 50-character field plus the null terminator. Alternatively, keep the buffer at 50 characters and give scanf a width of 49.
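
A minimal sketch of that pairing, written against a FILE * so it can be tried without typing input (pass stdin to get the same behaviour as scanf). The name read_word and the NAME_WIDTH constant are made up for illustration; note that the width inside the format string has to be kept in sync with the buffer size by hand.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_WIDTH 50   /* hypothetical maximum; must match the "%50s" below */

/* Reads one whitespace-delimited word of at most NAME_WIDTH characters
   from `in` into `out`, which must have room for NAME_WIDTH + 1 bytes.
   Returns 1 if a word was stored, 0 on end of input or read failure. */
int read_word(FILE *in, char out[NAME_WIDTH + 1])
{
    assert(in != NULL && out != NULL);
    return fscanf(in, "%50s", out) == 1;
}
```

Calling read_word(stdin, buf) with a 51-byte buf stores at most 50 characters plus the terminator; anything beyond that stays in the input stream for the next read.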

John Zwinck
but when declaring the string, should I specify '0' or some large number?
HollerTrain
You should specify at least 51, in this example. (The length + 1 for the null terminator.)
Thanatos
ok. so is listing it as just '0' when declaring the string not proper coding? My issue is I have no idea how many the user will input but at the same time want to learn the correct method...
HollerTrain
Thanatos is right: you mustn't specify 0 as the char array size, as you do need space for the string (and C strings are not dynamically-sized).
John Zwinck
When you say you have "no idea" how many characters will be entered, is there really no maximum? Perhaps you could explain your program's needs a bit more.
John Zwinck
+2  A: 

There is a function called ggets() which is not part of the standard C library. It's a fairly simple function. It initializes a char array using malloc(). It then reads characters from stdin one char at a time. It keeps track of how many characters were read and expands the char array with realloc() when it runs out of space.

It is available here: http://cbfalconer.home.att.net/download/index.htm

I would suggest you read the code and re-implement yourself.

jonescb
+6  A: 

You've hit upon the exact problem with scanf() and %s - what happens when you don't know how much input there is?

If you declare char mystring[0];, many compilers will accept it (zero-length arrays are a compiler extension, not standard C). But the array has no storage at all, so as soon as scanf stores anything into it you are writing out of bounds. That is undefined behavior: the program may segfault, or it may silently corrupt other memory and keep running.

So, point 1: you should always allocate a size for your string. I can think of very few circumstances (okay, none) where you would want to say char mystring[0] rather than char *mystring.

Next, when you use scanf, you never want to use a bare "%s" specifier, because it does no bounds checking on the size of the string. So even if you have:

char mystring[512];
scanf("%s", mystring);

if the user enters more than 511 characters (since the 512th is \0), you will go out of the bounds of your array. The way to remedy this is:

scanf("%511s", mystring);

This is all to say that C doesn't have a facility to automatically resize a string if there is more input than you're expecting. This is the kind of thing you have to do manually.

One way to deal with this is by using fgets().

You could say:

while (fgets(mystring, 512, stdin))
{
   /* process input */
}

You may then use sscanf() to parse mystring.

Try the above code with a buffer of size 5 (passing 5 to fgets() as well). After 4 characters have been read, the loop runs again to retrieve the rest of the input. "Processing" could include code to re-allocate a string to a bigger size and then append the newest input from fgets().

The above code isn't perfect: it will keep looping for arbitrarily long input, so you might want some internal hard limit on that (e.g., loop a maximum of 10 times).
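
Here is one way the "re-allocate and append" processing with a hard limit could look. This is only a sketch: read_line_chunked, CHUNK and the max_len parameter are names I made up, and the chunk size is deliberately tiny so the loop actually runs more than once.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 16   /* hypothetical chunk size, small on purpose */

/* Reads one line from `in` in CHUNK-sized pieces, growing a heap buffer
   as it goes. Returns NULL when there is no input left, when allocation
   fails, or when the line would exceed `max_len` characters (in that
   case the rest of the line is left unread in the stream).
   The caller must free() the result. */
char *read_line_chunked(FILE *in, size_t max_len)
{
    char chunk[CHUNK];
    char *line = NULL;
    size_t len = 0;

    assert(in != NULL);
    while (fgets(chunk, sizeof chunk, in) != NULL) {
        size_t n = strlen(chunk);
        int done = (n > 0 && chunk[n - 1] == '\n');

        if (done)
            chunk[--n] = '\0';              /* drop the newline */
        if (len + n > max_len) {            /* internal hard limit */
            free(line);
            return NULL;
        }
        char *grown = realloc(line, len + n + 1);
        if (grown == NULL) {
            free(line);
            return NULL;
        }
        line = grown;
        memcpy(line + len, chunk, n + 1);   /* copies the terminator too */
        len += n;
        if (done)
            return line;
    }
    return line;   /* NULL if nothing was read, else a final line without newline */
}
```

The limit is expressed in characters rather than loop iterations, which amounts to the same thing but is easier to reason about when the chunk size changes.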

rascher
It should be added that %s reads words, not whole lines, because scanf treats spaces and newlines as delimiters. To read a whole line, use %c instead (with a field width), or fgets as you mentioned. In the case of %c with a field width, remember to initialize the whole buffer to zero first, since %c does not append a null terminator.
Mads Elvheim
The program will not always segfault. In fact, probably not most of the time. Your program will likely just be silently broken. Isn't C lovely? :-)
Tommy McGuire
A: 

The usual practice in C is to use something like GNU readline or perhaps NetBSD editline, aka libedit. (Same API, different implementation and software license.)

For a simpler or homework program, you could in theory give a field width to scanf, but a more normal practice is to call fgets() on a fixed-size array and then run sscanf() on that. This way you are in control of the number of lines that are read.
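
A rough sketch of the fgets() then sscanf() pattern. The record format (a word followed by an integer), the buffer sizes and the name read_record are all assumptions chosen just for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record: a word of up to 31 characters followed by an integer.
   Reads exactly one line from `in` and parses it.
   Returns 1 if both fields were parsed, 0 on end of input or a malformed line. */
int read_record(FILE *in, char name[32], int *age)
{
    char line[128];   /* fixed-size line buffer; longer lines are read in pieces */

    assert(in != NULL && name != NULL && age != NULL);
    if (fgets(line, sizeof line, in) == NULL)
        return 0;                               /* end of input or read error */
    return sscanf(line, "%31s %d", name, age) == 2;
}
```

Because each call consumes one line no matter what it contains, a malformed line cannot leave stray characters behind to confuse the next read, which is the usual complaint about calling scanf directly on stdin.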

DigitalRoss
A: 

As an example, if the user is entering their first name, capping 'mystring' at 35 characters is not always safe, because some people have really long names. You don't want the user to be unable to enter the requested information in full. One way to handle this is to read into a temporary buffer large enough to cover any input you reasonably expect. Once the input is in the buffer, you know exactly how many characters it contains, so you can malloc just that much space (plus one for the null terminator) for 'mystring', copy the characters over, and discard the buffer. This way the string you keep for the rest of the program uses only the memory it needs, not the full size of the temporary buffer.
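
Roughly, in code (a sketch only; TMP_SIZE and read_exact are names I picked for illustration, and 1024 is an arbitrary guess at "very large"):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TMP_SIZE 1024   /* assumed size of the temporary buffer */

/* Reads one line from `in` into a large temporary buffer, then returns a
   heap copy sized to exactly fit the text (without the trailing newline).
   Returns NULL on end of input or allocation failure; the caller frees it.
   Input longer than TMP_SIZE - 1 characters is cut off at that point. */
char *read_exact(FILE *in)
{
    char tmp[TMP_SIZE];

    assert(in != NULL);
    if (fgets(tmp, sizeof tmp, in) == NULL)
        return NULL;
    tmp[strcspn(tmp, "\n")] = '\0';     /* drop the trailing newline, if any */

    size_t len = strlen(tmp);
    char *exact = malloc(len + 1);      /* just enough, plus the terminator */
    if (exact != NULL)
        memcpy(exact, tmp, len + 1);
    return exact;
}
```

The temporary buffer lives on the stack and disappears when the function returns, so only the exact-size copy stays around.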

Brian T Hannan
You would still have to check that what the user inputs is not larger than the temporary buffer, whether in the rare legitimate case or when someone is trying to exploit your program.
Brian T Hannan
+1  A: 

This is cbfalconer's code (http://cbfalconer.home.att.net/download/index.htm) with a couple minor modifications and compiled into one file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* "ggets.h" is not needed here: fggets() is defined in this same file */

#define INITSIZE   112  /* power of 2 minus 16, helps malloc */
#define DELTASIZE (INITSIZE + 16)

enum {OK = 0, NOMEM};

int fggets(char* *ln, FILE *f)
{
   int     cursize, ch, ix;
   char   *buffer, *temp;

   *ln = NULL; /* default */
   if (NULL == (buffer = malloc(INITSIZE))) return NOMEM;
   cursize = INITSIZE;

   ix = 0;
   while ((EOF != (ch = getc(f))) && ('\n' != ch)) {
      if (ix >= (cursize - 1)) { /* extend buffer */
         cursize += DELTASIZE;
         if (NULL == (temp = realloc(buffer, (size_t)cursize))) {
            /* ran out of memory, return partial line */
            buffer[ix] = '\0';
            *ln = buffer;
            return NOMEM;
         }
         buffer = temp;
      }
      buffer[ix++] = ch;
   }
   if ((EOF == ch) && (0 == ix)) {
      free(buffer);
      return EOF;
   }

   buffer[ix] = '\0';
   if (NULL == (temp = realloc(buffer, (size_t)ix + 1))) {
      *ln = buffer;  /* without reducing it */
   }
   else *ln = temp;
   return OK;
} /* fggets */
/* End of ggets.c */

int main(int argc, char **argv)
{
   FILE *infile;
   char *line;
   int   cnt;

   //if (argc == 2)
      //if ((infile = fopen(argv[1], "r"))) {
         cnt = 0;
         while (0 == fggets(&line, stdin)) {
            fprintf(stderr, "%4d %4d\n", ++cnt, (int)strlen(line));
            (void)puts(line);
            free(line);
         }
         return 0;
      //}
   //(void)puts("Usage: tggets filetodisplay");
   //return EXIT_FAILURE;
} /* main */
/* END file tggets.c */

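I tested it out: it reads each line from stdin in full, however long it is (as long as memory is available), and prints the line back along with its number and length.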

Brian T Hannan
Basically, to get his original code you uncomment the comments and replace stdin with infile in the fggets call.
Brian T Hannan