views:

601

answers:

6

Hi all,
I want to know what disadvantage of scanf() exist.

In many of a sites I have read that using scanf will cause buffer overflow some times. What is the reason for that, and is there any other drawbacks with scanf?

+8  A: 

The problems with scanf are (at a minimum):

  • using %s to get a string from the user, which leads to the possibility that the string may be longer than your buffer, causing overflow.
  • the possibility of a failed scan leaving your file pointer in an indeterminate location.

I very much prefer using fgets to read whole lines in so that you can limit the amount of data read. If you've got a 1K buffer, and you read a line into it with fgets you can tell if the line was too long by the fact there's no terminating newline character (last line of a file without a newline notwithstanding).

Then you can complain to the user, or allocate more space for the rest of the line (continuously if necessary until you have enough space). In either case, there's no risk of buffer overflow.

Once you've read the line in, you know that you're positioned at the next line so there's no problem there. You can then sscanf your string to your heart's content without having to save and restore the file pointer for re-reading.

Here's a snippet of code which I frequently use to ensure no buffer overflow when asking the user for information.

It could be easily adjusted to use a file other than standard input if necessary and you could also have it allocate its own buffer (and keep increasing it until it's big enough) before giving that back to the caller (although the caller would then be responsible for freeing it, of course).

#include <stdio.h>
#include <string.h>

#define OK       0
#define NO_INPUT 1
#define TOO_LONG 2
static int getLine (char *prmpt, char *buff, size_t sz) {
    int ch, extra;

    // Get line with buffer overrun protection.
    if (prmpt != NULL) {
        printf ("%s", prmpt);
        fflush (stdout);
    }
    if (fgets (buff, sz, stdin) == NULL)
        return NO_INPUT;

    // If it was too long, there'll be no newline. In that case, we flush
    // to end of line so that excess doesn't affect the next call.
    if (buff[strlen(buff)-1] != '\n') {
        extra = 0;
        while (((ch = getchar()) != '\n') && (ch != EOF))
            extra = 1;
        return (extra == 1) ? TOO_LONG : OK;
    }

    // Otherwise remove newline and give string back to caller.
    buff[strlen(buff)-1] = '\0';
    return OK;
}

 

// Test program for getLine().

int main (void) {
    int rc;
    char buff[10];

    rc = getLine ("Enter string> ", buff, sizeof(buff));
    if (rc == NO_INPUT) {
        // Extra NL since my system doesn't output that on EOF.
        printf ("\nNo input\n");
        return 1;
    }

    if (rc == TOO_LONG) {
        printf ("Input too long [%s]\n", buff);
        return 1;
    }

    printf ("OK [%s]\n", buff);

    return 0;
}

A test run:

$ ./tstprg
Enter string>[CTRL-D]
No input

$ ./tstprg
Enter string> a
OK [a]

$ ./tstprg
Enter string> hello
OK [hello]

$ ./tstprg
Enter string> hello there
Input too long [hello the]

$ ./tstprg
Enter string> i am pax
OK [i am pax]
paxdiablo
+2  A: 

Yes, you are right. There is a major security flaw in scanf family(scanf,sscanf, fscanf..etc) esp when reading a string, because they don't take the length of the buffer (into which they are reading) into account.

Example:

char buf[3];
sscanf("abcdef","%s",buf);

clearly the the buffer buf can hold MAX 3 char. But the sscanf will try to put "abcdef" into it causing buffer overflow.

codaddict
You can provide "%10s" as the format specifier and it will read no more than 10 characters into the buffer.
dreamlax
Yes..that work..but does one always use scanf with a width ?
codaddict
Sure - it's possible to use the API safely. It's also possible to use dynamite to clear dirt out of your garden safely. But I wouldn't recommend either, especially since there are safer alternatives.
Larry Osterman
My dad used to use gelignite for clearing down trees on the farm. You just have to understand your tools and know the dangers.
paxdiablo
That buffer can only hold 2 chars since you need to reserve one for the null terminator.
Arthur Kalliokoski
@Arthur, nice demonstration of the danger of dynamite.
Hans Passant
@codaddict: The fact that someone doesn't use field width with `scanf` is the problem with that someone, not with `scanf`. It is completely irrelevant to the issue in question. This is C after all, not Java.
AndreyT
The problem is that the field width in `scanf()` must be hardcoded in the conversion specifier; with `printf()`, you can use `*` in the conversion specifier and pass the length as an argument. But since `*` means something different in `scanf()`, that doesn't work, so you basically have to generate a new format for each read like Alok does in his example. It just adds more work and clutter; might as well use `fgets()` and be done with it.
John Bode
+4  A: 

From the comp.lang.c FAQ: Why does everyone say not to use scanf? What should I use instead?

jamesdlin
+1  A: 

It is very hard to get scanf to do the thing you want. Sure, you can, but things like scanf("%s", buf); are as dangerous as gets(buf);, as everyone has said.

As an example, what paxdiablo is doing in his function to read can be done with something like:

scanf("%10[^\n]%*[^\n]", buf));
getchar();

The above will read a line, store the first 10 non-newline characters in buf, and then discard everything till (and including) a newline. So, paxdiablo's function could be written using scanf the following way:

#include <stdio.h>

enum read_status {
    OK,
    NO_INPUT,
    TOO_LONG
};

static int get_line(const char *prompt, char *buf, size_t sz)
{
    char fmt[40];
    int i;
    int nscanned;

    printf("%s", prompt);
    fflush(stdout);

    sprintf(fmt, "%%%zu[^\n]%%*[^\n]%%n", sz-1);
    /* read at most sz-1 characters on, discarding the rest */
    i = scanf(fmt, buf, &nscanned);
    if (i > 0) {
        getchar();
        if (nscanned >= sz) {
            return TOO_LONG;
        } else {
            return OK;
        }
    } else {
        return NO_INPUT;
    }
}

int main(void)
{
    char buf[10+1];
    int rc;

    while ((rc = get_line("Enter string> ", buf, sizeof buf)) != NO_INPUT) {
        if (rc == TOO_LONG) {
            printf("Input too long: ");
        }
        printf("->%s<-\n", buf);
    }
    return 0;
}

One of the other problems with scanf is its behavior in case of overflow. For example, when reading an int:

int i;
scanf("%d", &i);

the above cannot be used safely in case of an overflow. Even for the first case, reading a string is much more simpler to do with fgets rather than with scanf.

Alok
Wow Alok! When did you surpass 10k? Congratulations and keep it up! :)
dreamlax
@dreamlax: I passed 10k this month I think. Looks like you're pretty close as well. Thanks for the message. :-)
Alok
+6  A: 

Most of the answers so far seem to focus on the string buffer overflow issue. In reality, the format specifiers that can be used with scanf functions support explicit field width setting, which limit the maximum size of the input and prevent buffer overflow. This renders the popular accusations of string-buffer overflow dangers present in scanf virtually baseless. Claiming that scanf is somehow analogous to gets in the respect is completely incorrect. There's a major qualitative difference between scanf and gets: scanf does provide the user with string-buffer-overflow-preventing features, while gets doesn't.

One can argue that these scanf features are difficult to use, since the field width has to be embedded into format string (there's no way to pass it through a variadic argument, as it can be done in printf). Thats is actually true. But nevertheless any claims that scanf is somehow formally broken with regard to string-buffer-overflow safety are completely bogus and usually made by lazy programmers.

The real problem with scanf has a completely different nature, even though it is also about overflow. When scanf function is used for converting decimal representations of numbers into values of arithmetic types, it provides no protection from arithmetic overflow. If overflow happens, scanf produces undefined behavior. For this reason, the only proper way to perform the conversion in C standard library is functions from strto... family.

So, to summarize the above, the problem with scanf is that it is difficult (albeit possible) to use properly and safely with string buffers. And it is impossible to use safely for arithmetic input. The latter is the real problem. The former is just an inconvenience.

P.S. The above in intended to be about the entire family of scanf functions (including also fscanf and sscanf). With scanf specifically, the obvious issue is that the very idea of using a strictly-formatted function for reading potentially interactive input is rather questionable.

AndreyT
+1  A: 

Problems I have with the *scanf() family:

  • Potential for buffer overflow with %s and %[ conversion specifiers. Yes, you can specify a maximum field width, but unlike with printf(), you can't make it an argument in the scanf() call; it must be hardcoded in the conversion specifier.
  • Potential for arithmetic overflow with %d, %i, etc.
  • Limited ability to detect and reject badly formed input. For example, "12w4" is not a valid integer, but scanf("%d", &value); will successfully convert and assign 12 to value, leaving the "w4" stuck in the input stream to foul up a future read. Ideally the entire input string should be rejected, but scanf() doesn't give you an easy mechanism to do that.

If you know your input is always going to be well-formed with fixed-length strings and numerical values that don't flirt with overflow, then scanf() is a great tool. If you're dealing with interactive input or input that isn't guaranteed to be well-formed, then use something else.

John Bode