views:

239

answers:

5

From man gets:

Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.

Almost everywhere I see scanf being used in a way that should have the same problem (buffer overflow/buffer overrun): scanf("%s",string). This problem exists in this case? Why there are no references about it in the scanf man page? Why gcc does not warn when compiling this with -Wall?

ps: I know that there is a way to specify in the format string the maximum length of the string with scanf:

char str[10];
scanf("%9s",str);

edit: I am not asking to determe if the preceding code is right or not. My question is: if scanf("%s",string) is always wrong, why there are no warnings and there is nothing about it in the man page?

+4  A: 

Using gets() is never safe. scanf() can be used safely, as you said in your question. However, determining if you're using it safely is a more difficult problem for the compiler to work out (e.g. if you're calling scanf() in a function where you pass in the buffer and a character count as arguments, it won't be able to tell); in that case, it has to assume that you know what you're doing.

Niall C.
Yes, you are right, but gcc warns you when the number or the type of arguments is not right. In these cases, it does not assume that you know what you're doing.
dbarbosa
@dbarbosa: all the information is available for the compiler to check that there are the same number of optional arguments as there are format specifiers in the format string. In the same way, it also has enough information to check that if there's a `%d` in the format string (for example) that the corresponding argument is an integer.
Niall C.
It cannot know if the specified size is right, but it can know that it is just wrong when it does not have any size specification. `scanf("%5s",string)` can be right or wrong depending on the size of `string` and it is not able to say it as you told. However, `scanf("%s",string)` is always wrong because of the buffer overflow issue.
dbarbosa
@dbarbosa: You're correct about using just "%s" as a format specifier, but I haven't been able to find any combination of gcc options (I'm using 4.4.4) that will issue a warning about it.
Niall C.
A: 

It may be simply that scanf will allocate space on the heap based on how much data is read in. Since it doesn't allocate the buffer and then read until the null character is read, it doesn't risk overwriting the buffer. Instead, it reads into its own buffer until the null character is found, and presumably copies that buffer into another of the correct size at the end of the read.

Graham
Nope; scanf doesn't allocate space on the heap. It's the programmer's job to provide the buffers. Scanf used as dbarbosa said is unsafe.
David Thornley
Take a look at this tutorial, which explains the what I just said in better detail and more precisely. http://crasseux.com/books/ctutorial/String-overflows-with-scanf.html
Graham
`scanf` does not allocate its own storage. the `a` flag you linked is a GNU extension, not part of the C standard.
John Ledbetter
@Graham: The tutorial you linked to says that the flag for scanf() to allocate its own memory is a Gnu-specific extension. In standard C, scanf() doesn't allocate storage.
David Thornley
It should be noted that not only is this GNU extension largely useless (since your code will be non-portable and there's no way to add the "a" functionality to `scanf` at the application level on systems that lack it), but it directly conflicts with C99 functionality - `%a` means hex float, as in `printf`, and is an alternate specifier for reading floats with `scanf`.
R..
+3  A: 

When the compiler looks at the formatting string of scanf, it sees a string! That's assuming the formatting string is not entered at run-time. Some compilers like GCC have some extra functionality to analyze the formatting string if entered at compile time. That extra functionality is not comprehensive, because in some situations a run-time overhead is needed which is a NO NO for languages like C. For example, can you detect an unsafe usage without inserting some extra hidden code in this case:

char* str;
size_t size;
scanf("%z", &size);
str = malloc(size);
scanf("%9s"); // how can the compiler determine if this is a safe call?!

Of course, there are ways to write safe code with scanf if you specify the number of characters to read, and that there is enough memory to hold the string. In the case of gets, there is no way to specify the number of characters to read.

AraK
It is very difficult to determine if this is a safe call, but it is easy to see a `scanf("%s", str)` and warn the user, even more then the checking that it already does with the number and the type of arguments passed to `scanf`. (by the way, gcc will say `warning: too few arguments for format` for your code because there is no argument for the `%9s` in the format).
dbarbosa
+1  A: 

I am not sure why the man page for scanf doesn't mention the probability of a buffer overrun, but vanilla scanf is not a secure option. A rather dated link - http://blogs.msdn.com/b/rsamona/archive/2005/10/24/484449.aspx shows this as the case. Also, check this (not gcc but informative nevertheless) - http://blogs.msdn.com/b/parthas/archive/2006/12/06/application-crash-on-replacing-sscanf-with-sscanf-s.aspx

Gangadhar
+3  A: 

The answer is simply that no-one has written the code in GCC to produce that warning.

As you point out, a warning for the specific case of "%s" (with no field width) is quite appropriate.

However, bear in mind that this is only the case for the case of scanf(), vscanf(), fscanf() and vfscanf(). This format specifier can be perfectly safe with sscanf() and vsscanf(), so the warning should not be issued in that case. This means that you cannot simply add it to the existing "scanf-style-format-string" analysis code; you will have to split that into "fscanf-style-format-string" and "sscanf-style-format-string" options.

I'm sure if you produce a patch for the latest version of GCC it stands a good chance of being accepted (and of course, you will need to submit patches for the glibc header files too).

caf
Why you say it can be perfectly safe with `sscanf()` and `vsscanf()`? Is it because you can check the original string (the first argument) from where scanf is reading and be sure by that it will fit?? (for example, if both sizes are the same, if you look for space positions, etc)
dbarbosa
Yes, exactly. It's *possible* to use it safely, since you control the input.
caf