views:

777

answers:

7

"The average man does not want to be free. He simply wants to be safe." - H. L. Menken

I am attempting to write very secure C. Below I list some of the technics I use and ask are they as secure as I think they are. By all means, please don't not hesitate to tear my code/preconceptions to shreds. Any answer that finds even the most trivial vulnerability or teaches me a new idea will be highly valued.

Reading from a stream:

According to the GNU C Programming Tutorial getline:

The getline function will automatically enlarge the block of memory as needed, via the realloc function, so there is never a shortage of space -- one reason why getline is so safe. [..] Notice that getline can safely handle your line of input, no matter how long it is.

I assume that getline should, under all inputs, prevent a buffer overflow from occurring when reading from a stream.

  • Is my assumption correct? Are there inputs and/or allocation schemes under which this could lead to an exploit? For instance what if the first character from the stream is some bizarre control character, maybe 0x08 BACKSPACE (ctl-H).
  • Has any work been done to mathematically prove getline as secure?

Malloc Returns Null on Failure:

If malloc encounters an error malloc returns a NULL pointer. This presents a security risk since one can still apply pointer arithmetic to a NULL (0x0) pointer, thus wikipedia recommends

/* Allocate space for an array with ten elements of type int. */
int *ptr = (int*)malloc(10 * sizeof (int));
if (ptr == NULL) {
    /* Memory could not be allocated, the program should handle 
       the error here as appropriate. */
} 

Secure sscanf:

When using sscanf I've gotten in the habit of allocating the size to-be-extracted strings to the size of the input string hopefully avoiding the possibility of an overrun. For example:

const char *inputStr = "a01234b4567c";
const char *formatStr = "a%[0-9]b%[0-9]c":
char *str1[strlen(inputStr)];
char *str2[strlen(inputStr)];

sscanf(inputStr, formatStr, str1, str2);

Because str1 and str2 are the size of the inputStr and no more characters than strlen(inputStr) can be read from inputStr, it seems impossible, given all possible values for the inputStr to cause a buffer overflow?

  • Am I correct? Are there strange corner cases I haven't thought of?
  • Are there better ways to write this? Libraries that have already solved it?

General Questions:

While I've posted a large number of questions I don't expect anyone to answer all of them. The questions are more of guideline to the sorts of answers I am looking for. I really want to learn the secure C mindset.

  • What other secure C idioms are out there?
  • What corner cases do I need to always check?
  • How can I write unit tests to enforce these rules?
  • How can I enforce constraints in a testability or provably correct way?
  • Any recommended static/dynamic analysis technics or tools for C?
  • What secure C practices do you follow and how do you justify them to yourself and others?

Resources:

Many of the resources were borrowed from the answers.

+2  A: 

I think your sscanf example is wrong. It can still overflow when used that way.

Try this, which specifies the maximum number of bytes to read:

void main(int argc, char **argv)
{
  char buf[256];
  sscanf(argv[0], "%255s", &buf);
}

Take a look at this IBM dev article about protecting against buffer overflows.

In terms of testing, I would write a program that generates random strings of random length and feed them to your program, and make sure they are handled appropriately.

nont
Can you explain how it is possible to overflow? I avoided using a format string which specified the characters to read, that is %255s, because I am claiming my sscanf idiom is secure even if the format string incorrectly lists the characters to read.
e5
Even better would be `sscanf (argv[0], "%*s", sizeof buf - 1, buf);` to avoid hardcoding items depending on the size of the buffer.
wallyk
@wallyk does this work for more than one format character? For example does sscanf (argv[0], "%*[5-9]abcdefg%*[0-2]", sizeof str1 - 1, str1, sizeof str2, str2); work in a safe manner? What is the * doing? Deferencing s?
e5
@walky how did you get code formatting in a comment?
e5
there's a little button that has 1's and 0's in it. Or, use the <pre> tag.
nont
Those work fine in questions, but not in comments. I don't even get a formating bar when writing a comment.
e5
Wrap your code with the grave accent character (same key as the tilde)... that will allow you to post inline code in your comment.
brianreavis
@wallyk `<code>sscanf (argv[0], "%*s", sizeof buf - 1, buf);</code>` does not work, in the context of a scan * is a assignment-suppressing token. See http://www.opengroup.org/onlinepubs/009695399/functions/scanf.html . Any idea how do what you intended without rewriting the string as posted here http://stackoverflow.com/questions/545018/how-to-pass-variable-length-width-specifier-in-sscanf
e5
`void main` is wrong. In C, `main` returns `int`.
Alok
+4  A: 

G'day,

A good place to start looking at this is David Wheeler's excellent secure coding site.

His free online book "Secure Programming for Linux and Unix HOWTO" is an excellent resource that is regularly updated.

You might also like to look at his excellent static analyser FlawFinder to get some further hints. But remember, no automated tool is a replacement for a good pair of experienced eyes, or as David so colourfully puts it..

Any static analysis tool, such as Flawfinder, is merely a tool. No tool can substitute for human thought! In short, "a fool with a tool is still a fool". It's a mistake to think that analysis tools (like flawfinder) are a substitute for security training and knowledge

I have personally used David's resources for several years now and find them to be excellent.

HTH

cheers,

Rob Wells
+1 added Secure Programming for Linux to the resource list
e5
+1  A: 

Insecure Programming by Example
blog with a few of the answers

Dustin Getz
+1 added to the resource list in the question.
e5
+1  A: 

Yannick Moy developed a Hoare/Floyd weakest precondition system for C during his PhD and applied it to the CERT managed strings library. He found a number of bugs (see page 197 of his memoir). The good news is that the library is safer now for his work.

Pascal Cuoq
That is very cool. I think I'm going to use it.
e5
+1  A: 

You could also look at Les Hatton's web site here and at his book Safer C which you can get from Amazon.

Jackson
+1  A: 

Don't use gets() for input, use fgets(). To use fgets(), if your buffer is automatically allocated (i.e., "on the stack"), then use this idiom:

char buf[N];
...
if (fgets(buf, sizeof buf, fp) != NULL)

This will keep working if you decide to change the size of buf. I prefer this form to:

#define N whatever
char buf[N];
if (fgets(buf, N, fp) != NULL)

because the first form uses buf to determine the second argument, and is clearer.


Check the return value of fclose().


Alok
+1  A: 
  1. Reading from a stream

The fact that getline() "will automatically enlarge the block of memory as needed" means that this could be used as a denial-of-service attack, as it would be trivial to generate an input that was so long it would exhaust the available memory for the process (or worse, the system!). Once an out-of-memory condition occurs, other vulnerabilities may also come into play. The behaviour of code in low/no memory is rarely nice, and very hard to predict. IMHO it is safer to set reasonable upper bounds on everything, especially in security-sensitive applications.

Furthermore (as you anticipate by mentioning special characters), getline() only gives you a buffer; it does not make any guarantees about the contents of the buffer (as the safety is entirely application-dependent). So sanitising the input is still an essential part of processing and validating user data.

  1. sscanf

I would tend to prefer to use a regular expression library, and have very narrowly defined regexps for user data, rather than use sscanf. This way you can perform a good deal of validation at the time of input.

  1. General comments

    • Fuzzing tools are available which generate random input (both valid and invalid) that can be used to test your input handling
    • Buffer management is critical: buffer overflows, underflows, out-of-memory
    • Race conditions can be exploited in otherwise secure code
    • Binary files could be manipulated to inject invalid values or oversized values into headers, so file format code must be rock-solid and not assume binary data is valid
    • Temporary files can often be a source of security issues, and must be carefully managed
    • Code injection can be used to replace system or runtime library calls with malicious versions
    • Plugins provide a huge vector for attack
    • As a general principle, I would suggest having clearly defined interfaces where user data (or any data from outside the application) is assumed invalid and hostile until it is processed, sanitised and validated, and the only way for user data to enter the application
gavinb