views:

3210

answers:

13

I have a piece of code written by a very old school programmer :-) . it goes something like this

typedef struct ts_request
{ 
  ts_request_buffer_header_def header; 
  char                         package[1]; 
} ts_request_def; 

ts_request_buffer_def* request_buffer = 
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));

the programmer basically is working on a buffer overflow concept. I know the code looks dodgy. so my questions are:

  1. Does malloc always allocate contiguous block of memory ?. because in this code if the blocks are not contiguous , the code will fail big time

  2. Doing free(request_buffer) , will it free all the bytes allocated by malloc i.e sizeof(ts_request_def) + (2 * 1024 * 1024), or only the bytes of the size of the structure sizeof(ts_request_def)

  3. Do you see any evident problems with this approach , i need to discuss this with my boss and would like to point out any loopholes with this approach

A: 

The answer to question 1 and 2 is Yes

About ugliness (ie question 3) what is the programmer trying to do with that allocated memory?

hhafez
+5  A: 

1) Yes it does, or malloc will fail if there isn't a large enough contiguous block available. (A failure with malloc will return a NULL pointer)

2) Yes it will. The internal memory allocation will keep track of the amount of memory allocated with that pointer value and free all of it.

3)It's a bit of a language hack, and a bit dubious about it's use. It's still subject to buffer overflows as well, just may take attackers slightly longer to find a payload that will cause it. The cost of the 'protection' is also pretty hefty (do you really need >2mb per request buffer?). It's also very ugly, although your boss may not appreciate that argument :)

workmad3
+3  A: 

As for #3, without more code it's hard to answer. I don't see anything wrong with it, unless its happening a lot. I mean, you don't want to allocate 2mb chunks of memory all the time. You also don't want to do it needlessly, e.g. if you only ever use 2k.

The fact that you don't like it for some reason isn't sufficient to object to it, or justify completely re-writing it. I would look at the usage closely, try to understand what the original programmer was thinking, look closely for buffer overflows (as workmad3 pointed out) in the code that uses this memory.

There are lots of common mistakes that you may find. For example, does the code check to make sure malloc() succeeded?

jeffamaphone
+3  A: 

The exploit (question 3) is really up to the interface towards this structure of yours. In context this allocation might make sense, and without further information it is impossible to say if it's secure or not.
But if you mean problems with allocating memory bigger than the structure, this is by no means a bad C design (I wouldn't even say it's THAT old school... ;) )
Just a final note here - the point with having a char[1] is that the terminating NULL will always be in the declared struct, meaning there can be 2 * 1024 * 1024 characters in the buffer, and you don't have to account for the NULL by a "+1". Might look like a small feat, but I just wanted to point out.

E Dominique
Also, the standard doesn't allow arrays of size 0, though some compilers do.
TrayMan
That's right, but you could have a char*.
E Dominique
No he can't; a char * would address memory somewhere else completely, instead of contiguous with the structure.For C99, the proper declaration for this is a flexible-size array "char package[]". But pretty much any compiler supporting that also supports the GNU extension for size 0.
puetzk
@puetzk yes, you are indeed correct. Ofcourse.
E Dominique
+9  A: 

3 - That's a pretty common C trick to allocate a dynamic array at the end of a struct. The alternative would be to put a pointer into the struct and then allocate the array separately, and not forgetting to free it too. That the size is fixed to 2mb seems a bit unusual though.

TrayMan
thanks a lot for your comments . basically we receive data from socket.we do not know the exact size we are going to receive and have capped it at 2 MB . the data we receive is copied into this structure . This change was done because this was the one with the min impact.
@unknown (google), if the size is fixed, you can also change the array size from 1 to your fixed size. This trick makes only sense for arrays with variable lengths.
quinmars
+6  A: 

This is a standard C trick, and isn't more dangerous that any other buffer.

If you are trying to show to your boss that you are smarter than "very old school programmer", this code isn't a case for you. Old school not necessarily bad. Seems the "old school" guy knows enough about memory management ;)

qrdl
I agree. Given the context of it's use (to store an unknown amount of data from a socket), I'd say it's pretty good (although I would read a much smaller chunk at a time from the socket, and then realloc() and keep reading until the socket was empty in order to minimize memory footprint).
Chris Lutz
...(but with modern CPU speeds that's unlikely to be a HUGE issue regardless.)
Chris Lutz
It is more dangerous in the sense that it's undefined behaviour.
Chris Young
What do you call by "undefined behaviour" in this case? It doesn't violate C standard. The only problem I see is sizeof() will return invalid size but it is always the case for dynamic buffers.
qrdl
why dont you just answer the questions and then shut up instead of trying to do my appraisal..
Probably because you don't know the language good enough but trying to attack someone who knows much more then you, just to show how "smart" you are to your boss. Unfortunately there are enough people of this kind in our profession, but usually they quit programming and move to management.
qrdl
@qrdl: See the edit of my answer at http://stackoverflow.com/questions/625270/does-malloc-allocate-a-contiguous-block-of-memory/625430#625430
Chris Young
@Chris: I see what you mean. Indeed it is undefined behaviour, agreed. Thank you for pointing this out.
qrdl
to qrdl : i strongly suspect you are the one who wrote this code :-) i will repeat .. either answer or shut up ... dont go for assumptions about my 'intentions' .. unfortunately there are enough people like you too "oh god . i know everything and people are always out to prove me wrong"
@qrdl : so we agree its undefined behaviour and since its UNDEFINED behaviour and if this piece of code does cause some problems , i will be the one who will have to spend nights debugging this , and not you , you twit
Too often I made the vary same mistake and then repeat to myself: "I need to learn to spot idiots faster and do not react".<hangup>
qrdl
@qrdl : yes, had you learned that , you would have looked in the mirror and would have never touched a keyboard again . :-) ..anyway i dont understand why you come here again and again to get insulted ..and btw its "very" and not "vary" ..
+2  A: 

I've seen and used this pattern frequently.

Its benefit is to simplify memory management and thus avoid risk of memory leaks. All it takes is to free the malloc'ed block. With a secondary buffer, you'll need two free. However one should define and use a destructor function to encapsulate this operation so you can always change its behavior, like switching to secondary buffer or add additional operations to be performed when deleting the structure.

Access to array elements is also slightly more efficient but that is less and less significant with modern computers.

The code will also correctly work if memory alignment changes in the structure with different compilers as it is quite frequent.

The only potential problem I see is if the compiler permutes the order of storage of the member variables because this trick requires that the package field remains last in the storage. I don't know if the C standard prohibits permutation.

Note also that the size of the allocated buffer will most probably be bigger than required, at least by one byte with the additional padding bytes if any.

chmike
The C standard requires members to be in the order you put them in the struct. However, it's undefined behaviour for reasons I explained in my answer.
Chris Young
+1  A: 

Would like to add that not is it common but I might also called it a standard practice because Windows API is full of such use.

Check the very common BITMAP header structure for example.

http://msdn.microsoft.com/en-us/library/aa921550.aspx

The last RBG quad is an array of 1 size, which depends on exactly this technique.

CDR
+22  A: 

To answer your numbered points.

  1. Yes.
  2. All the bytes. Malloc/free doesn't know or care about the type of the object, just the size.
  3. It is strictly speaking undefined behaviour, but a common trick supported by many implementations. See below for other alternatives.

The latest C standard, ISO/IEC 9899:1999 (informally C99), allows flexible array members.

An example of this would be:

int main(void)
{       
    struct { size_t x; char a[]; } *p;
    p = malloc(sizeof *p + 100);
    if (p)
    {
        /* You can now access up to p->a[99] safely */
    }
}

This now standardized feature allowed you to avoid using the common, but non-standard, implementation extension that you describe in your question. Strictly speaking, using a non-flexible array member and accessing beyond its bounds is undefined behaviour, but many implementations document and encourage it.

Furthermore, gcc allows zero-length arrays as an extension. Zero-length arrays are illegal in standard C, but gcc introduced this feature before C99 gave us flexible array members.

In a response to a comment, I will explain why the snippet below is technically undefined behaviour. Section numbers I quote refer to C99 (ISO/IEC 9899:1999)

struct {
    char arr[1];
} *x;
x = malloc(sizeof *x + 1024);
x->arr[23] = 42;

Firstly, 6.5.2.1#2 shows a[i] is identical to (*((a)+(i))), so x->arr[23] is equivalent to (*((x->arr)+(23))). Now, 6.5.6#8 (on the addition of a pointer and an integer) says:

"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

For this reason, because x->arr[23] is not within the array, the behaviour is undefined. You might still think that it's okay because the malloc() implies the array has now been extended, but this is not strictly the case. Informative Annex J.2 (which lists examples of undefined behaviour) provides further clarification with an example:

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

Chris Young
+1, for the flexible and zero-length arrays. You could maybe also add that the benefit of this practice is that you save the memory for one pointer and reduce it to only one (expensive) allocation.
quinmars
I disagree about undefined behaviour. malloc() is guaranteed to return continuous block of memory so you can safely access memory beyond the struct using either pointer arithmetics of array index - according to the standard they are the same. So it is defined behaviour.
qrdl
@qrdl: The standard specifically disallows accessing beyond the array. I have edited my post to explain why it's undefined.
Chris Young
@ chris : thanks a lot for your answer .
@unknown (google): You're welcome. Perhaps you could make the answer as accepted then? :)
Chris Young
@Chris: This may be nitpicking, but as I understand it `malloc` will allocate contiguous virtual memory space, but the actual physical memory which backs it may not be contiguous. At least that's how it seems to work in Linux, AFAIK.
Robert S. Barnes
@Robert S. Barnes: You are not incorrect, but the physical layout is entirely irrelevant to the C standard. It only matters that it appears contiguous to the program when accessed in a well-defined manner. It's equally true and irrelevant to point out that the memory might not be contiguous because it may span several pieces of silicon.
Chris Young
+3  A: 

I don't think the existing answers quite get to the essence of this issue. You say the old-school programmer is doing something like this;

typedef struct ts_request
{ 
  ts_request_buffer_header_def header; 
  char                         package[1]; 
} ts_request_def;

ts_request_buffer_def* request_buffer = 
malloc(sizeof(ts_request_def) + (2 * 1024 * 1024));

I think it's unlikely he's doing exactly that, because if that's what he wanted to do he could do it with simplified equivalent code that doesn't need any tricks;

typedef struct ts_request
{ 
  ts_request_buffer_header_def header; 
  char                         package[2*1024*1024 + 1]; 
} ts_request_def;

ts_request_buffer_def* request_buffer = 
malloc(sizeof(ts_request_def));

I'll bet that what he's really doing is something like this;

typedef struct ts_request
{ 
  ts_request_buffer_header_def header; 
  char                         package[1]; // effectively package[x]
} ts_request_def;

ts_request_buffer_def* request_buffer = 
malloc( sizeof(ts_request_def) + x );

What he wants to achieve is allocation of a request with a variable package size x. It is of course illegal to declare the array's size with a variable, so he is getting around this with a trick. It looks as if he knows what he's doing to me, the trick is well towards the respectable and practical end of the C trickery scale.

Bill Forster
+1  A: 

Yes. malloc returns only a single pointer - how could it possibly tell a requester that it had allocated multiple discontiguous blocks to satisfy a request?

Andrew Medico
Right, that is the job for the OS and virtual memory through the MMU. The actual physical blocks of RAM are likely all over the place.
Zan Lynx
+1  A: 

In response to your third question.

free always releases all the memory allocated at a single shot.

int* i =(int*) malloc(1024*2);

free(i+1024); //gives error free(i); //releases all the 2KB memory

+1  A: 

This common C trick is also explained in this StackOverflow question (Can someone explain this definition of the dirent struct in solaris?).

X-Istence