Does anyone have any idea why this could happen?

I have a C program that runs on AIX 5.3. I was asked to run it on a SPARC Solaris 10 machine, and when I did, I noticed a buffer overflow caused by one of its many reckless uses of strcat. My goal is not to sanitize the code but to give a concrete, well-founded explanation of why this overflow happens on Solaris and not on AIX, given that it is exactly the same badly coded program.

I've been reading about whether this could be caused by:

  1. Endianness differences between AIX and Solaris.

  2. The way strcat executes (AIX copying right to left and Solaris left to right), but I haven't been able to find any documentation on this.

  3. Plain and simple luck that this issue doesn't occur on AIX.

Any light you could shed on this would be highly appreciated.

EDIT: Could this be avoided with the noexec_user_stack flag on Solaris?

EDIT 2: Does anyone have any information on how the two OSes do the actual byte copying, as in option 2 above?

EDIT 3: Here's the chunk of code:

/* global */
char bufferA[101];

/* inside function */
bufferA[0] = '\0';
strcpy(bufferA, "1");
if (atoi(something) == 0) {
    strcat(bufferA, pieces_of_data);
    count++;
}

Obviously there is more to it, but this is the only part where bufferA is used, and there are 2 variables declared global after bufferA that become corrupted with the last part of the last string appended to bufferA.

As I said before, if I change the declaration from 101 to 201 the corruption does not occur.

EDIT 4: Does anyone know anything about the -misalign and -misalign2 compiler options on Solaris? Could these options shed any light? Actually, a better question would be: is there any difference between AIX on PowerPC and Solaris on SPARC regarding alignment? Although this is probably a question for Server Fault, please share if you know something.

+2  A: 

It was either (bad?) luck or, perhaps, an artefact of slightly different memory management systems, with more space being allocated on AIX than on Solaris.

It depends, in part, on how egregious the overflows are. If they are a couple of bytes out of bounds and if AIX habitually allocates, say, 32 byte minimum blocks where Solaris allocates 16 byte minimum blocks, then there is more room for error without damage on AIX than on Solaris. Even so, if you get it wrong in the wrong context, AIX should also have the problem - you can regard yourself as unlucky not to have observed the problem on AIX for, as you say, it surely occurs there just as much as it does on Solaris if the source code you're compiling is the same.

Further investigation shows:

  • For 32-bit compilations, AIX 5.3 allocates a multiple of 16 bytes per allocation, just like Solaris.
  • For 64-bit compilations, AIX 5.3 allocates a multiple of 32 bytes per allocation; this is also the same as Solaris.

However, if you changed between 32-bit and 64-bit builds between AIX and Solaris, this might still be the source of your problem.

Whatever the answer, now you are aware of it, fix it for both platforms with one set of changes. You treat every such exposed bug with gratitude; at any time, a change on the original platform may reveal the problem - leading to major problems with dissatisfied customers.

(Oh, I think both SPARC and PPC are big-endian machines; it is Intel that is little-endian and different from the rest of the world.)


Test code - with deliberate, intentional leak

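#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    int sz;
    char *buffer;

    /* Allocate blocks of doubling size and print where each one lands;
       the spacing between addresses hints at the allocator's granularity. */
    for (sz = 1; sz < 1025; sz *= 2)
    {
        buffer = malloc(sz);    /* never freed: the leak is deliberate */
        printf("0x%08lX\n", (unsigned long)buffer);
    }
    return 0;
}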
Jonathan Leffler
I'm right there with you, Jonathan. I just need someone with more experience in these OSes to point me in the right direction, since I can't simply trust my gut when I give my opinion on this. Your answer is quite useful, but I could use something a bit more elaborate. Anyway, I'm looking for a reference on that allocation detail. Thanks a lot.
noobroot
A: 

I'm not sure if this counts as an answer or not, but a quick-and-dirty fix might be to search-and-replace all instances of malloc in the code with workaround_malloc, and implement the latter as a call to malloc(size+100).. :-)

R..
Hehe, I know this could be a fix, and I know this code NEEDS to be sanitized, but that's thankfully out of the scope of my assignment, at least for the time being... :( I might end up being the one assigned to correct it, and that, my friend, would be a pain.
noobroot
+1  A: 

If it happens on one platform, it happens on the other. You were just lucky that the memory layout on the one platform caused the error to manifest.

It is undefined behavior.

If you have a steaming mound of unguarded strcpy/cats littering your code, fix the code. Don't blame the platform. Blame the author.

Valgrind is your friend.

EvilTeach
I know the platform isn't the one to blame, but my job is simply to answer my teammates' "logical" question of why this same code didn't break on the other server. With a solid argument, I'm not considering fixing it, just pointing the finger at the code/programmer instead of the environment.
noobroot
You are incorrect. An overflow is an overflow. It doesn't matter which platform. It's a question of whether the way the memory is laid out causes an actual segment violation or not. On one platform you were lucky, on the other one you were not.
EvilTeach
And which platform do you regard as being lucky and which unlucky? (I'd probably go with unlucky on AIX and lucky on Solaris - memory overwrites need to be dealt with, and spotting them is hard, so the fact that AIX didn't notice is unfortunate.)
Jonathan Leffler
Yep. I agree Jonathan.
EvilTeach
I also agree with you guys. I'm glad this happened, since it's actually a pretty badly coded program, and thanks to this hardware change it's going through a full audit.
noobroot
valgrind is the way to go. It really is the best tool for the job.
EvilTeach
Solaris Studio dbx provides an excellent memory debugging tool (RTC = run time checking). Just turn it on with the "check -all" command and these bugs will quickly show up.
jlliagre
@jlliagre: Could you please help me with using Studio dbx? I only have a tty in a restricted zone to work with, and I'm not very familiar with the Solaris environment. Thank you.
noobroot
done (see dedicated reply)
jlliagre
@jlliagre: Thanks a lot, man, I'll check that out.
noobroot
A: 

It's probably a processor dependent manifestation of the bug. The padding and alignment of data are both usually determined by the characteristics of the target processor. Extra padding bytes may be hiding the error on one target, while on another target there are fewer padding bytes.

It could also be that the actual data passed to strcat differs between the two machines, for instance if part of the string being built consists of the name of the OS, the processor, or the path of a system file.

It could also be that the strings were allocated with a size based on either a header constant or a sizeof that is different for the different targets, so your actual buffer is smaller.

One thing I might try would be to have the compilers produce preprocessed versions of the source files for each target and diff those. They will probably differ a lot in the initial header code, but should be mostly similar near the end where your code is. See if any suspicious differences are in that code. Sadly sizeof will not be replaced by numbers here, but macro constants will be.

nategoose
I was reading something about this but didn't come to anything useful, though I'm sure that was due to my lack of knowledge of these OSes. Anyway, I also think this (as Jonathan said) is a good guess, since the strings passed to strcat are the same.
noobroot
@nategoose: Could you provide a link to anything in particular regarding data padding/alignment? I think that is the reason why it never happened on AIX, but I'm having a hard time finding information that's not 10 years old.
noobroot
@noobroot: Different processors have different restrictions on the memory addresses that a variable can be at. For instance, it is common for a processor to require that a 32-bit value be stored at an address that is divisible by 4 (since it is a 4-byte value) so that it can be accessed efficiently. On such a processor `struct {uint8_t a; uint32_t b;} foo;` might take up 8 bytes because a would be followed by 3 padding bytes. Similar things can happen with the layout of local variables, but the compiler has more freedom to reorder them than it has with struct members.
nategoose
A: 

Here is how you can use Solaris Studio debugger to find memory access/usage errors:

1: Download studio
  http://www.oracle.com/technetwork/server-storage/solarisstudio/overview/index.html
2: Install it
  Uncompress/extract the downloaded file somewhere on your disk
  Run the installer
3: Recompile your program with debugging on
  $ PATH=$PATH:/opt/<studio-installation-directory>/bin
  $ cc -g program.c
4: Launch your program under debugger control
  $ dbx program
5: Enable run time checking
  (dbx) check -all
  access checking - ON
  memuse checking - ON
6: Run your program
  (dbx) run
7: Watch all the error messages
  ...
jlliagre