ansaurus

Question

AIX 5.3 vs Solaris 5.10 - C strcat implementation

Answer 1

A:

It was either (bad?) luck or, perhaps, an artefact of slightly different memory management systems, with more space being allocated on AIX than on Solaris.

It depends, in part, on how egregious the overflows are. If they are a couple of bytes out of bounds and AIX habitually allocates, say, 32-byte minimum blocks where Solaris allocates 16-byte minimum blocks, then there is more room for error without damage on AIX than on Solaris. Even so, if you get it wrong in the wrong context, AIX will also have the problem; you can regard yourself as unlucky not to have observed it on AIX for, as you say, it surely occurs there just as much as it does on Solaris if the source code you're compiling is the same.

Further investigation shows:

For 32-bit compilations, AIX 5.3 allocates a multiple of 16 bytes per allocation, just like Solaris.
For 64-bit compilations, AIX 5.3 allocates a multiple of 32 bytes per allocation; this is also the same as Solaris.

However, if you switched from a 32-bit build on one platform to a 64-bit build on the other (or vice versa), this might still be the source of your problem.

Whatever the answer, now that you are aware of it, fix it for both platforms with one set of changes. Treat every such exposed bug with gratitude; at any time, a change on the original platform could reveal the problem, leading to major trouble with dissatisfied customers.

(Oh, I think both SPARC and PPC are big-endian machines; it is Intel that is little-endian and different from the rest of the world.)

Test code - with deliberate, intentional leak

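#include <stdlib.h>
#include <stdio.h>

/* Print the address returned for requests of 1, 2, 4, ... 1024 bytes;
   the spacing between addresses hints at the allocator's block size.
   The blocks are deliberately never freed. */
int main(void)
{
    int sz;
    char *buffer;
    for (sz = 1; sz < 1025; sz *= 2)
    {
        buffer = malloc(sz);
        printf("0x%08lX\n", (unsigned long)buffer);
    }
    return 0;
}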

Jonathan Leffler 2010-09-23 17:49:30

I'm right there with you, Jonathan. I just need someone with more experience in these OSes to point me in the right direction, since I can't simply trust my gut when I give my opinion on this. Your answer is quite useful, but I could use something a bit more elaborate. Anyway, I'm looking for a reference on that allocation detail. Thanks a lot.

noobroot 2010-09-23 17:58:26

Answer 2

A:

I'm not sure if this counts as an answer or not, but a quick-and-dirty fix might be to search-and-replace all instances of malloc in the code with workaround_malloc, and implement the latter as a call to malloc(size+100). :-)

R.. 2010-09-23 19:02:10

Hehe, I know this could be a fix, and I know this code NEEDS to be sanitized, but that's thankfully out of the scope of my assignment, at least for the time being... :( I might end up being the one assigned to correct it, and that, my friend, would be a pain.

noobroot 2010-09-23 20:11:28

Answer 3

A:

If it happens on one platform, it happens on the other. You were just lucky that the memory layout on the one platform caused the error to manifest.

It is undefined behavior.

If you have a steaming mound of unguarded strcpy/strcat calls littering your code, fix the code. Don't blame the platform. Blame the author.

Valgrind is your friend.

EvilTeach 2010-09-23 19:13:03

I know the platform isn't the one to blame, but my job is simply to answer, with a solid argument, my teammates' "logical" question of why this same code didn't break on the other server. I'm not considering fixing it, just pointing the finger at the code/programmer instead of the environment.

noobroot 2010-09-23 20:04:40

You are incorrect. An overflow is an overflow. It doesn't matter which platform. It's a question of whether the way the memory is laid out causes an actual segmentation violation or not. On one platform you were lucky, on the other one you were not.

EvilTeach 2010-09-23 20:32:43

And which platform do you regard as being lucky and which unlucky? (I'd probably go with unlucky on AIX and lucky on Solaris - memory overwrites need to be dealt with, and spotting them is hard, so the fact that AIX didn't notice is unfortunate.)

Jonathan Leffler 2010-09-23 23:16:43

Yep. I agree, Jonathan.

EvilTeach 2010-09-24 13:30:48

I also agree with you guys. I'm glad this happened, since it's actually a pretty badly coded program, and thanks to this hardware change it's going through a full audit.

noobroot 2010-09-24 20:47:18

Valgrind is the way to go. It really is the best tool for the job.

EvilTeach 2010-09-25 02:49:08

Solaris Studio dbx provides an excellent memory debugging tool (RTC, run-time checking). Just turn it on with the "check -all" command and these bugs will quickly show up.

jlliagre 2010-09-25 13:02:41

@jlliagre: could you please help me with using Studio dbx? I only have a tty in a restricted zone to work with, and I'm not very familiar with the Solaris environment. Thanks.

noobroot 2010-09-28 03:13:43

done (see dedicated reply)

jlliagre 2010-09-28 09:17:14

@jlliagre: thanks a lot, man, I'll check that out.

noobroot 2010-09-28 17:32:08

Answer 4

A:

It's probably a processor-dependent manifestation of the bug. The padding and alignment of data are both usually determined by the characteristics of the target processor. Extra padding bytes may be hiding the error on one target, while on another target there are fewer padding bytes.

It could also be that the data passed to strcat differs between the platforms; for instance, if part of the string being built consists of the name of the OS or processor, or the path of a system file.

It could also be that the strings were allocated with a size based on either a header constant or a sizeof that is different for the different targets, so your actual buffer is smaller on one of them.

One thing I might try would be to have the compilers produce preprocessed versions of the source files for each target and diff those. They will probably differ a lot in the initial header code, but should be mostly similar near the end where your code is. See if any suspicious differences are in that code. Sadly sizeof will not be replaced by numbers here, but macro constants will be.

nategoose 2010-09-23 19:14:17

I was reading something about this but didn't come up with anything useful, though I'm sure that was due to my lack of knowledge of these OSes. Anyway, I also think this (as Jonathan said) is a good guess, since the strings passed to strcat are the same.

noobroot 2010-09-23 20:08:51

@nategoose: could you provide a link or anything in particular regarding data padding/alignment? I think that is the reason why it never happened on AIX, but I'm having a hard time finding information that's not 10 years old.

noobroot 2010-09-28 03:11:50

@noobroot: Different processors have different restrictions on the memory addresses that a variable can be at. For instance, it is common for some processors to require that a 32-bit value be stored at an address which is divisible by 4 (since it is a 4-byte value) for it to be accessed efficiently. On such a processor `struct {uint8_t a; uint32_t b;} foo;` might take up 8 bytes, because `a` would be followed by 3 padding bytes. Similar things can happen for the layout of local variables, but the compiler has more freedom to reorder them than in structs.

nategoose 2010-09-28 15:25:01

Answer 5

A:

Here is how you can use Solaris Studio debugger to find memory access/usage errors:

1: Download studio
  http://www.oracle.com/technetwork/server-storage/solarisstudio/overview/index.html
2: Install it
  Uncompress/extract the downloaded file somewhere on your disk
  Run the installer
3: Recompile your program with debugging on
  $ PATH=$PATH:/opt/<studio-installation-directory>/bin
  $ cc -g -o program program.c
4: Launch your program under debugger control
  $ dbx program
5: Enable run time checking
  (dbx) check -all
  access checking - ON
  memuse checking - ON
6: Run your program
  (dbx) run
7: Watch all the error messages
  ...

jlliagre 2010-09-28 07:57:46
