We have a native C++ application running via COM+ on a Windows 2003 server. I've recently noticed from the Event Viewer that it's throwing exceptions, specifically the C0000005 exception, which, according to http://blogs.msdn.com/calvin_hsia/archive/2004/06/30/170344.aspx, means that the process is trying to read or write memory that is not within its address space, i.e. an access violation.

The entry in event viewer provides a call stack:

LibFmwk!UTIL_GetDateFromLogByDayDirectory(char const *,class utilCDate &) + 0xa26c
LibFmwk!UTIL_GetDateFromLogByDayDirectory(char const *,class utilCDate &) + 0x8af4
LibFmwk!UTIL_GetDateFromLogByDayDirectory(char const *,class utilCDate &) + 0x13a1
LibFmwk!utilCLogController::GetFLFInfoLevel(void)const + 0x1070
LibFmwk!utilCLogController::GetFLFInfoLevel(void)const + 0x186

Now, I understand that it's giving me method names to go and look at, but I get the feeling that the offset at the end of each line (e.g. + 0xa26c) is trying to point me to a specific line or instruction within that method.

So my questions are:

  1. Does anyone know how I might use this offset, or any other information in a call stack, to determine which line of the code it's falling over on?
  2. Are there any resources out there that I could read to better understand call stacks?
  3. Are there any freeware/open-source tools that could help in analysing a call stack, perhaps by loading a debug symbol file and/or the binaries?

Edit: As requested, here is the method that appears to be causing the problem:

BOOL UTIL_GetDateFromLogByDayDirectory(LPCSTR pszDir, utilCDate& oDate)
{
    BOOL bRet = FALSE;

    if ((pszDir[0] == '%') &&
        ::isdigit(pszDir[1]) && ::isdigit(pszDir[2]) &&
        ::isdigit(pszDir[3]) && ::isdigit(pszDir[4]) &&
        ::isdigit(pszDir[5]) && ::isdigit(pszDir[6]) &&
        ::isdigit(pszDir[7]) && ::isdigit(pszDir[8]) &&
        !pszDir[9])
    {
        char acCopy[9];
        ::memcpy(acCopy, pszDir + 1, 8);
        acCopy[8] = '\0';

        int iDay = ::atoi(&acCopy[6]);
        acCopy[6] = '\0';
        int iMonth = ::atoi(&acCopy[4]);
        acCopy[4] = '\0';
        int iYear = ::atoi(&acCopy[0]);

        oDate.Set(iDay, iMonth, iYear);

        bRet = TRUE;
    }

    return (bRet);
}

This code was written over 10 years ago by a member of our company who has long since gone, so I don't presume to know exactly what it is doing, but I do know it's involved in renaming a log directory from 'Today' to the specific date, e.g. %20090329. The array indexing, memcpy and address-of operators do make it look rather suspicious.

Another problem is that this only happens on the production system. We've never been able to reproduce it on our test or development systems here, where we could attach a debugger.

Much appreciated! Andy

+1  A: 

Points 2 and 3 are easily answered:

3rd Point. Any debugger. That's what they are made for. Set your debugger to break on this specific exception. You should then be able to click through the call stack and inspect each call on it (at least Delphi can do this, so Visual Studio should be able to as well). Compile without optimisations if possible. OllyDbg might work as well, perhaps in combination with its trace functionality.

2nd Point. Any information about x86 assembler and reverse engineering. Try: OpenRCE, NASM Documentation, ASM Community.

1st Point. The call stack tells you the functions. I don't know if it is written in call order or in the opposite order, so the first line might be either the last called function or the first called function. Follow the calls with the help of the debugger. Sometimes you can switch between asm and source code (depending on the debugger, map files ...). If you don't have the source, learn assembler and read about reverse engineering. Read the documentation of the functions you call in third-party components. Perhaps you do not satisfy a precondition.

It would help if you could tell us a bit more about the program (which parts of the source code do you have? is a library call involved? ...).


Now some code-reading:

The function accepts a pointer to a zero terminated string and a reference to a date object. The pointer is assumed to be valid!

The function checks whether the string is in a specific format (% followed by 8 digits followed by a \0). If this is not the case, it returns false. This check (the big if) dereferences the pointer without any validity checks. The length is not checked, and if the pointer is pointing somewhere in the wild, that memory is accessed. I don't know if a shorter string will cause problems. It shouldn't, because && short-circuits: evaluation stops at the terminating \0, which is not a digit, so nothing past the end of the string is read.

Then some memory is allocated on the stack. The numeric part of the string is copied into it (which is OK) and the buffer gets its \0 termination. The atoi calls extract the numbers. This works because each call starts at a different offset and the buffer is \0-terminated again after each part has been read. Somewhat tricky, but nice. Some comments would have made everything clear.
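The back-to-front slicing described above can be sketched in isolation. This is only an illustration, not the original code: SplitDate and DateParts are hypothetical names, and the input is assumed to hold at least eight readable digit characters.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical holder for the three parsed fields.
struct DateParts {
    int day;
    int month;
    int year;
};

// Splits "YYYYMMDD" by parsing from the back and truncating the buffer
// after each field, so every atoi call sees only its own digits.
// Assumes digits points at eight or more readable characters.
DateParts SplitDate(const char* digits) {
    char copy[9];
    std::memcpy(copy, digits, 8);
    copy[8] = '\0';

    DateParts parts;
    parts.day = std::atoi(&copy[6]);    // reads "DD"
    copy[6] = '\0';                     // buffer is now "YYYYMM"
    parts.month = std::atoi(&copy[4]);  // reads "MM"
    copy[4] = '\0';                     // buffer is now "YYYY"
    parts.year = std::atoi(&copy[0]);   // reads "YYYY"
    return parts;
}
```

All reads and writes stay inside the nine-byte local buffer, so this part of the function cannot overrun it as long as the source pointer itself is valid.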

These numbers are then inserted into the object. It should be valid since it is passed into the function by reference. I don't know if you can pass a reference to a deleted object but if this is the case, this might be your problem as well.

Anyway - except the missing check for the string-pointer, this function is sound and not the cause of your problem. It's only the place that throws the exception. Search for arguments that are passed into this function. Are they always valid? Do some logging.
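As a rough sketch of the missing pointer check and the suggested logging, something like the following could be called before the real function. LooksLikeLogByDayDir is a hypothetical helper, the log goes to stderr purely for illustration, and nullptr assumes a C++11 compiler (use NULL on older toolchains).

```cpp
#include <cctype>
#include <cstdio>

// Hypothetical pre-check: true only if dir is non-null and has the
// "%YYYYMMDD" shape. Logs what the caller passed in.
bool LooksLikeLogByDayDir(const char* dir) {
    if (dir == nullptr) {
        std::fprintf(stderr, "log-by-day check: null directory pointer\n");
        return false;
    }
    // A null check cannot detect a dangling or wild pointer; logging the
    // address and the first characters at least records what arrived.
    std::fprintf(stderr, "log-by-day check: dir=%p \"%.16s\"\n",
                 static_cast<const void*>(dir), dir);

    if (dir[0] != '%') {
        return false;
    }
    for (int i = 1; i <= 8; ++i) {
        // The cast avoids undefined behaviour in isdigit for negative chars.
        // A short string stops here: its '\0' is not a digit.
        if (!std::isdigit(static_cast<unsigned char>(dir[i]))) {
            return false;
        }
    }
    return dir[9] == '\0';
}
```

This only rules out a null pointer and a malformed string. If the caller hands over a pointer to freed memory, the log line is what will help you track it down, not the check.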

I hope I didn't make any major mistakes, as I am a Delphi programmer. If I did, feel free to comment.

Tobias Langner
Thanks for looking at the code. I would agree with you insofar as there is nothing within the method that seems like it would cause an access violation, assuming the passed params point at valid memory. That's something I need to verify. I wish I could set both these responses as the right answer.
A. Murray
np - I'm here to help, not for the points.
Tobias Langner
+1  A: 

If you really need to map those addresses to your functions, you'll need to work with the .MAP file and see where those addresses really point to.

But in your situation I would rather investigate this problem under a debugger (e.g. the MSVS debugger or WinDbg). Alternatively, if the crash happens at the customer's site, you can generate a crash dump and study it locally. This can be done via the Windows MiniDumpWriteDump API or the SysInternals ProcDump utility (http://download.sysinternals.com/Files/procdump.zip).

Make sure that all required symbol files are generated and available (also set up the Microsoft symbol server path so that the Windows DLLs' entry points get resolved as well).

IMHO this is just the web site you need: http://www.dumpanalysis.org - which is the best resource to cover all your questions. Consider also taking a look at this PDF - http://windbg.info/download/doc/pdf/WinDbg_A_to_Z_color.pdf

Andrey
+3  A: 

Others have said this between the lines, but not explicitly. Look at:

LibFmwk!UTIL_GetDateFromLogByDayDirectory(char const *,class utilCDate &) + 0xa26c

The 0xa26c offset is huge, way past the end of the function. The debugger obviously doesn't have the proper symbols for LibFmwk, so instead it's relying on the DLL exports and showing the offset relative to the closest one it can find.

So, yeah, get proper symbols and then it should be a breeze. :-) UTIL_GetDateFromLogByDayDirectory is not at fault here.
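To make the "closest export" effect concrete, here is a small sketch of how a name + offset frame can be derived from a table of symbol start addresses. It is an illustration of the idea only, not how any particular debugger is implemented; SymbolTable and Resolve are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Symbol table keyed by start address (relative to the module base).
using SymbolTable = std::map<std::uint32_t, std::string>;

// Returns the closest symbol starting at or below address, plus the
// offset from that symbol. Returns an empty name if no symbol qualifies.
std::pair<std::string, std::uint32_t> Resolve(const SymbolTable& symbols,
                                              std::uint32_t address) {
    auto it = symbols.upper_bound(address);  // first symbol starting after address
    if (it == symbols.begin()) {
        return {std::string(), address};
    }
    --it;  // last symbol starting at or before address
    return {it->second, address - it->first};
}
```

With invented numbers: if the table holds only exports, say UTIL_GetDateFromLogByDayDirectory at 0x1000 and the next export at 0x20000, then address 0xB26C resolves to UTIL_GetDateFromLogByDayDirectory + 0xa26c. Add a (made-up) internal function starting at 0xB000, as full symbols or a .MAP file would, and the same address resolves to that function + 0x26c instead.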

~jewels

Jewel S