Hi there,

I'm pretty inexperienced using C++, but I'm trying to compile version 2.0.2 of the SBML toolbox for matlab on a 64-bit XP platform. The SBML toolbox depends upon Xerces 2.8 and libsbml 2.3.5.

I've been able to build and compile the toolbox on a 32-bit machine, and it works when I test it. However, after rebuilding it on a 64-bit machine (which is a HUGE PITA!), I get a segmentation fault when I try to read long .xml files with it.

I suspect that the problem is related to pointer addresses.

The stack trace from the segmentation fault starts with:

[ 0] 000000003CB3856E libsbml.dll+165230 (StringBuffer_append+000030)
[ 6] 000000003CB1BFAF libsbml.dll+049071 (EventAssignment_createWith+001631)
[ 12] 000000003CB1C1D7 libsbml.dll+049623 (SBML_formulaToString+000039)
[ 18] 000000003CB2C154 libsbml.dll+115028 (

So I'm looking at the StringBuffer_append function in the libsbml code:

LIBSBML_EXTERN
void
StringBuffer_append (StringBuffer_t *sb, const char *s)
{
  unsigned long len = strlen(s);


  StringBuffer_ensureCapacity(sb, len);

  strncpy(sb->buffer + sb->length, s, len + 1);
  sb->length += len;
}

ensureCapacity looks like this:

LIBSBML_EXTERN
void
StringBuffer_ensureCapacity (StringBuffer_t *sb, unsigned long n)
{
  unsigned long wanted = sb->length + n;
  unsigned long c;


  if (wanted > sb->capacity)
  {
    /**
     * Double the total new capacity (c) until it is greater-than wanted.
     * Grow StringBuffer by this amount minus the current capacity.
     */
    for (c = 2 * sb->capacity; c < wanted; c *= 2) ;
    StringBuffer_grow(sb, c - sb->capacity);
  }                   
}

and StringBuffer_grow looks like this:

LIBSBML_EXTERN
void
StringBuffer_grow (StringBuffer_t *sb, unsigned long n)
{
  sb->capacity += n;
  sb->buffer    = (char *) safe_realloc(sb->buffer, sb->capacity + 1);
}

Is it likely that the

strncpy(sb->buffer + sb->length, s, len + 1);

in StringBuffer_append is the source of my segfault?

If so, can anyone suggest a fix? I really don't know C++, and am particularly confused by pointers and memory addressing, so am likely to have no idea what you're talking about - I'll need some hand-holding.

Also, I put details of my build process online here, in case anyone else is dealing with trying to compile C++ for 64-bit systems using Microsoft Visual C++ Express Edition.

Thanks in advance!

-Ben

A: 

The problem could be pretty much anything. True, it might be that strncpy does something bad, but most likely, it is simply passed a bad pointer. Which could originate from anywhere. A segfault (or access violation in Windows) simply means that the application tried to read or write to an address it did not have permission to access. So the real question is, where did that address come from? The function that tried to follow the pointer is probably ok. But it was passed a bad pointer from somewhere else. Probably.

Unfortunately, debugging C code is not trivial at the best of times. If the code isn't your own, that doesn't make it easier. :)

jalf
Thanks jalf - I'm trying to track down the bad pointer that strncpy gets. Since the segfault happens when the toolbox is reading a large .xml file (but not a small test file), I suspect it's from the StringBuffer_grow function. But I don't know C++ well enough to recognize any problems with it. -b
Of course, with my lack of knowledge, it could also be an issue with the `unsigned long len = strlen(s);` in StringBuffer_append, or pretty much anything else. :(
A: 

Try printing, or using a debugger, to see what values you're getting for some of your intermediate variables. In StringBuffer_append(), output len; in StringBuffer_ensureCapacity(), observe sb->capacity and c before and inside the loop. See if the values make sense.
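A minimal sketch of that kind of instrumentation is below. DemoBuffer, demo_dump and demo_fits are hypothetical names made up for illustration; only the field names (buffer, length, capacity) come from the code in the question, and the real StringBuffer_t layout may differ.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libsbml's StringBuffer_t (an assumption);
   the field names follow the code quoted in the question. */
typedef struct {
    unsigned long length;    /* characters currently stored */
    unsigned long capacity;  /* characters the buffer can hold, excluding the NUL */
    char *buffer;            /* allocated with capacity + 1 bytes */
} DemoBuffer;

/* Print the fields at a named checkpoint so a bad value is visible
   before the crash rather than after it. */
static void demo_dump(const char *where, const DemoBuffer *sb, unsigned long len)
{
    fprintf(stderr, "%s: buffer=%p length=%lu capacity=%lu len=%lu\n",
            where, (void *) sb->buffer, sb->length, sb->capacity, len);
}

/* Returns 1 when appending len characters plus the terminating NUL
   stays inside the capacity + 1 bytes that were allocated. */
static int demo_fits(const DemoBuffer *sb, unsigned long len)
{
    return sb->buffer != NULL && sb->length + len <= sb->capacity;
}
```

Calling something like demo_dump("before strncpy", sb, len) just before the copy, and checking demo_fits at the same point, would show whether the pointer or the sizes are already wrong when StringBuffer_append reaches strncpy.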

A segmentation fault may be caused by accessing data beyond the end of the string.

The strange fact that it worked on a 32-bit machine and not on a 64-bit O/S is also a clue. Are the physical memory and pagefile sizes the same on the two machines? Also, on a 64-bit machine the kernel space may be larger than on the 32-bit machine, eating into memory that was available to the user part of the address space under the 32-bit O/S. With a DOM-style XML parser the entire document must fit into memory. There are probably some switches to set the size if this is the problem. The difference between the machines should only be the cause if you are working with a very large string. If the string is not huge, it may be some problem with a library or utility method that doesn't work well in a 64-bit environment.

Also, use a simple/small xml file to start with if you have nothing else to try.

Where do you initialize sb->length? Your problem is likely in strncpy(), though I don't know why the change from a 32-bit to a 64-bit O/S would matter. Your best bet is to look at the intermediate values; the problem should then be obvious.

jeffD
I guess "big xml file" is relative - in this case, it's about 1.5 MB. There's 16 GB of RAM in this particular Win XP x64 box, so it does seem that the issue is from the code. I'll try a debugger, but won't know if the values make sense, so may be back soon!
Alternatively, the code might try storing a pointer in an unsigned long or something, a practice that was much more prevalent a long time ago. If the pointer is zeros in the top half, or maybe Fs, that could be the cause.
David Thornley
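To illustrate the comment above: on 64-bit Windows, unsigned long is still 32 bits wide while pointers are 64 bits, so a pointer stored in an unsigned long loses its upper half. The sketch below simulates that with fixed-width types so it behaves the same on any platform. The function names and the sample addresses are made up for illustration, and nothing here shows that libsbml actually does this.

```c
#include <stdint.h>

/* Simulates storing a 64-bit address in a 32-bit integer, which is the
   width of unsigned long on 64-bit Windows. Returns what survives. */
static uint64_t demo_round_trip_32(uint64_t address)
{
    uint32_t narrow = (uint32_t) address;   /* upper 32 bits are discarded here */
    return (uint64_t) narrow;               /* widening back cannot restore them */
}

/* Returns 1 if the address comes back unchanged, 0 if it was truncated. */
static int demo_survives_32(uint64_t address)
{
    return demo_round_trip_32(address) == address;
}
```

An address below 4 GB survives the round trip, which is why code like this can appear to work until an allocation lands higher in the address space. The same caution applies to a missing prototype for a function that returns a pointer, since older C compilers then assume it returns a 32-bit int.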
A: 

A StringBuffer is created as follows:

/**
 * Creates a new StringBuffer and returns a pointer to it.
 */
LIBSBML_EXTERN
StringBuffer_t *
StringBuffer_create (unsigned long capacity)
{
  StringBuffer_t *sb;


  sb           = (StringBuffer_t *) safe_malloc(sizeof(StringBuffer_t));
  sb->buffer   = (char *)           safe_malloc(capacity + 1);
  sb->capacity = capacity;

  StringBuffer_reset(sb);

  return sb;
}

A bit more of the stack trace is:

[  0] 000000003CB3856E              libsbml.dll+165230 (StringBuffer_append+000030)
[  6] 000000003CB1BFAF              libsbml.dll+049071 (EventAssignment_createWith+001631)
[ 12] 000000003CB1C1D7              libsbml.dll+049623 (SBML_formulaToString+000039)
[ 18] 000000003CB2C154              libsbml.dll+115028 (Rule::setFormulaFromMath+000036)
[ 20] 0000000001751913              libmx.dll+137491 (mxCheckMN_700+000723)
[ 25] 000000003CB1E7B2              libsbml.dll+059314 (KineticLaw_getFormula+000018)
[ 37] 0000000035727749              TranslateSBML.mexw64+030537 (mexFunction+009353)
A: 

It seems that if the fault were inside _ensureCapacity or _grow, those functions would appear in the stack trace. I disagree with how _ensureCapacity and _grow are implemented. None of the functions checks whether realloc succeeded. If realloc fails and returns NULL, the next write through that pointer will segfault. I would insert a check for NULL after _ensureCapacity. With the way _ensureCapacity and _grow are written, it also seems possible to get an off-by-one error. If you're running on Windows, the 64-bit and 32-bit systems may have different page protection mechanisms that cause it to fail. (You can often live through off-by-one errors in malloc'ed memory on systems with weak page protection.)

A: 

Let's assume that safe_malloc and safe_realloc do something sensible like aborting the program when they can't get the requested memory. That way your program won't continue executing with invalid pointers.
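For reference, a wrapper with that behaviour could look roughly like the sketch below. demo_checked_realloc is a hypothetical name; this is not libsbml's actual safe_realloc, whose source would need to be checked to see what it really does on failure.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper (an assumption, not libsbml code): either returns
   a usable pointer or stops the program, so callers never see NULL. */
static void *demo_checked_realloc(void *ptr, size_t size)
{
    void *grown = realloc(ptr, size);

    /* realloc may legitimately return NULL for a zero-byte request,
       so only treat NULL as a failure when bytes were asked for. */
    if (grown == NULL && size != 0) {
        fprintf(stderr, "out of memory requesting %lu bytes\n",
                (unsigned long) size);
        abort();
    }
    return grown;
}
```

If the real wrapper behaves like this, a failed allocation would end the program with a message instead of a segfault inside strncpy.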

Now let's look at how big StringBuffer_ensureCapacity grows the buffer to, in comparison to the wanted capacity. It's not an off-by-one error. It's an off-by-a-factor-of-two error.

How your program ever worked on 32-bit, I can't guess.

Windows programmer
Sorry, my answer is wrong. I missed the semicolon at the end of this line: "for (c = 2 * sb->capacity; c < wanted; c *= 2) ;" Gee, how could I make a mistake like that.
Windows programmer
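To make the effect of that trailing semicolon concrete, here is a small sketch that mirrors the loop from StringBuffer_ensureCapacity and returns the total capacity it settles on. demo_next_capacity is a made-up name, and the sketch assumes the starting capacity is greater than zero: with a capacity of 0, doubling never makes progress and the loop would not terminate.

```c
/* Mirrors the doubling loop quoted in the question and returns the new
   total capacity. Assumes capacity > 0 (see the note above). */
static unsigned long demo_next_capacity(unsigned long capacity,
                                        unsigned long wanted)
{
    unsigned long c;

    if (wanted <= capacity)
        return capacity;          /* already large enough, nothing to grow */

    for (c = 2 * capacity; c < wanted; c *= 2)
        ;                         /* empty body: the loop only computes c */

    return c;
}
```

With a capacity of 10 and 38 characters wanted, c goes 20, then 40, and the buffer grows by 30 to a total of 40, so the result is always at least as large as what was asked for.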
A: 

In response to bk1e's comment on the question - unfortunately, I need version 2.0.2 for use with the COBRA toolbox, which doesn't work with the newer version 3. So, I'm stuck with this older version for now.

I'm also hitting some walls attempting to debug - I'm building a .dll, so in addition to recompiling xerces to make sure it has the same debugging settings in MSVC++, I also need to attach to the Matlab process to do the debugging - it's a pretty big jump for my limited experience in this environment, and I haven't dug into it very far yet.

I had hoped there was some obvious syntax issue with the buffer allocation or expansion. Looks like I'm in for a few more days of pain, though.

Version 2.0.2 is browsable online at Sourceforge: http://sbml.svn.sourceforge.net/viewvc/sbml/tags/rel-2-2-0/libsbml/src/ . The code I've been looking at is in StringBuffer.c