For my Programming 102 class we are asked to deliver C code that compiles and runs under Linux. I don't have enough spare space on my hard drive to install Linux alongside Windows, and so I use cygwin to compile my programs.

The most recent program I had to give in compiles and runs fine under cygwin. It compiles fine under Linux, but half-way through execution produces a segmentation fault. I explained this to the grad student who gives us class and he said that cygwin's version of GCC allows for sloppier code to be compiled and executed.

The few references I have found via google haven't been conclusive. One thread I found said that the cause for the seg fault under Linux is a memory leak. Why would this not affect the cygwin version?

I would use the University's computers, but I can't use Subversion on them, which would significantly hinder my efforts. (I'm new to coding and often need to be able to revert to X revisions ago.)

Is cygwin's version of GCC really more 'lax' with the code it compiles? If so, are there any obvious issues to look out for when coding? Are there any alternatives for being able to write code that will run under Linux?

Edit

Thanks for the replies. I wasn't explicit enough in my original post: that there is a bug in my code was pretty much a given for me (I am quite new to programming, and really green when it comes to C, after all). My TA implied cygwin's GCC is a less reliable compiler -allowing for much sloppier code to run- than the one found under GNU/Linux. I found this strange and so had a search on the internet, but couldn't really find any references to that fact.

More than blaming the compiler vs. my code, I was wondering what the reason could be for the program to run under Windows and crash under Linux. The replies re: different memory managers and heap/stack layout under Windows/Linux were illuminating in that regard.

Would the conclusion that cygwin's GCC is just as 'good' as GNU/Linux', and it's the underlying operating systems/sheer luck that my buggy program runs under one and not the other be pretty much correct?

Regarding posting the source code, it's a homework assignment so I'd prefer to find the issue myself if at all possible :)

Edit 2

I've accepted jalf's answer as it talks about what makes the program run under Windows and not under Linux, which was what I really wanted to know. Thanks to everyone else who contributed, they were all very interesting and informative replies.

When I've found the issue and fixed it I'll upload a zip file with all the source code of this non-working version, in case anyone is curious to see what the hell I did :)

Edit 3

For those interested in seeing the code, I found the problem, and it was indeed due to pointers. I was trying to return a pointer from a function. The pointer I was trying to return was being declared inside the function and so was being destroyed once the function executed. Problematic code is commented out on lines 22-24.

Feel free to ridicule my code.

/**
*  Fills active_search with the valid search directions for the current
*  coordinate, terminated by -1
*/
void determine_searches(int row, int col, int last_row, int last_col, int *active_search){
    // define coordinate categories and related valid search directions
    int Library0[] = {2, 3, 4, -1};
    int Library1[] = {4, 5, 6, -1};
    int Library2[] = {2, 3, 4, 5, 6, -1};
    int Library3[] = {0, 1, 2, 3, 4, 5, 6, 7, -1};
    int Library4[] = {0, 1, 2, -1};
    int Library5[] = {0, 6, 7, -1};
    int Library6[] = {0, 1, 2, 6, 7, -1};
    int Library7[] = {0, 1, 2, 3, 4, -1};
    int Library8[] = {0, 4, 5, 6, 7, -1};

    int * Library[] = { 
        Library0, Library1, Library2,
        Library3, Library4, Library5,
        Library6, Library7, Library8,
    };

    // declare (and assign memory to) the array of valid search directions that will be returned
    //int *active_search;
    //active_search = (int *) malloc(SEARCH_DIRECTIONS * sizeof(int));


    // determine which is the correct array of search directions based on the current coordinate
    // top left corner
     int i = 0;
    if(row == 0 && col == 0){
     while(Library[0][i] != -1){
      active_search[i] = Library[0][i];
      i++;
     }
    }
    // top right corner
    else if(row == 0 && col == last_col){
     while(Library[1][i] != -1){
      active_search[i] = Library[1][i];
      i++;
     }
    }
    // non-edge columns of first row
    else if(row == 0 && col != 0 && col != last_col){
     while(Library[2][i] != -1){
      active_search[i] = Library[2][i];
      i++;
     }
    }
    // non-edge coordinates (no edge columns nor rows)
    else if(row != 0 && row != last_row && col != 0 && col != last_col){
     while(Library[3][i] != -1){
      active_search[i] = Library[3][i];
      i++;
     }
    }
    // bottom left corner
    else if(row == last_row && col == 0){
     while(Library[4][i] != -1){
      active_search[i] = Library[4][i];
      i++;
     }
    }
    // bottom right corner
    else if(row == last_row && col == last_col){
     while(Library[5][i] != -1){
      active_search[i] = Library[5][i];
      i++;
     }
    }
    // non-edge columns of last row
    else if(row == last_row && col != 0 && col != last_col){
     while(Library[6][i] != -1){
      active_search[i] = Library[6][i];
      i++;
     }
    }
    // non-edge rows of first column
    else if(row != 0 && row != last_row && col == 0){
     while(Library[7][i] != -1){
      active_search[i] = Library[7][i];
      i++;
     }
    }
    // non-edge rows of last column
    else if(row != 0 && row != last_row && col == last_col){
     while(Library[8][i] != -1){
      active_search[i] = Library[8][i];
      i++;
     }
    }
    active_search[i] = -1;
}
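
For anyone who lands here with the same symptom, here is a minimal sketch of the kind of mistake described in Edit 3 and two common ways around it. The names fill_dirs, make_dirs and MAX_DIRS are made up for illustration and are not taken from the assignment; the broken variant appears only in a comment.

```c
#include <stdlib.h>

#define MAX_DIRS 9  /* hypothetical: 8 directions plus the -1 terminator */

/* BROKEN (do not use): dirs lives on the stack and is gone after return.
 *
 *   int *broken(void) { int dirs[MAX_DIRS] = {2, 3, 4, -1}; return dirs; }
 */

/* Option 1: the caller owns the buffer and the function only fills it. */
void fill_dirs(int *out)
{
    static const int src[] = {2, 3, 4, -1};
    int i = 0;

    while (src[i] != -1) {
        out[i] = src[i];
        i++;
    }
    out[i] = -1;  /* keep the terminator so the caller knows where to stop */
}

/* Option 2: heap memory outlives the call; the caller must free() it. */
int *make_dirs(void)
{
    int *out = malloc(MAX_DIRS * sizeof *out);

    if (out != NULL)
        fill_dirs(out);
    return out;
}
```

Option 1 is the shape the fixed function above ended up with: the caller passes in active_search and nothing has to be returned.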
+4  A: 

I haven't heard of anything specific about GCC weirdness under Cygwin but in your case it would probably be a good idea to use the -Wall command-line option to gcc to show all warnings, to see if it finds anything that might be causing the segfault in your code.

alxp
Thanks for the suggestion. I tried it on my code and all it brought up was a warning about an unused variable (not in use after trying to fix this issue).
bob esponja
+11  A: 

I don't mean to sound rude, but it's probably your code that's bad, not the compiler. ;) Problems like this are actually more common than you'd think, because different OS's and compilers will have different ways of organizing your application's data in the stack and heap. The former can be particularly problematic, especially if you end up overwriting memory on the stack, or referencing freed memory which the system has decided to use for something else. So basically, you might get away with it sometimes, but other times your app will choke and die. Either way, if it segfaults, it's because you tried to reference memory which you were not allowed, so it's more of a "happy coincidence" that it didn't crash under another system/compiler.

But really, a segfault is a segfault, so you should debug your code looking for memory corruption instead of tweaking the compiler's configuration to figure out what's going wrong.

Edit: Ok, I see what you mean now... I thought you were coming at this problem with an "X sucks, but Y works just fine!" attitude, but it seems to be your TA who's got that. ;)

Anyways, here's some more hints for debugging problems like this:

  • Look for pointer arithmetic, referencing/dereferencing for possible "doh!" errors. Any place where you are adding/subtracting one (aka, fencepost errors) is particularly suspect.
  • Comment out calls to malloc/free around the problem area, and any associated areas where those pointers are used. If the code stops crashing, then you're headed in the right direction.
  • Assuming you've at least identified the general area where your code is crashing, insert early return statements in there and find the point where your code doesn't crash. This can help to find an area somewhere between that point and where your code actually crashes. Remember, a segfault like this may not necessarily happen directly at the line of code where your bug is.
  • Use the memory debugging tools available on your system.
    • On Unix, check out this guide for debugging memory on unix, and the valgrind profiler (@Sol, thx for reminding me about this one)
    • On Visual Studio/Windows, your good ol' buddy _CrtCheckMemory() comes in rather handy. Also, read up on the CRT memory debugging patterns, as they're one of the nicer features of working in VS. Oftentimes, just leaving a memory tab open in VS is enough to diagnose bugs like this once you memorize the various patterns.
    • In Mac OSX, you can set a breakpoint on malloc_error_break (either from gdb or Xcode), which causes the debugger to break whenever malloc detects memory corruption. I'm not sure whether that's available in other unix flavors, but a quick google search seems to indicate it's mac-only. Also, a rather "experimental" looking version of valgrind seems to exist for OSX.
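
To give a feel for what those memory debugging tools do under the hood, here is a rough sketch of the guard-byte trick: put a known pattern right after each allocation and check later whether it is still intact. This is a toy, not how valgrind or the CRT debug heap are actually implemented; guarded_alloc, guard_intact, GUARD_LEN and GUARD_BYTE are names and values invented for this example.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8     /* hypothetical number of guard bytes */
#define GUARD_BYTE 0xFD  /* arbitrary fill pattern chosen for this sketch */

/* Allocate n usable bytes followed by GUARD_LEN guard bytes. */
void *guarded_alloc(size_t n)
{
    unsigned char *p = malloc(n + GUARD_LEN);

    if (p == NULL)
        return NULL;
    memset(p + n, GUARD_BYTE, GUARD_LEN);
    return p;
}

/* Return 1 if the guard bytes after the n usable bytes are intact, else 0. */
int guard_intact(const void *block, size_t n)
{
    const unsigned char *g = (const unsigned char *)block + n;
    size_t i;

    for (i = 0; i < GUARD_LEN; i++) {
        if (g[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}
```

Calling guard_intact() after a suspicious stretch of code tells you whether something wrote past the end of the buffer, even on a system where the overrun doesn't crash.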
Nik Reiman
But if he can tweak the compiler options so his program will crash on his dev box the same as it does at school, then he'll find it much easier to debug.
Rob Kennedy
Perhaps. In my experience, though, it's much faster to start looking backwards from the point of the segfault, particularly at the pointers and memory allocations. Getting multiple systems involved will be more complicated since he'd be concentrating on the differences between them, not his bug
Nik Reiman
Not rude at all, it's what I had imagined was the case anyway :) I'll definitely look into the pointers I'm using, thanks for the reply. @Rob: this is pretty much the solution I'd be looking for. Doesn't seem possible though.
bob esponja
@sqook: that's a good point you bring up, maybe the solution would be to somehow get a Linux box I can SSH into and compile/execute on it, instead of trying to use cygwin/Windows.
bob esponja
Well, I meant that you shouldn't concentrate on the differences between cygwin/linux, but you're right.. it will definitely be easier to fix this bug on a system where you can reproduce it. Not having SVN may suck (btw, can't you build it there?), but you can fix this bug without it. Don't give up!
Nik Reiman
If 'building there' means on the uni computers, yes I can, and do. The problem is I then try and edit the source with vim: 'oh I bet it's just this little change here that will fix it' x 100, and I end up with a horribly broken source file with no mid-way revisions to compare/revert to :P
bob esponja
I meant building svn there. As in, "./configure --prefix=~/local" so it's under your home directory. Then you just need to add "$HOME/local/bin" to your $PATH, and you're home free.
Nik Reiman
The issue with subversion is actually that the 'communications' department has blocked subversion connections off at the firewall/proxy and the 'computer lab' department can't do anything about it :( I didn't want to overload my original post with unnecessary details.
bob esponja
Also thanks for the extra information you added, I'll have a look at it tomorrow. Time for bed now :)
bob esponja
Bah, how frustrating... "communications" indeed! Not to go OT, but you could serve your repos over HTTP (visualsvn.com), or use a free SVN hosting service (beanstalkapp.com). Anyways, that's probably a good topic for another question. ;)
Nik Reiman
"Anyone know of any?" valgrind on Linux is the best heap debugging tool I've ever used. It's a standard component in my development process. (I get my latest code out of SVN, build on the Linux box, run valgrind there, and take its output back to my development platform.) It's invaluable.
Sol
+1  A: 

The version of GCC is probably not the issue. It's more likely to be a difference in the runtime library and a bug in your code that doesn't manifest itself when running against the Windows version of the runtime. You might want to post the code that segfaults and some more background information if you want a more specific answer.

In general, it's best to develop under the environment you're going to use for running your code.

Ori Pessach
+2  A: 

Cygwin's version of gcc may have other default flags and tweaked settings (wchar_t being 2 bytes for example), but I doubt it is specifically more "lax" with code, and even so, your code should not crash. If it does, then most probably there is a bug in your code that needs to be fixed. For example your code may depend on a particular size of wchar_t or may execute code that's not guaranteed to work at all, like writing into string literals.
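
As a small illustration of the string-literal point (a sketch only; capitalize is a made-up helper): modifying a literal is undefined behaviour and may appear to work on one platform and segfault on another, whereas modifying an array initialised from the literal is always fine.

```c
#include <ctype.h>

/* Upper-case the first character of a writable, NUL-terminated buffer. */
void capitalize(char *s)
{
    if (s[0] != '\0')
        s[0] = (char)toupper((unsigned char)s[0]);
}

/* Usage:
 *
 *   char buf[] = "hello";       // array: a writable copy of the literal
 *   capitalize(buf);            // fine
 *
 *   const char *lit = "hello";  // pointer to the literal itself
 *   capitalize((char *)lit);    // undefined behaviour: may crash or "work"
 */
```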

If you write clean code then it will also run on Linux. I'm currently running Firefox and the KDE desktop, which together consist of millions of lines of C++, and I don't see those apps crashing :)

I recommend you paste your code into your question, so we can see what is going wrong.

In the meantime, you can run your program in gdb, which is a debugger for Linux. You can also compile with all mudflap checks enabled and with all warnings enabled. mudflap checks your code at runtime for various violations:

[js@HOST2 cpp]$ cat mudf.cpp
int main(void)
{
  int a[10];
  a[10] = 3;  // oops, off by one.
  return 0;
}
[js@HOST2 cpp]$ g++ -fmudflap -fstack-protector-all -lmudflap -Wall mudf.cpp
[js@HOST2 cpp]$ MUDFLAP_OPTIONS=-help ./a.out
  ... showing many options ...
[js@HOST2 cpp]$ ./a.out 
*******                 
mudflap violation 1 (check/write): time=1234225118.232529 ptr=0xbf98af84 size=44
pc=0xb7f6026d location=`mudf.cpp:4:12 (main)'                                   
      /usr/lib/libmudflap.so.0(__mf_check+0x3d) [0xb7f6026d]                    
      ./a.out(main+0xb9) [0x804892d]                                            
      /usr/lib/libmudflap.so.0(__wrap_main+0x4f) [0xb7f5fa5f]                   
Nearby object 1: checked region begins 0B into and ends 4B after                
mudflap object 0x9731f20: name=`mudf.cpp:3:11 (main) int a [10]'                
bounds=[0xbf98af84,0xbf98afab] size=40 area=stack check=0r/3w liveness=3        
alloc time=1234225118.232519 pc=0xb7f5f9fd                                      
number of nearby objects: 1                                                     
*** stack smashing detected ***: ./a.out terminated                             
======= Backtrace: =========
....

There are many mudflap checks you can do, and the above runs a.out using the default options. Another tool that helps with this kind of bug is valgrind, which can also help you find leaks or off-by-one bugs like the one above. Setting the environment variable "MALLOC_CHECK_" to 1 will print messages for violations too. See the manpage of malloc for other possible values for that variable.

For checking where your program crashes you can use gdb:

[js@HOST2 cpp]$ cat test.cpp
int main() {
    int *p = 0;
    *p = 0;
}
[js@HOST2 cpp]$ g++ -g3 -Wall test.cpp
[js@HOST2 cpp]$ gdb ./a.out
...
(gdb) r
Starting program: /home/js/cpp/a.out

Program received signal SIGSEGV, Segmentation fault.
0x080483df in main () at test.cpp:3
3           *p = 0;
(gdb) bt
#0  0x080483df in main () at test.cpp:3
(gdb)

Compile your code with -g3 to include extensive debugging information, so gdb can help you find the precise lines where your program is crashing. All the above techniques are equally applicable to C and C++.

Johannes Schaub - litb
Non-crashing code was the aim :) I would have liked a way to make my program act the same under Windows as Linux, but SSH into a Linux box for running the code is probably the way. PS I see your millions of lines of non-crashing C++ and raise you my 1,000 lines of C which crash miserably :(
bob esponja
+3  A: 

You definitely have a bug somewhere in your code. It's possible that the Windows memory manager is being more lax than the Linux memory manager. On Windows, you might be doing bad things with memory (like overwriting array bounds, memory leaks, double-free'ing, etc.), but it's letting you get away with it. A famous story related to this can be found at http://www.joelonsoftware.com/articles/APIWar.html (search for "SimCity" on that (somewhat lengthy) article).

Adam Rosenfield
Thanks for the reply and the link. The article rings a bell, I think I read it a few years ago. I'll definitely be looking into the memory usage.
bob esponja
+1  A: 

Are you making any platform-specific assumptions, like the size of data types, data structure alignment in structs, or endianness?
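
A quick way to check assumptions like these is to ask the compiler instead of guessing. The sketch below uses a made-up struct sample and two made-up helpers; the actual numbers depend on the platform and compiler, which is exactly the point.

```c
#include <stddef.h>
#include <stdint.h>

struct sample {   /* hypothetical struct used only for this illustration */
    char tag;
    int  value;
};

/* Return 1 on a little-endian machine, 0 on a big-endian one. */
int is_little_endian(void)
{
    uint16_t probe = 1;

    /* Look at the lowest-addressed byte of a two-byte integer. */
    return *(unsigned char *)&probe == 1;
}

/* Bytes of padding the compiler added to struct sample. */
size_t sample_padding(void)
{
    return sizeof(struct sample) - (sizeof(char) + sizeof(int));
}
```

Printing sizeof(int), sizeof(void *) and offsetof(struct sample, value) on both machines shows quickly whether the two builds disagree about layout.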

Zach Scrivena
or whether you use a slash or backslash for directory paths?
rmeador
I don't think the program I'm writing is complex enough to be running into endianness issues (no bitwise operations). I'm not using many advanced data types, the most is a pointer to structure, and I allocate memory to it with malloc(sizeof(StructType)), which is correct I think.
bob esponja
@rmeador: I'm using slashes for the few directory paths, which is native for Linux (right?) and on Windows doesn't cause me problems.
bob esponja
@bob: With structs, you might also need to look out for byte alignment issues. (updated answer to include this)
Zach Scrivena
The pointer to struct I'm using has 3 members, 2 members that are integers holding number of rows and columns, and a 3rd member which is a double pointer to int (to mimic a 2d array). Would a structure this simple have byte alignment issues?
bob esponja
@bob: Depends. Pointers have different sizes on 32-bit vs 64-bit machines. This might also affect unions within structs and vice versa.
Zach Scrivena
+2  A: 

It's almost certainly a pointer error or buffer overrun, maybe an uninitialised variable.

An uninitialised pointer will usually point at nothing, but sometimes it will point at something; reading from it or writing to it will typically crash the program, but then again it MIGHT not.

Writing or reading from freed memory is the same story; you might get away with it, then again maybe not.

These situations depend on exactly how the stack, heap are laid out and what the runtime is doing. It is quite possible to make a bad program that works on one compiler / runtime combination and not another, simply because on one it overwrites something that doesn't matter (so much), or that an uninitialised variable "happens" to contain a valid value for the context it's used in.

MarkR
Thanks, this is the type of explanation I was looking for, and confirms my thoughts.
bob esponja
Windows C runtime does almost the opposite of a memory bounds detector. It allocates pages quite generously. The Linux C runtime is a lot less so inclined. Rather a good feature really.
Tim Williscroft
+1  A: 

A segmentation fault means that you tried to access memory you couldn't, which usually means you tried dereferencing a null pointer or you double-deleted memory or got a wild pointer. There are two reasons why you might have appeared to be fine on cygwin and not on Linux: either it was an issue with the memory managers, or you got luckier on one of them than the other. It is almost certainly an error in your code.

To fix this, look at your pointer use. Consider substituting smart pointers for raw pointers. Consider doing a search for delete and zeroing the pointer immediately afterwards (it's safe to try to delete a null pointer). If you can get a crack at Linux, try getting a stack trace through gdb and see if there's anything obviously wrong at the line it happens. Examine how you use all pointers that aren't initialized. If you have access to a memory debugging tool, use it.
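
Since the question is about C rather than C++, the "zero the pointer after deleting" advice translates to free(). A minimal sketch, with free_and_null as a made-up helper name:

```c
#include <stdlib.h>

/* Free the block *pp points to and reset *pp to NULL. */
void free_and_null(int **pp)
{
    if (pp != NULL) {
        free(*pp);   /* free(NULL) is a no-op, so a second call is harmless */
        *pp = NULL;  /* a later dereference now fails at a predictable place */
    }
}
```

With the pointer reset, an accidental second free does nothing, and an accidental use is a null-pointer dereference that is far easier to spot in gdb than a use of stale heap memory.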

David Thornley
Will look into the pointer usage, thanks. Is getting a stack trace advanced use of gdb, or will it be explained in whatever tutorial/resources I find on debugging with gdb?
bob esponja
It's pretty elementary. Once you've got gdb up on the crash, type "bt" for backtrace. It's been a while since I used it, but IIRC the function that crashed is on top, the one that called it is just down from it, etc.
David Thornley
+1  A: 

Some hints:

  1. Post your code. I'll bet you will get some good input that will make you a better programmer.

  2. Turn on warnings with the -Wall option and correct any problems that are reported. Again, it can help make you a better programmer.

  3. Step through the code with a debugger. Besides helping you understand where the problem is, it will help make you a better programmer.

  4. Continue to use Subversion or other source code control system.

  5. Never blame the compiler (or OS or hardware) until you are sure you've pinpointed the problem. Even then, be suspicious of your own code.


GCC on Linux is source-code identical to GCC on Cygwin. Differences between the platforms occur because of the Cygwin POSIX emulation layer and the underlying Windows API. It's possible the extra layers are more forgiving than the underlying hardware, but that's not to be counted on.

Since it's homework, I'd say posting code is an even better idea. What better way to learn than getting input from professional programmers? I'd recommend crediting any suggestions you implement in nearby comments, however.

Jon Ericson
Thanks for the suggestion re: posting code, I'd definitely appreciate input on how to improve my coding, but I'll wait until I have a personal project I can post the code of; this is a homework assignment :) I'll read up on how debugging works, I tried it at uni but with little results.
bob esponja
+4  A: 

Like others have said, you might want to post some of your code here, even if that's not the real point of your question. It might still be a good learning experience to have everyone here poke through your code and see if they can find what caused the segfault.

But yeah, the problem is that there are so many platform-dependent, as well as basically random, factors influencing a C program. Virtual memory means that sometimes, accessing unallocated memory will seem to work, because you hit an unused part of a page that's been allocated at some earlier point. Other times, it'll segfault because you hit a page that hasn't been allocated to your process at all. And that is really impossible to predict. It depends on where your memory was allocated, was it at the edge of a page, or in the middle? That's up to the OS and the memory manager, and which pages have been allocated so far, and...... You get the idea. Different compilers, different versions of the same compilers, different OS'es, different software, drivers or hardware installed on the system, anything can change whether or not you get a segfault when you access unallocated memory.
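
To make the page argument a bit more concrete, here is a small sketch that asks the OS for its page size and works out how far a given address is from the end of its page. It assumes a POSIX environment (Linux or cygwin) for sysconf; page_size and bytes_to_page_end are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Size of one virtual-memory page in bytes, as reported by the OS. */
long page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Number of bytes from addr up to the end of the page that contains it. */
size_t bytes_to_page_end(const void *addr)
{
    uintptr_t a  = (uintptr_t)addr;
    uintptr_t ps = (uintptr_t)page_size();

    return (size_t)(ps - (a % ps));
}
```

A stray write that stays within that distance lands in a page the process already owns, so the hardware has no reason to raise a segfault; one that goes further may or may not hit a mapped page, depending on what the allocator and OS happened to place next to it.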

As for the TA's claim that cygwin is more "lax", that's rubbish, for one simple reason. Neither compiler caught the bug! If the "native" GCC compiler had truly been less lax, it would have given you an error at compile-time. Segfaults are not generated by the compiler. There's not much the compiler can do to ensure you get a segfault instead of a program that seemingly works.

jalf
+1  A: 

A segmentation fault is the result of accessing memory at a non-existent (or previously freed) address. What I find very interesting is that the code did NOT segfault under cygwin. That could mean that your program used a wild pointer to some other process's address space and was actually able to read it (gasp), or (more likely) the code that actually caused the segfault was not reached until the program was run under Linux.

I recommend the following:

  1. Paste your code as it is a very interesting problem
  2. Send a copy of this to the cygwin developers
  3. Get a cheap Linux VPS if you'll be required to produce more programs that run under Linux, it will make your life much easier.

Once you're working under Linux (i.e. shelled into your VPS), try working with the following programs:

  • GDB
  • Valgrind
  • strace

Also, you can try libraries like electric fence to catch these kinds of things as they happen while your program is running.

Finally, make sure -Wall is passed to gcc, you want the warnings it would convey.

Tim Post