I have a program that works correctly on my development machine but produces an Illegal instruction fault when tested on a 'clean machine' where only the necessary files have been copied.

The program consists of my shared library, built from C++ sources, and a C wrapper sample program that demonstrates the library's usage. On the development machine, everything is built in Eclipse with g++, and both the Debug and Release builds work fine. A number of standard libraries are linked in.

To test dependencies that I might have missed, I copied the .c file, my library's .so file and the library .h file to a fresh Linux install and compiled/linked them with a simple script created with the same release compile options that Eclipse is using. Both machines have g++ 4.3.2.

When I run the program on the clean machine it exits immediately after printing 'Illegal instruction'.

Running in gdb produces:

(gdb) run
Starting program: /home/sfallows/Source/Apps/MySample/MySample 
[Thread debugging using libthread_db enabled]
[New Thread 0xb5c4ca90 (LWP 7063)]

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 0xb5c4ca90 (LWP 7063)]
0xb7f0cb29 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /usr/include/c++/4.3/iostream:77
77   static ios_base::Init __ioinit;
Current language:  auto; currently c++
(gdb) bt
#0  0xb7f0cb29 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /usr/include/c++/4.3/iostream:77
#1  0xb7f0cb48 in global constructors keyed to _ZN8NodeLockC2Ev () at ../NodeLock.cpp:194
#2  0xb7f204ad in __do_global_ctors_aux () from /home/sfallows/Source/Apps/MySample/libMyLib.so
#3  0xb7ee5c80 in _init () from /home/sfallows/Source/Apps/MySample/libMyLib.so
#4  0xb7fe1de4 in ?? () from /lib/ld-linux.so.2
#5  0x00000001 in ?? ()
#6  0xbf8e6b74 in ?? ()
#7  0xbf8e6b7c in ?? ()
#8  0x00000007 in ?? ()
#9  0xbf8e6b2c in ?? ()
#10 0x00000001 in ?? ()
#11 0x00000001 in ?? ()
#12 0xb7feeff4 in ?? () from /lib/ld-linux.so.2
#13 0x00000000 in ?? ()
(gdb) Quit

I'm not sure why it is running static constructors in NodeLock.cpp. I have neither any static/global objects in that file nor any static/global objects of that class anywhere.

The development machine is an Intel Core2 Quad and the clean machine is a Pentium 4 Dual. I assume g++ defaults to using a common subset of x86 instructions and that the processor difference is not my problem.

Any suggestions for what else to look at appreciated. I'm trying to avoid installing all of the library source and dependencies on the clean machine.

To rmn's answer and John Boker's comment: In the Windows world exes and dlls run on the plethora of Intel and AMD processors so there clearly is a widely used common subset of instructions. I thought gcc would do the same? Guess I'll fully research the instruction set/architecture options.

+1  A: 

Quoting: "The development machine is an Intel Core2 Quad and the clean machine is a Pentium 4 Dual. I assume g++ defaults to using a common subset of x86 instructions and that the processor difference is not my problem."

I do think that this is a problem. Try compiling specifically for that machine, recompiling on the clean machine, or getting two identical machines. These are my $0.02.

Also, it looks like you're dying while ld-linux.so is loading the library. Perhaps the Linux versions differ?

rmn
Both machines are Debian 5.03 installed from the same DVD.
Steve Fallows
+2  A: 

You could try compiling explicitly for the i686 architecture (using the -march=i686 option for gcc), just in case your compiler has generated some Core2-specific instructions.

MartinStettner
This solved the problem.
Steve Fallows
+1  A: 

I can think of a couple of things you can try:

  1. Are you using any optimization beyond -O3? Maybe the old system doesn't support it.
  2. You've probably already checked this, but have you checked the md5 of the binary on your system vs. the target system?
  3. Is your library doing any multi-threading? If so, since the number of cores differs, perhaps you've got a race condition somewhere.
Levon Karayan
+1  A: 

You are dying in static initialization. The order in which things get done is implementation-specific and can vary based on the runtime version. This might be your problem. Is it the same libstdc++ on both machines?

It's bad to have cross dependencies there anyway; you need to refactor the code if that turns out to be the issue.

pm100