I have a shared library (namely libXXX.so) with an associated .cpp/.h file pair. They contain a number of function pointers (which point to the .so function entry points) and a class that wraps these functions as its methods.

ie: .h file:

typedef void* handle;
/* wrapper functions */
handle okUsbFrontPanel_Construct();
void okUsbFrontPanel_Destruct(handle hnd);

/* wrapper class */
class okCUsbFrontPanel
{
public:
  handle h;
public:
  okCUsbFrontPanel();
  ~okCUsbFrontPanel();
};

.cpp file

/* class methods */
okCUsbFrontPanel::okCUsbFrontPanel()
  { h=okUsbFrontPanel_Construct(); }
okCUsbFrontPanel::~okCUsbFrontPanel()
  { okUsbFrontPanel_Destruct(h); }
/* function pointers */
typedef handle  (*OKUSBFRONTPANEL_CONSTRUCT_FN) (void);
typedef void    (*OKUSBFRONTPANEL_DESTRUCT_FN) (handle);
OKUSBFRONTPANEL_CONSTRUCT_FN    _okUsbFrontPanel_Construct = NULL;
OKUSBFRONTPANEL_DESTRUCT_FN _okUsbFrontPanel_Destruct = NULL;
/* load lib function */
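Bool LoadLib(const char *libname){
  void *hLib = dlopen(libname, RTLD_NOW);
  if(!hLib)
    return false;  /* dlopen failed; dlerror() describes why */
  _okUsbFrontPanel_Construct = ( OKUSBFRONTPANEL_CONSTRUCT_FN ) dlsym(hLib, "okUsbFrontPanel_Construct");
  _okUsbFrontPanel_Destruct = ( OKUSBFRONTPANEL_DESTRUCT_FN ) dlsym( hLib, "okUsbFrontPanel_Destruct" );
  /* report success only if both entry points were found */
  return _okUsbFrontPanel_Construct && _okUsbFrontPanel_Destruct;
}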
/* construct function */
handle okUsbFrontPanel_Construct(){
  if (_okUsbFrontPanel_Construct){
    handle h = (*_okUsbFrontPanel_Construct)(); // calls through the function pointer
    return h;
  }
  return(NULL);
}

void okUsbFrontPanel_Destruct(handle hnd)
{
  if (_okUsbFrontPanel_Destruct)
    (*_okUsbFrontPanel_Destruct)(hnd);
}

Then I have another shared library (made by myself) which calls:

LoadLib("libXXX.so");
okCUsbFrontPanel *device = new okCUsbFrontPanel();

resulting in a Segmentation fault. The segmentation fault seems to happen at

handle h = (*_okUsbFrontPanel_Construct)();

but with a strange behaviour: once I reach

(*_okUsbFrontPanel_Construct)();

I get a recursion to okUsbFrontPanel_Construct().

Does anyone have any idea?

EDIT: here is a backtrace obtained by a run with gdb.

#0  0x007590b0 in _IO_new_do_write () from /lib/tls/libc.so.6
#1  0x00759bb8 in _IO_new_file_overflow () from /lib/tls/libc.so.6
#2  0x0075a83d in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#3  0x00736db7 in vfprintf () from /lib/tls/libc.so.6
#4  0x0073ecd0 in printf () from /lib/tls/libc.so.6
#5  0x02cb68ca in okCUsbFrontPanel (this=0x9d0ae28) at okFrontPanelDLL.cpp:167
#6  0x03cac343 in okUsbFrontPanel_Construct () from /opt/atlas/tdaq/tdaq-02-00-00/installed/i686-slc4-gcc34-dbg/lib/libokFrontPanel.so
#7  0x02cb8f36 in okUsbFrontPanel_Construct () at okFrontPanelDLL.cpp:1107
#8  0x02cb68db in okCUsbFrontPanel (this=0x9d0ade8) at okFrontPanelDLL.cpp:169
#9  0x03cac343 in okUsbFrontPanel_Construct () from /opt/atlas/tdaq/tdaq-02-00-00/installed/i686-slc4-gcc34-dbg/lib/libokFrontPanel.so
#10 0x02cb8f36 in okUsbFrontPanel_Construct () at okFrontPanelDLL.cpp:1107
#11 0x02cb68db in okCUsbFrontPanel (this=0x9d0ada8) at okFrontPanelDLL.cpp:169
#12 0x03cac343 in okUsbFrontPanel_Construct () from /opt/atlas/tdaq/tdaq-02-00-00/installed/i686-slc4-gcc34-dbg/lib/libokFrontPanel.so
#13 0x02cb8f36 in okUsbFrontPanel_Construct () at okFrontPanelDLL.cpp:1107

and so on... IMHO I get a seg fault because of a sort of stack overflow: the recursion never ends, so the stack eventually runs out.

By the way I'm on a Scientific Linux 4 distro (based on RH4).

EDIT2:

an objdump of libokFrontPanel.so for function okUsbFrontPanel_Construct outputs:

00009316 <okUsbFrontPanel_Construct>:
9316:   55                    push   ebp  
9317:   89 e5                 mov    ebp,esp
9319:   56                    push   esi
931a:   53                    push   ebx
931b:   83 ec 30              sub    esp,0x30
931e:   e8 44 f4 ff ff        call   8767 <__i686.get_pc_thunk.bx>
9323:   81 c3 dd bd 00 00     add    ebx,0xbddd
9329:   c7 04 24 38 00 00 00  mov    DWORD PTR [esp],0x38
9330:   e8 93 ec ff ff        call   7fc8 <_Znwj@plt>
9335:   89 45 e4              mov    DWORD PTR [ebp-28],eax
9338:   8b 45 e4              mov    eax,DWORD PTR [ebp-28]
933b:   89 04 24              mov    DWORD PTR [esp],eax
933e:   e8 65 ed ff ff        call   80a8 <_ZN16okCUsbFrontPanelC1Ev@plt>
9343:   8b 45 e4              mov    eax,DWORD PTR [ebp-28]
9346:   89 45 f4              mov    DWORD PTR [ebp-12],eax
9349:   8b 45 f4              mov    eax,DWORD PTR [ebp-12]
934c:   89 45 e0              mov    DWORD PTR [ebp-32],eax
934f:   eb 1f                 jmp    9370 <okUsbFrontPanel_Construct+0x5a>
9351:   89 45 dc              mov    DWORD PTR [ebp-36],eax
9354:   8b 75 dc              mov    esi,DWORD PTR [ebp-36]
9357:   8b 45 e4              mov    eax,DWORD PTR [ebp-28]
935a:   89 04 24              mov    DWORD PTR [esp],eax
935d:   e8 d6 f2 ff ff        call   8638 <_ZdlPv@plt>
9362:   89 75 dc              mov    DWORD PTR [ebp-36],esi
9365:   8b 45 dc              mov    eax,DWORD PTR [ebp-36]
9368:   89 04 24              mov    DWORD PTR [esp],eax
936b:   e8 a8 f0 ff ff        call   8418 <_Unwind_Resume@plt>
9370:   8b 45 e0              mov    eax,DWORD PTR [ebp-32]
9373:   83 c4 30              add    esp,0x30
9376:   5b                    pop    ebx
9377:   5e                    pop    esi
9378:   5d                    pop    ebp
9379:   c3                    ret

At 933e there is indeed a call to <_ZN16okCUsbFrontPanelC1Ev@plt>. Is this the call that gets confused with the one inside my .cpp?

A: 

The tool of choice for diagnosing segfaults is valgrind. If you are misusing pointers or memory valgrind will find the problem and give you a stack trace well before the segfault occurs. On the FAQ, valgrind claims to handle shared libraries OK as long as you don't call dlclose().

If you have never used valgrind before I think you will be astonished at how easy and powerful it is. You just use 'valgrind' as the first word of your command line, and it finds your memory errors. Great stuff! There's a short example session on Vladislav Vyshemirsky's blog.

Norman Ramsey
I have already heard about Valgrind, but I'm in a tricky situation: my application is itself a shared library. It is loaded by a program over which I have no control. Is there a way to attach valgrind to an already launched program?
nick2k3
As long as you can run the program from the command line, you can run it under valgrind. If the other program has no debugging symbols, the information you get from valgrind may not be very helpful, however---I have not tried using it in that mode.
Norman Ramsey
+1  A: 

Contrary to what Norman Ramsey says, a tool of choice for diagnosing segfaults is GDB, not valgrind.
The latter is only useful for certain kinds of segfaults (mostly those related to heap corruption, which doesn't appear to be the case here).

My crystal ball says that your dlopen() fails (you should print dlerror() if/when that happens!), and that your _okUsbFrontPanel_Construct remains NULL. In GDB you will immediately be able to tell whether that guess is correct.
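As a rough sketch of what "print dlerror()" could look like, here is a small lookup helper that reports the loader's own error text for both dlopen() and dlsym(). The name load_symbol and its signature are made up for illustration and are not part of the vendor code.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper (not part of the vendor code): opens a shared
// library and looks up one symbol, returning the dlerror() text in err
// when either step fails.
void *load_symbol(const char *libname, const char *symname, std::string &err)
{
    err.clear();
    void *lib = dlopen(libname, RTLD_NOW);
    if (!lib) {
        const char *msg = dlerror();
        err = msg ? msg : "dlopen failed (no dlerror text)";
        return NULL;
    }
    dlerror();                     // clear any stale error state
    void *sym = dlsym(lib, symname);
    const char *msg = dlerror();   // non-NULL means the lookup failed
    if (msg) {
        err = msg;
        dlclose(lib);
        return NULL;
    }
    return sym;                    // the library is intentionally left open
}
```

With something like this in LoadLib, a failed load or a missing entry point shows up as a readable message instead of a NULL function pointer discovered later.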

My guess contradicts your statement that you "get a recursion to okUsbFrontPanel_Construct()". But just how can you know that you get such recursion, if you didn't look with GDB?

Employed Russian
dlopen does not fail. I do have a dlerror() print, but I removed it from the posted code to shorten it a bit.
nick2k3
I stated that I "get a recursion to okUsbFrontPanel_Construct()" because if I put a printf in that function (i.e. "Entering etc..."), I get a lot of Entering ... Entering ...
nick2k3
+5  A: 

Now that you've posted GDB output, it's clear exactly what your problem is.

You are defining the same symbols in libokFrontPanel.so and in the libLoadLibrary.so (for lack of a better name -- it is so much easier to explain things when they are named properly), and that is causing the infinite recursion.

By default on UNIX (unlike on Windows) all global symbols from all shared libraries (and the main executable) go into a single "loader symbol name space".

Among other things, this means that if you define malloc in the main executable, that malloc will be called by all shared libraries, including libc (even though libc has its own malloc definition).

So, here is what's happening: in libLoadLibrary.so you defined okCUsbFrontPanel constructor. I assert that there is also a definition of that exact symbol in libokFrontPanel.so. All calls to this constructor (by default) go to the first definition (the one that the dynamic loader first observed), even though the creators of libokFrontPanel.so did not intend for this to happen. The loop is (in the same order GDB printed them -- innermost frame on top):

 #1 okCUsbFrontPanel () at okFrontPanelDLL.cpp:169
 #3 okUsbFrontPanel_Construct () from libokFrontPanel.so
 #2 okUsbFrontPanel_Construct () at okFrontPanelDLL.cpp:1107
 #1 okCUsbFrontPanel () at okFrontPanelDLL.cpp:169

The call to the constructor from #3 was intended to go to symbol #4, the okCUsbFrontPanel constructor inside libokFrontPanel.so. Instead it went to the previously seen definition inside libLoadLibrary.so: you "preempted" symbol #4, and thus formed an infinite recursion loop.
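One way to check this kind of preemption from inside the process is to ask the loader which object provides a symbol in the global scope. The sketch below is only a diagnostic aid: global_definition_of is a hypothetical helper name, it relies on the glibc extensions RTLD_DEFAULT and dladdr(), and the global lookup only approximates what a PLT call inside another library would bind to (libraries built with -Bsymbolic or protected visibility behave differently).

```cpp
#include <dlfcn.h>   // RTLD_DEFAULT and dladdr are extensions; g++ on Linux defines _GNU_SOURCE by default
#include <string>

// Hypothetical diagnostic helper: returns the path of the loaded object
// (executable or .so) whose definition of symname is found first in the
// global lookup scope, or "" if no definition is visible there.
std::string global_definition_of(const char *symname)
{
    void *addr = dlsym(RTLD_DEFAULT, symname);
    if (!addr)
        return std::string();
    Dl_info info;
    if (dladdr(addr, &info) != 0 && info.dli_fname != NULL)
        return info.dli_fname;
    return std::string();
}
```

Calling global_definition_of("_ZN16okCUsbFrontPanelC1Ev") after the libraries are loaded would be expected to name the library that wins the lookup; if that is not libokFrontPanel.so, the vendor's call is being redirected.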

Moral: do not define the same symbols in multiple libraries, unless you understand the rules by which the runtime loader decides which symbol references are bound to which definitions.
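A minimal sketch of a wrapper that cannot collide with the vendor's names, assuming you are free to rename your own code: everything lives in a namespace, so the mangled names differ from okCUsbFrontPanel and okUsbFrontPanel_Construct. The namespace myok, the class UsbFrontPanel and the bind() hook are all invented for illustration.

```cpp
#include <cstddef>

typedef void *handle;

namespace myok {   // hypothetical namespace; any name the vendor does not use works

// Function-pointer types for the vendor entry points.
typedef handle (*construct_fn)();
typedef void (*destruct_fn)(handle);

// Internal linkage: these pointers are not exported from the shared library.
static construct_fn construct_ptr = NULL;
static destruct_fn destruct_ptr = NULL;

// Loader hook; real code would pass the results of dlsym() here.
void bind(construct_fn c, destruct_fn d)
{
    construct_ptr = c;
    destruct_ptr = d;
}

// Wrapper class whose mangled name differs from the vendor's okCUsbFrontPanel.
class UsbFrontPanel {
public:
    UsbFrontPanel() : h(construct_ptr ? construct_ptr() : NULL) {}
    ~UsbFrontPanel() { if (h && destruct_ptr) destruct_ptr(h); }
    handle get() const { return h; }
private:
    handle h;
    UsbFrontPanel(const UsbFrontPanel &);            // non-copyable: one owner per handle
    UsbFrontPanel &operator=(const UsbFrontPanel &);
};

} // namespace myok
```

Because the constructor is now myok::UsbFrontPanel::UsbFrontPanel(), the vendor library's internal call to its own okCUsbFrontPanel constructor has nothing in your library to be redirected to.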

EDIT: To answer 'EDIT2' of the question:
Yes, the call to _ZN16okCUsbFrontPanelC1Ev from okUsbFrontPanel_Construct is going to the definition of that method inside your okFrontPanelDLL.cpp. It might be illuminating to examine objdump -d okFrontPanelDLL.o

Employed Russian
I have not defined any symbols: those names/classes are provided by the FPGA manufacturer. I cannot change names inside libokFrontPanel.so: should I rename okUsbFrontPanel_Construct() inside okFrontPanelDLL.cpp, or okCUsbFrontPanel() at okFrontPanelDLL.cpp:169?
nick2k3
Yes, I understand that libokFrontPanel.so is 3rd-party code. You should change your code to avoid symbol name collision: change both okCUsbFrontPanel and okUsbFrontPanel_Construct. The latter change is not strictly necessary, but you should avoid *all* name conflicts if possible.
Employed Russian
Thank you very much, it worked! One last thing that I cannot explain: this problem occurred while I was using the shared library (the one I made myself) inside a huge plugin-based program. I tried writing a simple main.cpp which uses my shared library (before changing the symbols) and everything worked fine. I thought that maybe this big plugin-based program resolves symbols differently than a standard program does. Any guess?
nick2k3