
I get an abort() that I am not able to catch in my code. Perhaps I am missing some understanding; can you give me some insight, or perhaps help me with the abort()?

Please note: the code works fine for the thousands of users I have, but a very few (two so far) have reported this crash.

First the code (simplified):

```objc
- (void)openBSDSocket:(NSString *)hostname useSSL:(bool)useSSL {
    // Look up the host. gethostbyname() reports failures through h_errno, not errno.
    if ((remoteHost = gethostbyname([hostname cStringUsingEncoding:NSUTF8StringEncoding])) == NULL) {
        [NSException raise:SOCKET_EX_HOST_NOT_FOUND format:SOCKET_EX_HOST_NOT_FOUND_F, hstrerror(h_errno)];
    }
    // ... (rest of the method omitted)
}
```

This caused the following crash (dump):

```
Thread 34 Crashed:
0   libSystem.B.dylib               0x00007fff8550fb6e __semwait_signal_nocancel + 10
1   libSystem.B.dylib               0x00007fff8550fa70 nanosleep$NOCANCEL + 129
2   libSystem.B.dylib               0x00007fff8556c3c6 usleep$NOCANCEL + 57
3   libSystem.B.dylib               0x00007fff8558b97c abort + 93
4   libSystem.B.dylib               0x00007fff854a3615 free + 128
5   libSystem.B.dylib               0x00007fff854f409b _mdns_search + 1469
6   libSystem.B.dylib               0x00007fff854f8564 _mdns_hostbyname + 287
7   libSystem.B.dylib               0x00007fff854f826d search_host_byname + 139
8   libSystem.B.dylib               0x00007fff854f8186 gethostbyname + 98
9   com.NZBVortex.NZBVortex         0x0000000100021346 -[CFNetworkStream openBSDSocket::] + 246
```

All calls to openBSDSocket are correctly wrapped in exception handlers, which of course don't catch the abort().

Can you give me some insight here?

A: 

The man page says it's thread-safe, but it still recommends using getaddrinfo (see its man page) in a threaded environment.
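A minimal sketch of a getaddrinfo()-based lookup in plain C (also valid inside an Objective-C method), assuming an IPv4-only TCP connection like the one in the question. The helper name `resolve_ipv4` is hypothetical. Unlike gethostbyname(), getaddrinfo() returns a freshly allocated result list for each call, so nothing is shared between threads.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve `hostname` to its first IPv4 address.
 * Returns 0 on success, otherwise a getaddrinfo() error code
 * (turn it into a message with gai_strerror(), not strerror(errno)). */
int resolve_ipv4(const char *hostname, struct sockaddr_in *out)
{
    struct addrinfo hints;
    struct addrinfo *result = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, to match sockaddr_in */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    int rc = getaddrinfo(hostname, NULL, &hints, &result);
    if (rc != 0)
        return rc;

    if (result->ai_addrlen != sizeof *out) {
        freeaddrinfo(result);
        return EAI_FAIL;
    }

    /* Copy the address into caller-owned storage, then release the list. */
    memcpy(out, result->ai_addr, sizeof *out);
    freeaddrinfo(result);
    return 0; /* sin_port is 0; set it with htons(port) before connect() */
}
```

The caller still has to set `out.sin_port` before calling connect(), because no service name is passed to getaddrinfo() here.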

Yuji
Thanks for the alternative. But if it is a heap problem, as suggested in the other answer, it will not fix the problem.
Ger Teunis
A:

Your heap is getting corrupted. gethostbyname() calls free() to release memory it allocated, and free() performs internal consistency checks: if it detects that the heap has been corrupted, it calls abort() to terminate the program. Once your heap is corrupted you essentially can't recover from it, so the best thing to do is to fail as soon as the corruption has been detected.

Unfortunately, figuring out exactly where your heap is getting corrupted is not easy. There are some Malloc Debug Environment Variables (for example MallocScribble, MallocGuardEdges and MallocStackLogging; see malloc(3)) you can set to help track this down.

Adam Rosenfield
Thanks for the info. It is happening for an extremely low number of users, but always at the same place. Shouldn't heap problems in my app cause more random crashes? Isn't libSystem in its own address space? Hard to tackle, I guess.
Ger Teunis
@Ger Teunis: Are you making proper deep copies of the returned `hostent`? The docs say "The functions gethostbyname() and gethostbyaddr() may return pointers to static data, which may be overwritten by later calls. Copying the struct hostent does not suffice, since it contains pointers; a deep copy is required. "
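A minimal sketch of that copy in plain C, assuming only the first IPv4 address is needed; `copy_first_ipv4` is a hypothetical helper. It copies the address bytes into caller-owned storage so nothing keeps pointing into gethostbyname()'s static data. Note that this alone does not make concurrent gethostbyname() calls safe: if the platform uses shared static storage, another thread can still overwrite it between the lookup and the copy, so the two would need to be serialized (or replaced with getaddrinfo()).

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: deep-copy the first IPv4 address from a hostent.
 * Returns 0 on success, -1 if the entry is missing or not IPv4. */
int copy_first_ipv4(const struct hostent *he, struct sockaddr_in *out)
{
    if (he == NULL || he->h_addrtype != AF_INET ||
        he->h_length != (int)sizeof out->sin_addr ||
        he->h_addr_list == NULL || he->h_addr_list[0] == NULL)
        return -1;

    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    /* Copy the bytes themselves, not the pointer into static data. */
    memcpy(&out->sin_addr, he->h_addr_list[0], sizeof out->sin_addr);
    return 0;
}
```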
Adam Rosenfield
Thanks again. I bcopy hostent->h_addr into sockaddr_in->sin_addr.s_addr and don't use the hostent after that. I've switched to getaddrinfo now and will send the customer a new test version, although I doubt he will even be able to reproduce the original bug (it's that sporadic).
Ger Teunis