I have a daemon app written in C that is currently running with no known issues on a Solaris 10 machine. I am in the process of porting it to Linux, and I have had to make only minimal changes. During testing it passes all test cases, and there are no issues with its functionality. However, when the process is 'idle' on my Solaris machine it uses around 0.03% CPU, while on the virtual machine running Red Hat Enterprise Linux 4.8 the same process uses all available CPU (usually somewhere in the 90%+ range).

My first thought was that something must be wrong with the event loop. The event loop is an infinite loop ( while(1) ) with a call to select(). The timeval is set up so that timeval.tv_sec = 0 and timeval.tv_usec = 1000. This seems reasonable enough for what the process is doing. As a test I bumped timeval.tv_sec to 1, but even after doing that I saw the same issue.

Is there something I am missing about how select() works on Linux vs. Unix? Or does it work differently with an OS running on a virtual machine? Or maybe there is something else I am missing entirely?

One more thing: I am not sure which version of VMware Server is being used. It was just updated about a month ago, though.

Thanks.

+4  A: 

I believe that Linux returns the remaining time by writing it into the timeout parameter of the select() call and Solaris does not. POSIX allows either behavior, so a programmer who isn't aware of that part of the spec might not reset the timeout parameter between calls to select().

This would result in the first call having a 1000 usec timeout and, once that timeout has run down to zero, all other calls using a 0 usec timeout. A zero timeout makes select() return immediately, so the loop spins and eats the CPU.
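A minimal sketch of the fix, assuming the loop only watches a single descriptor for reading. The function name run_event_loop and its parameters are made up for illustration and are not from the original daemon; the point is only that the timeval is filled in again before every select() call, which is safe on both Linux and Solaris.

```c
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (not from the original daemon): runs `iterations`
 * passes of a select() loop on `fd` and returns how many passes timed out,
 * or -1 if select() fails. */
int run_event_loop(int fd, int iterations, long timeout_usec)
{
    int timeouts = 0;

    for (int i = 0; i < iterations; i++) {
        fd_set readfds;
        struct timeval tv;

        /* select() may modify both the fd_set and the timeval,
         * so rebuild both on every pass. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_usec / 1000000;
        tv.tv_usec = timeout_usec % 1000000;

        int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ready == -1) {
            perror("select");
            return -1;
        }
        if (ready == 0) {
            /* Timed out with nothing to read: do idle work here. */
            timeouts++;
            continue;
        }
        /* fd is readable: the real daemon would service it here. */
    }
    return timeouts;
}
```

If the struct timeval were instead initialized once, outside the loop, Linux would leave it at zero after the first timeout and every later pass would return at once.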

Zan Lynx
Brilliant! That was the issue. Thanks!
Jake
+1  A: 

As Zan Lynx said, the timeval is modified by select() on Linux, so you should reassign the correct value before each select() call. I also suggest checking whether one of the file descriptors is in a particular state (e.g. end of file, peer connection closed...). Maybe the port is exposing a latent bug in the analysis of the returned values (FD_ISSET and so on). It happened to me too some years ago when porting a select-driven loop: I was using the returned value in the wrong way, and a closed fd was added to the rd_set, causing select() to fail. On the old platform the wrong fd happened to have a value higher than maxfd, so it was ignored. Because of the same bug, the program didn't recognize the select() failure (select() == -1) and looped forever.
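To illustrate those checks, here is a rough sketch that assumes a single readable descriptor. The name read_until_eof is invented for the example. It treats select() == -1 as an error rather than ignoring it, and stops watching the descriptor once read() reports end of file.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: reads from `fd` until end of file.
 * Returns the number of bytes read, or -1 on a select()/read() error. */
long read_until_eof(int fd)
{
    long total = 0;
    char buf[256];

    for (;;) {
        fd_set readfds;
        struct timeval tv;

        /* Rebuild the set and the timeout before every call. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = 1;
        tv.tv_usec = 0;

        int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ready == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            perror("select");       /* e.g. EBADF for a closed fd */
            return -1;              /* do not keep looping on failure */
        }
        if (ready == 0)
            continue;               /* timeout: nothing to read yet */

        if (FD_ISSET(fd, &readfds)) {
            ssize_t n = read(fd, buf, sizeof buf);
            if (n > 0) {
                total += n;
            } else if (n == 0) {
                return total;       /* EOF: stop watching this fd */
            } else if (errno != EINTR) {
                perror("read");
                return -1;
            }
        }
    }
}
```

A descriptor at end of file stays readable forever, so a loop that keeps it in the set without noticing the 0 from read() will also spin at full CPU.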

Bye!

Giuseppe Guerrini
That is a very good point... I am looking into this now. Thanks!
Jon