I'm trying to determine the CPU utilization of specific LWPs in specific processes in Solaris 10 using data from the /proc filesystem. The problem I have is that sometimes a utilization counter decreases.
Here's the gist of it:
// we'll be reading from the file named /proc/<pid>/lwp/<lwpid>/lwpusage
std::stringstream filename;
filename << "/proc/" << pid << "/lwp/" << lwpid << "/lwpusage";
int fd = open(filename.str().c_str(), O_RDONLY);
// error checking
while (true)
{
    prusage_t usage;
    // always read the whole struct from offset 0 of the same descriptor
    ssize_t readResult = pread(fd, &usage, sizeof(prusage_t), 0);
    // error checking (readResult should equal sizeof(prusage_t))
    std::cout << "sec=" << usage.pr_stime.tv_sec
              << " nsec=" << usage.pr_stime.tv_nsec << std::endl;
    // wait
}
close(fd);
The nanosecond values reported in the prusage_t struct are derived from timestamps recorded each time an LWP changes state. This feature is called microstate accounting. Sounds good, but every so often the "system call cpu time" counter (pr_stime) decreases by roughly 1-10 milliseconds from one sample to the next.
Update: it's not just the "system call cpu time" counter; I've since seen other counters decrease as well.
Another curiosity is that it always seems to be exactly one sample that's bogus - never two near each other. All the other samples are monotonically increasing at the expected rate. This seems to rule out the possibility that the counter is somehow reset in the kernel.
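For reference, here is a minimal sketch of how a backwards step between two consecutive samples can be flagged. It assumes the counters are plain seconds/nanoseconds pairs (on Solaris, timestruc_t has the same tv_sec/tv_nsec fields as struct timespec, which is used here so the snippet compiles anywhere). The helper names to_nanos and regression_nanos are hypothetical, not part of any Solaris API.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>   // struct timespec (stand-in for Solaris timestruc_t; an assumption)

// Hypothetical helper: flatten a seconds/nanoseconds pair into one 64-bit
// nanosecond count so two samples can be compared with a single subtraction.
inline std::int64_t to_nanos(const timespec& t)
{
    // A well-formed sample keeps tv_nsec within [0, 1e9).
    assert(t.tv_nsec >= 0 && t.tv_nsec < 1000000000L);
    return static_cast<std::int64_t>(t.tv_sec) * 1000000000LL + t.tv_nsec;
}

// Hypothetical helper: returns how many nanoseconds `current` lies behind
// `previous`, or 0 if the counter did not go backwards.
inline std::int64_t regression_nanos(const timespec& previous,
                                     const timespec& current)
{
    const std::int64_t delta = to_nanos(current) - to_nanos(previous);
    return delta < 0 ? -delta : 0;
}
```

In the loop above, this would be called with the previous and current usage.pr_stime (or any other pr_*time field); a nonzero result marks a bogus sample.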
Any clues as to what's going on here?
> uname -a
SunOS cdc-build-sol10u7 5.10 Generic_139556-08 i86pc i386 i86pc