I'm running into a bit of a weird error while running Perl in a chroot environment on Solaris 9 (Sparc). We are using a custom Perl, but it's almost exactly Perl 5.8.7, and this version has been running for years on various platforms including Solaris 8-10.

The following code is pretty straightforward:

#!/usr/bin/perl
use strict; 
use warnings;

print "About to sleep(1)\n";
sleep 1;
print "Just woke up!\n";

However, if I run that, "Just woke up!" never gets printed - instead, the program ends and "Alarm Clock" is echoed to the screen. This only happens if there's a sleep - if I write a program that does a lot of math and takes 10 seconds to run, everything works fine. It also only happens in a chroot environment.

I've dumped %SIG, which has an entry of 'ALRM => undef', which is expected - the non-chrooted environment has the same behaviour. However, if I change the script to include:

$SIG{ALRM} = sub {};

... everything works just fine. So, what's the deal? I don't have a lot of experience with Solaris, but there's got to be a way to make the default signal handlers behave properly.

+8  A: 

I recommend simply replacing the sleep 1 calls with select(undef, undef, undef, 1) and avoiding the whole issue.

From the symptoms you give, I'd wager that sleep in your chroot'd environment is being implemented in terms of SIGALRM (as is permitted by POSIX), and that for some reason perl is not catching that signal as it should, perhaps because it isn't expecting that implementation. Is it your custom build of perl? Is it an idiosyncrasy in the chroot'd libc? Does perl -e "sleep 1" under chroot show the same problem? Hard to say without access to the environment and a tool like truss.

Again, the whole issue can be avoided: select won't muck with SIGALRM.
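
As a minimal sketch of that replacement, the question's test script can be rewritten around a small wrapper. The name select_sleep is hypothetical and only for illustration; the four-argument select() call itself is the one suggested above. Note that select() can return early if a signal interrupts it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original code base): pause for
# $seconds using four-argument select() rather than sleep().
sub select_sleep {
    my ($seconds) = @_;
    # With no filehandle sets to watch, select() only waits for the timeout.
    select(undef, undef, undef, $seconds);
    return;
}

print "About to pause for 1 second\n";
select_sleep(1);
print "Just woke up!\n";
```

If the script prints both lines inside the chroot, that is at least consistent with the SIGALRM-based sleep being the culprit.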

pilcrow
Unfortunately, I can't do that - I'm dealing with a very large code base that runs on multiple platforms. I'm not changing it all just for Solaris 9 :) The POSIX standard you mention is very likely the case. I'm pretty sure my build of perl doesn't mess with that, but it could well be the libc version of the chroot. Thanks :)
Chris Simmons
AFAICT, the `select()` will work on all platforms...
Massa
I'm not saying it won't work - I'm saying I'm not going to troll through tens of thousands of lines of code to change something that only affects one platform that the bulk of our customers don't use :)
Chris Simmons
If it's not clear, I think we're all happy you have a Perl version with a bug. ;-)
ijw
This has currently been put on the back burner, but hopefully I'll try to get an update to it within a week or so. Thanks everyone for their help so far.
Chris Simmons
+1  A: 

Do you still have the version of Perl that comes with Solaris? If so, then try your code on it. If you don't have that version, then I suggest downloading Perl 5.8.7, compiling a stock version, and then testing your script on it.

If your script runs correctly on either of those two versions, then you know the problem is related to the changes in your version of Perl. If the script has the same error, then I would suggest downloading Perl 5.8.9, compiling it, and then checking to see if the bug goes away. If it doesn't, then, congratulations, you have found a bug in Perl. You will probably want to run perlbug to report it.
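
One way to run that comparison, sketched under the assumption that each candidate perl binary is passed as a command-line argument (the paths are yours to supply; try_sleep is a hypothetical helper name). It runs sleep 1 under each binary and decodes the wait status, so a process killed by SIGALRM shows up as signal 14, the number Solaris uses for SIGALRM.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run "sleep 1" under the given perl binary and decode the wait status.
# Returns (exit_code, signal_number); a non-zero signal number means the
# child was killed by that signal instead of exiting normally.
sub try_sleep {
    my ($perl) = @_;
    my $status = system($perl, '-e', 'sleep 1');
    return (-1, 0) if $status == -1;    # the binary could not be started
    return ($status >> 8, $status & 127);
}

# Binaries to test come from the command line; default to the running perl.
my @perls = @ARGV ? @ARGV : ($^X);
for my $perl (@perls) {
    my ($exit, $signal) = try_sleep($perl);
    printf "%s: exit=%d signal=%d\n", $perl, $exit, $signal;
}
```

Run it inside the chroot with the custom perl, the stock Solaris perl, and any freshly compiled build as arguments, then compare the reported signal numbers.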

Chas. Owens
+1  A: 

The first thing I'd try is to run your sample program under truss:

truss testprogram.pl

This will show the actual system calls used to implement the sleep. On a Solaris 8 system that I have access to, the relevant part of the output is:

write(1, " A b o u t   t o   s l e".., 18)      = 18
time()                                          = 1247258429
alarm(0)                                        = 0
sigaction(SIGALRM, 0xFFBEF6E0, 0xFFBEF790)      = 0
sigfillset(0xFF0C28D0)                          = 0
sigprocmask(SIG_BLOCK, 0xFFBEF780, 0xFFBEF770)  = 0
alarm(1)                                        = 0
    Received signal #14, SIGALRM, in sigsuspend() [caught]
sigsuspend(0xFFBEF760)                          Err#4 EINTR
setcontext(0xFFBEF448)
alarm(0)                                        = 0
sigprocmask(SIG_UNBLOCK, 0xFFBEF780, 0x00000000) = 0
sigaction(SIGALRM, 0xFFBEF6E0, 0x00000000)      = 0
time()                                          = 1247258430
Just woke up!
write(1, " J u s t   w o k e   u p".., 14)      = 14

On a Solaris 10 host, it outputs:

write(1, " A b o u t   t o   s l e".., 18)      = 18
time()                                          = 1247258270
nanosleep(0xFFBFF770, 0xFFBFF768)               = 0
time()                                          = 1247258271
Just woke up!
write(1, " J u s t   w o k e   u p".., 14)      = 14

I imagine you'll get something closer to the Solaris 8 output, and it'll probably show the sigaction() call failing for some reason.

Beyond that, I'd check that the shared libraries within the chroot /usr/lib are actually the correct versions for the host and OS version. The truss output will also show you exactly which shared libraries are being loaded by perl.

Kenster
Hrm.. I could have sworn I had `truss` working before, but it's giving me "truss: getexecname() failed" now. I'll look into this. Thanks!
Chris Simmons
Yeah, come to think of it, it might be hard to get truss working inside the chroot environment. You can run truss from outside the environment, either by using the -p option to truss a PID, or else by trussing chroot: "truss chroot /newroot command"
Kenster
Accepting this answer, as running "truss" and investigating why it failed led me to find what I think was the real problem - the OS was a weird hybrid of Solaris 9 with Solaris 8 libraries. Re-installing the OS properly fixed these problems.
Chris Simmons