tags:

views:

208

answers:

4

Hi all, I have a bash script that I am run to check to see if one of my programs has hung, and if it has kill it. The script works fine if ran from the command line, but if I schedule it with cron it does something very strange.

Basically the script (below) gets the PID of my program and gets its created date/time from its entry in the /proc/ directory. It then gets the current date/time from the system and converts these two values into seconds since 1970 with the "date" command, before finally subtracting the two. This usually ends up with a total of 2100 seconds or something like that, which equates to 35 minutes.

#!/bin/bash
THEDATE=$(date +%s)
MYPID=$(ps aux|grep -v grep|egrep "MyProgram.exe"|awk '{print $2}')
if (( ${#MYPID} > 0 )); then
    STARTTIME=$(ls -ld /proc/$MYPID|date +%s -d"$(awk '{print $6, $7}')")
    TOTALMINS=$(( ($THEDATE - $STARTTIME) / 60  ))
    if (( $TOTALMINS >= 30 )); then
     kill -9 $MYPID
     logger -t "[KillLongRunningProcesses] Killed my program which had been running for $TOTALMINS minutes"
    fi
fi

When ran from the command line, the two date variables (THEDATE and STARTTIME) both get the correct values. But when run by cron the STARTTIME is wrong. It has the correct date, but seems to ignore the time part and sets it to midnight, ie "2009-12-14 00:00:00" is obtained instead of "2009-12-14 13:23:00", which throws off all the calculations.

Any ideas? Thanks.

A: 

For what it's worth I used to get caught out with the PATH and other environment variables not being set automatically.

Worth checking. If it's not that then comment and I'll delete my answer.

Tom Duckering
I've tried to steer clear of environmental variables, and I only use absolute paths. I don't think the script relies on anything like that. Thanks anyway.
Andrew Jones
+2  A: 

That's why you can't rely on parsing ls. You should use stat if your system has it.

stat --printf=%Y /proc/$MYPID

If not, perhaps your find can do it for you:

find /proc -maxdepth 1 -name $MYPID -printf "%T@"
Dennis Williamson
A: 

I've figured out how to do it. If I use the "--time-style=long-iso" argument with ls it returns it in the format I need. Apparently cron has different preferences for some command than the default user.

I take your point Dennis that "ls" is unreliable for parsing, but the configuration for the computer I'm using is never going to change, so now that it works I will leave it as is. In future I will likely do things your way.

Andrew Jones
*Never* parse the output of 'ls'. It's only meant for human readability, not for parsing. Please read this --> http://mywiki.wooledge.org/ParsingLs
SiegeX
+1  A: 

First off never parse the output of ls, read THIS to understand why. Next, your script can be greatly improved by using pgrep rather than using awk to parse the PID from a grep on 'ps aux'. Also, your script breaks horribly in the case where you have more than one PID returned. And finally, when writing shell scripts try not to use CAPITALS for variable names; that convention is reserved for variables that you export into your environment.

The following script attempts to solve the problems mentioned above. It is as efficient as I could make it and it will handle the case where you have multiple PIDs. It also checks to make sure that the PID still exists before we kill it because it's possible that when we kill the parent it may take out the child processes.

#!/bin/bash

prog_name="MyProgram.exe"
the_date=$(date +%s)
my_pids=( $(pgrep "$prog_name") )

for ((i=0; i < ${#my_pids[@]}; i++)); do
    if [[ -d /proc/${my_pids[i]} ]]; then
        start_time=$(stat --printf=%Y /proc/${my_pids[i]})
        total_mins=$(( (the_date - start_time) / 60 ))
        if (( $total_mins >= 30 )); then
            kill -9 ${my_pids[i]}
            logger -t "Your custom message here"
        fi
    fi
done
SiegeX