I have a directory with files

heat1.conf
heat2.conf
...
heat<n>.conf
minimize.conf
...
other files....

I want my Bash script to be able to grab the highest number filename (so I can delete and replace it when I find an error condition).

What's the best way to accomplish this?

Please discuss the speed of your solution and why you think that it is the best approach.

+1  A: 

What about:

max=$(find . -maxdepth 1 -name 'heat[1-9]*.conf' |
      sed 's/.*heat\([0-9][0-9]*\)\.conf$/\1/' |
      sort -n |
      tail -n 1)

List the possible file names; keep just the numeric bit; sort the numbers; select the largest (last) number.


Regarding speed: without resorting to a scripting language like Perl (or Python, Ruby, ...), this is close to as good as you can get. Using find instead of ls means that the list of file names is generated just once (the first version of this answer used ls, but that makes the shell generate the list of file names and then ls echo that list). The sed command is fairly simple and generates a list of numbers, which then have to be sorted. You could argue that a sort in reverse numeric order (sort -nr) piped into sed 1q would be faster: the second sed would read less data, and the sort might not have to write all its output before it gets a SIGPIPE from sed closing its input as it terminates.
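
The reverse-sort variant described above might look something like the following. This is only a sketch: the function name highest_heat is made up for illustration, and it assumes a find that supports -maxdepth (GNU and BSD find both do).

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): print the largest <n> among
# the heat<n>.conf files directly inside directory $1 (default: current dir).
highest_heat() {
    find "${1:-.}" -maxdepth 1 -name 'heat[1-9]*.conf' |
        sed -n 's|.*/heat\([0-9][0-9]*\)\.conf$|\1|p' |  # keep only the digits
        sort -nr |                                         # largest number first
        sed 1q                                             # print line one, then quit
}
```

Called as highest_heat "$my_dir", it prints nothing at all when no matching file exists, so the caller can test for an empty result.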

In a scripting language like Perl, you would avoid multiple processes, and the overhead of pipe communication between those processes. This would be faster, but there'd be a lot less shell scripting involved.

Jonathan Leffler
A: 

I came up with one solution:

highest=-1
current_dir=`pwd`
cd $my_dir
for file in $(ls heat*) ; do #assume I've already checked for dir existence
    if [ "${file:4:$(($(expr length $file)-9))}" -gt "$highest" ]; then
        highest=${file:4:$(($(expr length $file)-9))}
    fi
done
cd $current_dir

....Okay, I took your suggestions and edited my solution to scrap the expr and pre-save the variable. In 1000 trials, my (modified) method was on average faster than Jon's but slower than GhostDog's, though the standard deviations were large relative to the averages.

My revised script is shown below in my trial, as are Jon's and GhostDog's solutions...

declare -a timing

for trial in {1..1000}; do
    res1=$(date +%s.%N)
    highest=-1
    current_dir=`pwd`

    cd $my_dir
    for file in $(ls heat*) ; do
        #assume I've already checked for dir existence
        file_no=${file:4:${#file}-9}
        if [ $file_no -gt $highest ]; then
            highest=$file_no
        fi
    done
    res2=$(date +%s.%N)
    timing[$trial]=$(echo "scale=9; $res2 - $res1"|bc)
    cd $current_dir
done

average=0
#compile net result
for trial in {1..1000}; do
    current_entry=${timing[$trial]}
    average=$( echo "scale=9; (($average+$current_entry/1000.0))"|bc)
done

std_dev=0
for trial in {1..1000}; do
    current_entry=${timing[$trial]}
    std_dev=$(echo "scale=9; (($std_dev + ($current_entry-$average)*($current_entry-$average)))"|bc)
done
std_dev=$(echo "scale=9; sqrt (($std_dev/1000))"|bc)
printf "Approach 1 (Jason), AVG Elapsed Time:    %.9F\n"  $average
printf "STD Deviation:                   %.9F\n"  $std_dev


for trial in {1..1000}; do
    res1=$(date +%s.%N)
    highest=-1
    current_dir=`pwd`

    cd $my_dir
    max=$(ls heat[1-9]*.conf |
    sed 's/heat\([0-9][0-9]*\)\.conf/\1/' |
    sort -n |
    tail -n 1)
    res2=$(date +%s.%N)
    timing[$trial]=$(echo "scale=9; $res2 - $res1"|bc)
    cd $current_dir
done

average=0
#compile net result
for trial in {1..1000}; do
    current_entry=${timing[$trial]}
    average=$( echo "scale=9; (($average+$current_entry/1000.0))"|bc)
done

std_dev=0
for trial in {1..1000}; do
    current_entry=${timing[$trial]}
    #echo "(($std_dev + ($current_entry-$average)*($current_entry-$average))"
    std_dev=$(echo "scale=9; (($std_dev + ($current_entry-$average)*($current_entry-$average)))"|bc)
done
std_dev=$(echo "scale=9; sqrt (($std_dev/1000))"|bc)
printf "Approach 2 (Jon), AVG Elapsed Time:    %.9F\n"  $average
printf "STD Deviation:                   %.9F\n"  $std_dev


for trial in {1..1000}; do
    res1=$(date +%s.%N)
    highest=-1
    current_dir=`pwd`

    cd $my_dir
    max=-1
    for file in heat*.conf
    do
        num=${file:4}        # strip the leading "heat"
        num=${num%.conf}     # strip the trailing ".conf"
        [[ $num -gt $max ]] && max=$num
    done
    res2=$(date +%s.%N)
    timing[$trial]=$(echo "scale=9; $res2 - $res1"|bc)
    cd $current_dir
done

average=0
#compile net result
for trial in {1..1000}; do
    current_entry=${timing[$trial]}
    average=$( echo "scale=9; (($average+$current_entry/1000.0))"|bc)
done

std_dev=0
for trial in {1..1000}; do
    current_entry=${timing[$trial]}
    #echo "(($std_dev + ($current_entry-$average)*($current_entry-$average))"
    std_dev=$(echo "scale=9; (($std_dev + ($current_entry-$average)*($current_entry-$average)))"|bc)
done
std_dev=$(echo "scale=9; sqrt (($std_dev/1000))"|bc)
printf "Approach 3 (GhostDog), AVG Elapsed Time:    %.9F\n"  $average
printf "STD Deviation:                   %.9F\n"  $std_dev

... the results are:

Approach 1 (Jason), AVG Elapsed Time:    0.041418086
STD Deviation:                   0.177111854
Approach 2 (Jon), AVG Elapsed Time:    0.061025972
STD Deviation:                   0.212572411
Approach 3 (GhostDog), AVG Elapsed Time:    0.026292145
STD Deviation:                   0.145542801
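
The averaging and standard-deviation arithmetic that is repeated for each approach above could probably be factored into one helper. The sketch below is an assumption, not what produced the numbers shown: the name mean_and_std_dev is made up, and it uses awk rather than bc. Like the loops above, it divides by N (population standard deviation).

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): reads one timing per line on
# stdin and prints "<mean> <population std dev>" with nine decimal places.
mean_and_std_dev() {
    awk '{ sum += $1; sumsq += $1 * $1 }
         END {
             if (NR == 0) exit 1             # no samples, nothing to report
             mean = sum / NR
             var = sumsq / NR - mean * mean  # E[x^2] - (E[x])^2
             if (var < 0) var = 0            # guard against rounding noise
             printf "%.9f %.9f\n", mean, sqrt(var)
         }'
}
```

With the timing array from the trials, it could be fed by printf '%s\n' "${timing[@]}" | mean_and_std_dev.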

Good job GhostDog!!! And thanks to both you Jon and the commenters for your tips! :)

Jason R. Mick
One obvious improvement is to run `expr` once, not twice, per new number. `expr` is not a fast program. Also, in a shell script in its own file, the `pwd` and final `cd` commands are irrelevant; this is not a DOS .bat file you're dealing with. Even as a fragment of a bigger script, I'd probably use a sub-shell, allowing it to change to the target directory while leaving the calling shell exactly where it always was. You could avoid the `ls` which simply echoes the list of files that the shell has generated when it expanded the wild cards; you could make the wild cards more accurate.
Jonathan Leffler
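
A sketch of the sub-shell idea from the comment above, with no expr, no ls, and a tighter wild card. The function name highest_in_dir is hypothetical; the parenthesised body is what keeps the cd away from the calling shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): print the largest <n> among
# heat<n>.conf in directory $1, or -1 if there is none.
# The body is in ( ... ), so it runs in a sub-shell and the cd never
# changes the caller's working directory.
highest_in_dir() (
    cd "$1" || exit 1
    highest=-1
    for file in heat[0-9]*.conf; do
        n=${file#heat}      # strip the "heat" prefix
        n=${n%.conf}        # strip the ".conf" suffix
        case $n in
            ''|*[!0-9]*) continue ;;   # skip anything that is not all digits
        esac
        if [ "$n" -gt "$highest" ]; then
            highest=$n
        fi
    done
    echo "$highest"
)
```

When nothing matches, the unexpanded pattern itself is the only loop item; the case statement rejects it, so the function prints -1.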
Since `sh` doesn't have `${var:start:count}` substring selection, your script is evidently in Bash. Therefore there's no need to use `expr`. Also, arithmetic is enabled by default inside the substring operator. Numeric comparisons should be done inside `(())`. `if (( ${file:4:${#file}-9} > highest ))`
Dennis Williamson
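
For illustration, the substring and `(( ))` suggestions from the comment above could be combined roughly as follows. The function name max_heat_number is hypothetical, and the regex check and the 10# prefix are additions for this sketch (they guard against stray names and against numbers with leading zeros being read as octal).

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): takes file names as arguments
# and prints the largest <n> found in names of the form heat<n>.conf,
# or -1 if no argument has that form.
max_heat_number() {
    local file n highest=-1
    for file in "$@"; do
        # Ignore anything that is not exactly heat<digits>.conf.
        [[ $file =~ ^heat[0-9]+\.conf$ ]] || continue
        n=${file:4:${#file}-9}      # drop "heat" (4 chars) and ".conf" (5 chars)
        # 10# forces base 10, so heat08.conf is not rejected as bad octal.
        if (( 10#$n > highest )); then
            highest=$(( 10#$n ))
        fi
    done
    echo "$highest"
}
```

A typical call would be max_heat_number heat*.conf from inside the target directory.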
+3  A: 

If you only need to list files in the current directory, there's no need to use find with -maxdepth 1, or ls. Just use a for loop with shell expansion. Also, expr is an external command; if your numbers don't contain decimals, you can use bash's own comparison instead.

max=-1
for file in heat*.conf
do
  num=${file:4}        # strip the leading "heat"
  num=${num%.conf}     # strip the trailing ".conf"
  [[ $num -gt $max ]] && max=$num
done
echo "max is: $max"
ghostdog74