
Each day an application creates a file called file_YYYYMMDD.csv, where YYYYMMDD is the production date. But sometimes generation fails, and no file is produced for a couple of days.

I'd like an easy way, in a bash or sh script, to find the name of the most recent file produced before a given reference date.

Typical usage: find the most recently generated file, ignoring any produced after May 1st.

Thanks for your help

A: 

Try this:

#!/bin/bash

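# ls -r lists names in reverse order, i.e. newest date first for file_YYYYMMDD.csv
ls -r | while read -r fn; do
    # sed -n with the p flag prints only on a match, so other names give ""
    date=$(echo "$fn" | sed -n -e 's/^file_\([0-9]\{8\}\)\.csv$/\1/p')
    [ -n "$date" ] || continue
    if [ "$date" -lt "$1" ]; then
        echo "$fn"
        exit
    fi
done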

Call the script with the reference date (YYYYMMDD) as its only argument; it prints the newest file dated strictly before that date. Replace -lt with -le if you want to include the reference date itself.
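A minimal sketch of the same loop wrapped in a function, so the -lt/-le choice becomes an argument instead of an edit. The name `latest_before`, the `inclusive` keyword and the sample file names are hypothetical; the function reads names on stdin, newest first, for example from `ls -r`.

```shell
#!/bin/bash
# latest_before REF [inclusive]   (hypothetical helper)
# Reads file names on stdin, newest first, and prints the first one whose
# date is before REF, or equal to REF when "inclusive" is given.
latest_before() {
    local ref=$1 op=-lt fn date
    [ "${2:-}" = inclusive ] && op=-le
    while read -r fn; do
        # keep only the 8-digit date of names matching file_YYYYMMDD.csv
        date=$(printf '%s\n' "$fn" | sed -n 's/^file_\([0-9]\{8\}\)\.csv$/\1/p')
        [ -n "$date" ] || continue
        if [ "$date" "$op" "$ref" ]; then
            printf '%s\n' "$fn"
            return 0
        fi
    done
    return 1   # no file old enough
}

printf '%s\n' file_20100503.csv file_20100501.csv file_20100428.csv | latest_before 20100501
# prints file_20100428.csv
printf '%s\n' file_20100503.csv file_20100501.csv file_20100428.csv | latest_before 20100501 inclusive
# prints file_20100501.csv
```

On real files the call would be `ls -r | latest_before 20100501`.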

Edit: An alternative solution that doesn't pipe an echoed variable into sed. I haven't tested it, but it should work too.

#!/bin/bash

# sed -n with the p flag keeps only the dates of matching names and drops the rest
ls -r | sed -n -e 's/^file_\([0-9]\{8\}\)\.csv$/\1/p' | while read -r date; do
    if [ "$date" -lt "$1" ]; then
        echo "file_${date}.csv"
        exit
    fi
done
petersohn
Neat! Is it possible to do this without piping "echo $fn" into sed (i.e. passing $fn directly as an argument to sed)?
caas
Since sed reads its input from stdin (or from files), not from its arguments, you have to echo it. You could of course use a "here document" instead, but that's more complicated. You could also pipe the output of ls through sed before reading each line, but then you have to reconstruct the file name yourself when echoing it.
petersohn
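For comparison, a sketch of three ways to hand a single name to sed: the echo pipe from the answer, a Bash here-string (`<<<`, not available in plain sh), and the here document mentioned above. The sample name is hypothetical.

```shell
#!/bin/bash
fn=file_20100501.csv

# 1. Pipe an echoed value into sed:
d1=$(echo "$fn" | sed -n 's/^file_\([0-9]\{8\}\)\.csv$/\1/p')

# 2. Bash here-string: sed still reads stdin, but no echo or pipe is needed.
d2=$(sed -n 's/^file_\([0-9]\{8\}\)\.csv$/\1/p' <<< "$fn")

# 3. Here document: also works in POSIX sh, just more verbose.
d3=$(sed -n 's/^file_\([0-9]\{8\}\)\.csv$/\1/p' <<EOF
$fn
EOF
)

printf '%s %s %s\n' "$d1" "$d2" "$d3"   # prints: 20100501 20100501 20100501
```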
Thanks, this second script is very elegant. I'll adapt it a little, replacing `ls` with `find` as Dennis Williamson suggests, to avoid parsing ls output and processing files that don't match the pattern.
caas
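A sketch of the combination described in the comment above: `find` instead of `ls`, feeding the sed filter from the second script. It assumes a `find` that supports `-maxdepth` (GNU and BSD do) and wraps the pipeline in a hypothetical function `latest_before_find` so the reference date is an argument.

```shell
#!/bin/bash
# latest_before_find REF   (hypothetical helper)
latest_before_find() {
    local ref=$1 date
    # find lists only candidate names in the current directory; sed keeps the
    # 8-digit date of exact file_YYYYMMDD.csv names (dropping e.g.
    # file_YYYYMMDD_test.csv); sort -r puts the newest date first.
    find . -maxdepth 1 -name 'file_*.csv' |
        sed -n 's|^\./file_\([0-9]\{8\}\)\.csv$|\1|p' |
        sort -r |
        while read -r date; do
            if [ "$date" -lt "$ref" ]; then
                echo "file_${date}.csv"
                break
            fi
        done
}
```

Run it from the directory holding the files, e.g. `latest_before_find 20100501`.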
A: 

This script avoids:

  • Using sed repeatedly in a loop
  • Parsing ls
  • Creating a subshell in the while loop
  • Processing files that don't match the file_*.csv name pattern

Here's the script:

#!/bin/bash
while read -r file
do
    date=${file#*_}    # strip off everything up to and including the first underscore
    date=${date%.*}    # strip off the last dot and everything after it
    if [[ $date < $1 ]]    # string comparison is safe: YYYYMMDD is fixed-width
    then
        break
    fi
done < <(find . -maxdepth 1 -name "file_*.csv" | sort -r)

# $file is empty if no file predates $1; otherwise do something with it, such as:
echo "$file"
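To see what the two parameter expansions actually produce, here is a small check on a few sample paths (the paths and the helper name `extract_date` are hypothetical). Only a name of the exact form `file_YYYYMMDD.csv` in the current directory reduces to a bare date; an extra underscore, or a path from a subdirectory (which `find` descends into by default), leaves other text behind, and `[[ ... < ... ]]` then compares that text as a string.

```shell
#!/bin/bash
# extract_date PATH   (hypothetical helper wrapping the answer's two expansions)
extract_date() {
    local d=${1#*_}            # drop the shortest prefix ending in "_"
    printf '%s\n' "${d%.*}"    # drop the shortest suffix starting with "."
}

for file in ./file_20100501.csv ./file_20100501_test.csv ./sub_dir/file_20100430.csv; do
    printf '%s -> %s\n' "$file" "$(extract_date "$file")"
done
# prints:
# ./file_20100501.csv -> 20100501
# ./file_20100501_test.csv -> 20100501_test
# ./sub_dir/file_20100430.csv -> dir/file_20100430
```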

Edit:

With Bash >= 3.2, you can do this using a regular expression:

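#!/bin/bash
regex='^\./file_([[:digit:]]{8})\.csv$'   # anchored: rejects e.g. file_YYYYMMDD_test.csv
while read -r file
do
    [[ $file =~ $regex ]] || continue      # skip names that don't match
    date=${BASH_REMATCH[1]}
    if [[ $date < $1 ]]
    then
        break
    fi
done < <(find . -maxdepth 1 -name "file_*.csv" | sort -r)

# $file is empty if no file predates $1; otherwise do something with it, such as:
echo "$file"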
Dennis Williamson
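A quick check of how `=~` fills `BASH_REMATCH`, using an anchored pattern (`^`, `$` and an escaped dot) so that `file_YYYYMMDD_test.csv` is rejected. The pattern and sample paths are assumptions for illustration. Because a failed match yields no usable date, a loop should test the result of `[[ $file =~ $regex ]]`, e.g. with `|| continue`, before reading `BASH_REMATCH[1]`.

```shell
#!/bin/bash
# Anchored pattern: find prints "./name", the dot is escaped, exactly 8 digits.
regex='^\./file_([[:digit:]]{8})\.csv$'

for file in ./file_20100501.csv ./file_20100501_test.csv; do
    if [[ $file =~ $regex ]]; then
        echo "$file matches, date=${BASH_REMATCH[1]}"
    else
        echo "$file does not match"
    fi
done
# prints:
# ./file_20100501.csv matches, date=20100501
# ./file_20100501_test.csv does not match
```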
Thanks, also very nice. I prefer the solution that uses sed, however, because it offers more control over which files are processed. For example, files called `file_YYYYMMDD_test.csv` sometimes appear in the directory. I'll try to combine both scripts to add the other improvements you propose. Thanks!
caas
@caas: See my edit for a version that uses Bash regular expressions and is similar to the `sed` version.
Dennis Williamson
@Dennis Williamson: Neat!
caas
A: 

Sorting file names with sort(1) will fail if a file name contains a newline character, because sort treats every line as a separate record.

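Instead, we should pass the names NUL-delimited (NUL is the only byte that cannot appear in a path), using something like:

touch $'filename\nwith\777pesky\177chars.txt'  # create a test file

ls -1db *    # -b prints non-graphic characters as escapes, so the newline becomes visible

find ... -print0 | LC_ALL=C sort0 ...    # sort0: a sort that reads NUL-terminated records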
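Applied to the original question, a NUL-safe sketch might look like the following. It swaps in GNU `sort -z` as the NUL-delimited sort and assumes GNU `find` (`-maxdepth`, `-print0`); the function name `latest_before_nul` is hypothetical.

```shell
#!/bin/bash
# latest_before_nul REF: print the newest ./file_YYYYMMDD.csv dated before REF.
# Hypothetical helper; assumes GNU find and GNU sort (for -z).
latest_before_nul() {
    local ref=$1 file date re='^[0-9]{8}$'
    # Records are NUL-terminated, so newlines in names cannot split them.
    while IFS= read -r -d '' file; do
        date=${file##*/file_}   # drop the directory part and the "file_" prefix
        date=${date%.csv}       # drop the extension
        [[ $date =~ $re ]] || continue   # skip e.g. file_YYYYMMDD_test.csv
        if [[ $date < $ref ]]; then
            printf '%s\n' "$file"
            return 0
        fi
    done < <(find . -maxdepth 1 -name 'file_*.csv' -print0 | LC_ALL=C sort -rz)
    return 1   # no file old enough
}
```

Run it from the directory holding the files, e.g. `latest_before_nul 20100501`.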

See "Find all used extensions in subdirectories":

http://codesnippets.joyent.com/posts/show/2300

frankie