The following example compares every file in a directory against an input string ($string) and returns the matching filenames. It is neither elegant nor efficient. For speed, I changed the for loop to consider only files whose names start with the first word of $string.

The problem with this script is the following. I have two files in the directory:

Foo Bar.txt
Foo Bar Foo.txt

and I compare them to the string "Foo Bar 09.20.2010". This returns both files in that directory, as both match. But I need only the file that matches the string most exactly; in this example it should be Foo Bar.txt.

If you have better ideas for solving this problem, please post them. I am not that proficient in scripting yet, and I am sure there are better and maybe even easier ways of doing this.

#!/bin/bash
string="Foo Bar 09.20.2010"

for file in /path/to/directory/$(echo "$string" | awk '{print $1}')*; do

    filename="${file##*/}"
    filename="${filename%.*}"


    if [[ $(echo "$string" | grep -i "^$filename") ]]; then
        result="$file"
        echo "$result"
    fi

done

Here is a breakdown of what I want to achieve: two files in a directory matched against two strings. Correct/Incorrect in brackets indicates whether the result was what I expected.

Two files in the directory (extensions stripped for matching):

Foo Bar.txt
Foo Bar Foo.txt

To compare against 2 Strings:

Foo Bar Random Additional Text
Foo Bar Foo Random Additional Text

Results:

compare "Foo Bar"(.txt) against Foo Bar Random Additional Text -> Match (Correct)
compare "Foo Bar"(.txt) against Foo Bar Foo Random Additional Text -> Match (Incorrect)

compare "Foo Bar Foo"(.txt) against Foo Bar Random Additional Text -> NOT Match (Correct)
compare "Foo Bar Foo"(.txt) against Foo Bar Foo Random Additional Text -> Match (Correct)

Thank you everyone for your answers.

A: 

String comparison is an extremely complex field. There are hundreds of ways to compare strings, and few are efficient.

The simplest (for your case) would probably be:

Minimum x in STRING + (any_char):repeat(x)times

Simple, but effective in fairly typical circumstances.
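
One possible reading of the rule above is "pick the file whose name needs the fewest extra characters appended to equal the input string", which amounts to choosing the longest filename (minus extension) that is a prefix of the string. The following is only a sketch under that assumption; the function name longest_prefix_match is hypothetical and not from the original post.

```bash
#!/bin/bash
# Hypothetical helper (an assumption, not the poster's code): print the file
# in directory $1 whose name, minus extension, is the longest
# case-insensitive prefix of the string $2.
# The body runs in a subshell so nocasematch does not leak to the caller.
longest_prefix_match() (
    shopt -s nocasematch
    dir=$1
    string=$2
    best=""
    best_len=0
    for file in "$dir"/*; do
        [[ -f $file ]] || continue
        stem=${file##*/}      # strip the directory part
        stem=${stem%.*}       # strip the last extension, if any
        # The quoted "$stem" is matched literally; the trailing * allows any suffix.
        if [[ $string == "$stem"* ]] && (( ${#stem} > best_len )); then
            best=$file
            best_len=${#stem}
        fi
    done
    [[ -n $best ]] && printf '%s\n' "$best"
)
```

Called as longest_prefix_match /path/to/directory "Foo Bar Foo Random Additional Text", this would prefer Foo Bar Foo.txt over Foo Bar.txt. Note that it does not check word boundaries, so a file named Foo Ba.txt would also count as a prefix of "Foo Bar ...".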

See more "String Metrics" at:

http://en.wikipedia.org/wiki/String_metric

TaslemGuy
A: 

Correct me if I'm wrong, but it appears that your script is equivalent to:

ls /path/to/directory/"$string"*

If you only want one file name out of it, you can use head. Since ls lists files alphabetically you'll get the first one in alphabetical order.

(Notice that when ls's output is piped to another program it prints one file name per line, making it easier to process than its normal column-based output.)

ls /path/to/directory/"$string"* | head -1

For the shortest match try something like the following, which uses an awkward combination of awk, sort -n, and cut to order the lines from shortest to longest and then print the first one.

ls /path/to/directory/"$string"* |
    awk '{print length($0) "\t" $0}' | sort -n | head -1 | cut -f 2-
John Kugelman
A: 

A lot of your echo and awk calls are superfluous. To get all the files that begin with your match string, you can simply evaluate "$string"*.

e.g. both

echo "$string"*

and

ls "$string"*

will generate your lists. (In a pipe, echo separates the names with spaces and ls separates them with newlines.)

The next step is to realize that, as you have defined it, your extra constraint of "most exact match" is equivalent to choosing the shortest matching filename.

To find the shortest string in a set of strings in bash (I'd prefer perl myself, but let's stick with the constraint of doing it in bash):

for fn in "/path/to/$string"*; do
  echo "${#fn} $fn"
done | sort -n | head -1 | cut -f2- -d' '

The for loop loops over the expanded filenames. The echo prepends the length of the names to the names. We then pipe the entire output of that into sort -n and head -1 to get the shortest name, and cut -f2- -d' ' strips the length off of it (taking the second field on with a space as the field separator).
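
The same "shortest name wins" selection can also be done without the sort, head, and cut pipeline by tracking the shortest name inside the loop. This is a minimal sketch, not the answer's own method; the function name shortest_match is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper (an assumption for illustration): print the shortest
# path among those that start with the prefix given in $1.
shortest_match() {
    local prefix=$1 fn shortest=""
    for fn in "$prefix"*; do
        # When nothing matches, bash leaves the pattern unexpanded; skip it.
        [[ -e $fn ]] || continue
        if [[ -z $shortest ]] || (( ${#fn} < ${#shortest} )); then
            shortest=$fn
        fi
    done
    [[ -n $shortest ]] && printf '%s\n' "$shortest"
}
```

For example, shortest_match "/path/to/$string" prints at most one name and returns a non-zero status when no file matches. Because it never splits on spaces or newlines, it should cope with filenames containing either.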

The key with shell programming is knowing your building blocks, and how to combine them. With clever combinations of sort, head, tail, and cut you can do a lot of pretty sophisticated processing. Throw in sed and uniq and you are already able to do some quite impressive things.

That being said, I usually only use the shell for things like this "on-the-fly" -- for anything that I might want to re-use and that is at all complex I would be much more likely to use perl.

jsegal
In fact, I do not mind using perl/python/php or any other language that will get the job done. I just find bash a bit easier to use, even though I add a lot of superfluous commands, as you have said :)
Andrew