ansaurus

Question

String comparison precedence in Bash

Answer 1

A:

String comparison is an extremely complex field: there are hundreds of ways to compare strings, and few of them are efficient.

The simplest (for your case) would probably be:

Minimum x such that STRING + (any_char repeated x times) matches a filename

Simple, but effective in fairly typical circumstances.
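One possible bash reading of that pseudocode, offered as a sketch rather than the answerer's own code: treat "any_char repeated x times" as x copies of the glob `?` (each `?` matches exactly one character) and try x = 0, 1, 2, ... until something matches. The function name `shortest_by_padding` and its `dir`, `prefix` and `max_extra` parameters are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find the minimum x such that "$prefix" followed by exactly x
# arbitrary characters names an existing file in "$dir".
# shortest_by_padding, dir, prefix and max_extra are hypothetical names.
shortest_by_padding() {
  local dir=$1 prefix=$2 max_extra=${3:-255}
  local pad="" x
  local -a matches
  for (( x = 0; x <= max_extra; x++ )); do
    # The quoted part is matched literally; the unquoted $pad ('?' x times) is a glob.
    matches=( "$dir/$prefix"$pad )
    # An unmatched glob expands to itself, so confirm the first result exists.
    if [[ -e ${matches[0]} ]]; then
      printf '%s\n' "${matches[0]}"
      return 0
    fi
    pad+="?"
  done
  return 1
}
```

Called as `shortest_by_padding /path/to/directory "$string"`, it prints the alphabetically first of the shortest matches (bash sorts glob results) and returns 1 if nothing matches within `max_extra` extra characters.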

For more on string metrics, see:

http://en.wikipedia.org/wiki/String_metric

TaslemGuy 2010-09-20 21:33:15

Answer 2

+1 A:

Correct me if I'm wrong, but it appears that your script is equivalent to:

ls /path/to/directory/"$string"*

If you only want one file name out of it, you can use head. Since ls lists files alphabetically, you'll get the first one in alphabetical order.

(Notice that when ls's output is piped to another program it prints one file name per line, making it easier to process than its normal column-based output.)

ls /path/to/directory/"$string"* | head -1
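If you would rather not parse ls output at all, bash already sorts the results of a glob, so the first element of the expanded array is the first match in alphabetical order. A minimal sketch, with `first_match` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: print the first (alphabetically) file in "$1" whose name starts with "$2".
# first_match is a hypothetical helper, not a standard command.
first_match() {
  local dir=$1 prefix=$2
  local -a matches=( "$dir/$prefix"* )
  # Without nullglob an unmatched pattern stays literal, so check it exists.
  if [[ -e ${matches[0]} ]]; then
    printf '%s\n' "${matches[0]}"
  else
    return 1
  fi
}
```

Usage would be `first_match /path/to/directory "$string"`, which should print the same name as the ls | head pipeline for ordinary filenames.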

For the shortest match, try something like the following, which uses an awkward combination of awk, sort -n, and cut to order the lines from shortest to longest and then print the first one.

ls /path/to/directory/"$string"* |
    awk '{print length($0) "\t" $0}' | sort -n | head -1 | cut -f 2-
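The same shortest-match idea can also be done with bash builtins only, comparing `${#f}` (the length of `$f`) instead of running awk, sort and cut. This is a sketch under the assumption that plain bash is available; `shortest_match` is a hypothetical name. Because glob results come back sorted, keeping only strictly shorter names means the alphabetically first of several equally short names wins.

```shell
#!/usr/bin/env bash
# Sketch: print the shortest file name in "$1" that starts with "$2".
# shortest_match is a hypothetical helper name.
shortest_match() {
  local dir=$1 prefix=$2
  local f best=""
  for f in "$dir/$prefix"*; do
    [[ -e $f ]] || continue    # skip the literal pattern when nothing matched
    # ${#f} is the length of the name; replace best only when strictly shorter
    if [[ -z $best ]] || (( ${#f} < ${#best} )); then
      best=$f
    fi
  done
  # Returns non-zero when there was no match
  [[ -n $best ]] && printf '%s\n' "$best"
}
```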

John Kugelman 2010-09-20 21:37:29

Answer 3

A:

A lot of your echo and awk calls are superfluous. To get all the files that begin with your string, you can simply let the shell expand "$string"*.

For example, both

echo "$string"*

and

ls "$string"*

will generate your lists. (echo prints them space-separated, while ls prints them one per line when its output goes to a pipe.)
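If you want the one-per-line form without calling ls, one option is to loop over the glob and print each name with `printf '%s\n'`, which behaves the same whether the output goes to a terminal or a pipe. A sketch, where `list_matches` and its parameters are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: print every file in "$1" whose name starts with "$2", one per line,
# using only the glob and printf (no ls). list_matches is a hypothetical name.
list_matches() {
  local dir=$1 prefix=$2 f
  for f in "$dir/$prefix"*; do
    # If nothing matches, the glob stays as the literal pattern; skip it.
    [[ -e $f ]] || continue
    printf '%s\n' "$f"
  done
}
```

Names containing spaces come out intact on their own lines, since each one is passed to printf as a single quoted argument.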

The next step is to realize that, given this, your extra constraint of "most exact match", as you have defined it, is equivalent to picking the shortest matching filename.

To find the shortest string in a set of strings in bash (I'd prefer perl myself, but let's stick with the constraint of doing it in bash):

for fn in "/path/to/$string"*; do
  # ${#fn} is the length of $fn; quoting keeps names with spaces intact
  echo "${#fn} $fn"
done | sort -n | head -1 | cut -f2- -d' '

The for loop iterates over the expanded filenames. The echo prefixes each name with its length. We then pipe the whole output into sort -n and head -1 to get the shortest name, and cut -f2- -d' ' strips the length off again (keeping fields 2 onward, with a space as the field separator).

The key with shell programming is knowing your building blocks, and how to combine them. With clever combinations of sort, head, tail, and cut you can do a lot of pretty sophisticated processing. Throw in sed and uniq and you are already able to do some quite impressive things.

That being said, I usually only use the shell for quick, on-the-fly jobs like this; for anything at all complex that I might want to reuse, I would be much more likely to use perl.

jsegal 2010-09-20 21:56:00

In fact, I do not mind using perl/python/php or any other language that will get the job done. I just find bash a bit easier to use, even though I add a lot of superfluous commands, as you have said :)

Andrew 2010-09-20 22:03:01
