Hi script-writers,

The day came when I had to write a bash script that walks arbitrary directory trees, looks at arbitrary files, and tries to compare them in various ways. I thought it would be a simple job, a couple of hours at most. Not so!

My hangup is that sometimes some idiot - ahem! - excuse me, some _lovely user_ - chooses to put spaces in directory and file names. This causes my script to fail.

The perfect solution, aside from threatening the guillotine for those who insist on using spaces in such places (not to mention the people who allowed it into operating systems' code!), might be a routine that "escapes" the file and directory names for us, rather like Cygwin's routines for converting between Unix and DOS filename formats. Is there anything like this in a standard Unix / Linux distribution?

Note that the simple `for file in *` construct doesn't work so well when one is trying to compare directory trees, as it only works on the current directory - and, in this case as in many others, constantly cd-ing to various directory locations brings its own problems. While doing my homework I found this question http://stackoverflow.com/questions/1563856/handle-special-characters-in-bash-for-in-loop; the proposed solution there hangs up on spaces in directory names, but that can be overcome like this:

dir="dirname with spaces"
ls -1 "$dir" | while IFS= read -r x; do   # -r keeps backslashes literal; empty IFS keeps leading/trailing spaces
   echo "$x"                              # quoted, so the spacing in the name survives
done

PLEASE NOTE: The above code isn't particularly wonderful, because variables set inside the while loop are INACCESSIBLE outside it: piping the `ls` output creates an implicit subshell for the loop. This is a key motivating factor for my query!
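To make that concrete, here's a minimal sketch (reusing the `dir` from the snippet above) showing a counter that never makes it out of the piped loop:

count=0
ls -1 "$dir" | while IFS= read -r x; do
   count=$((count + 1))        # increments only inside the pipeline's subshell
done
echo "$count"                  # still prints 0 in the parent shell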

...OK, the code above helps in many situations, but "escaping" the characters would be pretty powerful too. For example, an escaped `dir` above might contain:

dir\ with\ spaces

Does this already exist and I've just been overlooking it?

If not, does anyone have an easy proposal to write one - maybe with sed or lex? (I'm far from competent with either.)

A: 

The find command sometimes works in this situation, for example:

find . -exec ls {} \;
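As a sketch of how this extends to the comparison task in the question, find hands each name to the command as a single argument, so spaces never get split; the directory variables and the choice of md5sum here are purely illustrative:

# checksum every regular file under each tree; {} is one argument per file
find "$dir1" -type f -exec md5sum {} \;
find "$dir2" -type f -exec md5sum {} \;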

ennuikiller
+1  A: 

While googling I found the post "How to escape file names in bash shell scripts", which I'm quoting below:

After fighting with Bash for quite some time, I found out that the following code provides a nice basis for escaping special characters. Of course it is not complete, but the most important characters are filtered.

If anybody has a better solution, please let me know. It works and it is readable but not pretty.

FILE_ESCAPED=`echo "$FILE" | \
sed s/\\ /\\\\\\\\\\\\\\ /g | \
sed s/\\'/\\\\\\\\\\\\\\'/g | \
sed s/\&/\\\\\\\\\\\\\\&/g | \
sed s/\;/\\\\\\\\\\\\\\;/g | \
sed s/\(/\\\\\\\\\\(/g | \
sed s/\)/\\\\\\\\\\)/g `

Maybe you could use it as a starting point.
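As a possibly simpler alternative to that pile of sed calls, bash's built-in printf has a %q format that produces a shell-escaped version of its argument; a minimal sketch (the variable names are just examples):

FILE='dir with spaces/file (1).txt'
FILE_ESCAPED=$(printf '%q' "$FILE")   # backslash-escapes spaces, quotes, parens, etc.
echo "$FILE_ESCAPED"                  # e.g. dir\ with\ spaces/file\ \(1\).txt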

Pascal Thivent
Thanks for this code snippet. It's an incomplete version of what I was asking for, THANK YOU!
Richard T
+1  A: 
Dennis Williamson
It's not mine but, yes, it's a monster :)
Pascal Thivent
Great post, thanks. ...This is _very_ good and responsive to the original question.
Richard T
+1  A: 

The following snippet handles all filenames, including those containing blanks, quotes, newlines, and so on:

startdir="${1:-.}"                              # first parameter or working directory

#-------------------------------------------------------------------------------
#  IFS=  (empty)  prevents read from trimming leading/trailing whitespace
#  read:
#  -r     do not allow backslashes to escape any characters
#  -d ''  delimiter is NUL (\0), the only byte that cannot appear in a filename
#  done < <( find ... )  redirection from a process substitution, so the loop
#                        body runs in the current shell (no pipe subshell)
#-------------------------------------------------------------------------------
while IFS=  read -r -d '' file; do
  echo "'$file'"
done < <( find "$startdir" -type f -print0 )

See also this BashFAQ.
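Because the loop body runs in the current shell here (the process substitution avoids the pipe-induced subshell), results can be carried out of the loop; a minimal sketch, assuming the same startdir, that collects every filename into an array:

files=()
while IFS= read -r -d '' file; do
  files+=( "$file" )                         # accumulate in the current shell
done < <( find "$startdir" -type f -print0 )
echo "${#files[@]} files found"              # the array survives after the loop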

fgm
Thanks for the post. OK, this is another way to loop, neither better nor worse than the loop posted in the original question. It has the disadvantage of resetting IFS - if you need it set inside the loop, you'll have a headache - and it has the advantage of letting the script-writer carry variable contents out of the loop, a limitation of the code presented in the original query.
Richard T
+1  A: 

There's a pretty serious problem with the escaping approach: what escapes are needed depends on the context the variable's going to be expanded in, and in the usual case there's no escaping that'll work. For instance, if you're going to do something simple like:

touch a "b c" d
files="a b\ c d"
ls $files

...it won't work (ls looks for 4 files: "a", "b\", "c", and "d") because the shell doesn't pay any attention to escapes when it word-splits $files. You could use eval ls $files, but that would fail on things like tabs in the filenames.

The while ... read ... done < <(find ... -print0) approach fgm suggested works solidly (and because of the flexibility of find's search patterns, is very powerful), but it's also a rather messy pile of workarounds for various possible problems; if you don't need find's power, it's not hard to get things done with for and *:

shopt -s nullglob    # In case of empty directories...
for filepath in "$dir"/*; do    # loop over all files in the specified directory
    filename="${filepath##*/}"    # You just wanted the files' names?  No problem.
    echo "$filename"
done

If (as you mention in the question) you're interested in comparing the two directory trees, looping through one of them isn't quite what you want; it'd be better to put their contents into arrays, like this:

shopt -s nullglob
pathlist1=("$dir1"/*)    # Get a list of paths of files in dir1
filelist1=("${pathlist1[@]##*/}")    # Parse off just the filenames
pathlist2=("$dir2"/*)    # Same for dir2
filelist2=("${pathlist2[@]##*/}")
# now compare filelist1 with filelist2...

(Note that AFAIK the "${pathlist2[@]##*/}" construct is not standard, but seems to have been supported in both bash and zsh for a while now.)
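To make the expansion and the comparison step concrete, here's a hedged sketch: the sample paths are invented, filelist2 is assumed to have been built as in the answer above, and the comm-based diff (which assumes no newlines in the filenames) is just one way to finish the "now compare" step:

# What the ##*/ expansion does:
pathlist1=( "/tmp/dir one/a.txt" "/tmp/dir one/b c.txt" )
filelist1=( "${pathlist1[@]##*/}" )     # becomes: a.txt  'b c.txt'

# Names present in filelist1 but missing from filelist2:
comm -23 <(printf '%s\n' "${filelist1[@]}" | sort) \
         <(printf '%s\n' "${filelist2[@]}" | sort)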

Gordon Davisson
Very thoughtful and creative post, thanks. One point here: regarding your comment on escape-pattern troubles, one could overcome the problems you describe by using quotes in addition to the "escaping" - at least I think so. ...My system doesn't seem to know what "shopt" is - I presume it's a shell option. My bash doesn't like it! And I'm afraid I don't quite get what the "${pathlist2[@]##*/}" business is even trying to do! More here, perhaps?
Richard T
On quoting in addition to escaping: I tried that, the quotes just get treated as part of the filename; aside from `eval`, I don't think there's a way to do it. On `shopt`: what version of bash are you using? It's in every version I've used... Without `nullglob`, if there are no matching files the glob pattern expands to itself. An alternative is to add `[[ -e "$filepath" ]] || continue` as the first line of the `for` loop.
Gordon Davisson
On `"${pathlist2[@]##*/}"`: `"${pathlist2[@]}"` expands to the members of the array, each as a separate "word". Adding `##*/` removes through the last "/" in each entry -- basically, it's a trick for turning an array of full file paths into an array of just the filenames.
Gordon Davisson
Thanks, Gordon, I obviously haven't spent enough time learning shell!
Richard T
A: 
#!/bin/bash

while IFS= read -r filename; do   # -r and empty IFS keep backslashes and surrounding spaces intact
  echo 'I am doing something with "'"$filename"'".'
done < <(find)

Do note that the <( ) notation won't work when bash is invoked as /bin/sh.
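If you do need to stay POSIX-sh compatible, one hedged workaround (at the cost of running the loop body in a child shell) is to have find invoke the shell itself, passing the filenames as positional parameters so spaces survive intact:

# POSIX sh: each filename arrives as a separate positional parameter
find . -type f -exec sh -c '
  for filename in "$@"; do
    echo "I am doing something with \"$filename\"."
  done
' sh {} +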

Ignacio Vazquez-Abrams