I have a set of about 8000 files that I need to de-dupe. Each file is essentially a list of numbers, one per line:

nnnn
nnnnn
nnnn

and I would like to sort and de-dupe the numbers within each file. I can do this manually using sort | uniq or sort -u, but I effectively want to overwrite the files. Is there a way to do this without using a temp file? And what syntax should I be using to avoid the 'ambiguous redirect' error? :-)

#!/usr/bin/env bash
cd /Users/dd/Desktop/images
TEMP="/tmp/$(basename $0).$RANDOM.txt"
for FILENAME in "`find . -name *version_ids.txt -print`"
do
  cat $FILENAME | sort -u > $TEMP
  $TEMP > $FILENAME
done

(I tried the following, which gave no error, but didn't seem to have the desired effect...

#!/usr/bin/env bash
cd /Users/dd/Desktop/images
for FILENAME in "`find . -name *version_ids.txt -print`"
do
  sort -u $FILENAME -o $FILENAME
done

)

+1  A: 

Try

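#!/usr/bin/env bash
cd /Users/dd/Desktop/images || exit 1
# Quote the pattern so the shell passes it to find unexpanded.
for FILENAME in $(find . -name '*version_ids.txt' -print)
do
  # Sort into a temporary file, then keep a backup and swap the result in.
  sort -u "$FILENAME" > "$FILENAME.tmp"
  mv "$FILENAME" "$FILENAME.bak" && mv "$FILENAME.tmp" "$FILENAME"
done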

Note that this script is still not safe from problematic filenames (those with spaces or newlines in them).
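
If the filenames might contain spaces or newlines, something like the following sketch may be safer. It assumes bash and a find that supports -print0 (GNU and BSD find both do); dedupe_tree is just an illustrative name, and the .bak step is left out here.

```shell
#!/usr/bin/env bash
# Sketch: sort and de-dupe every matching file under a directory,
# coping with spaces and newlines in file names.
# dedupe_tree is a hypothetical helper name, not part of the original script.
dedupe_tree() {
  local dir=$1 file
  # find emits NUL-terminated names and read -d '' consumes them,
  # so the names are never word-split by the shell.
  while IFS= read -r -d '' file; do
    # Replace the original only if sort succeeded.
    sort -u -- "$file" > "$file.tmp" && mv -- "$file.tmp" "$file"
  done < <(find "$dir" -type f -name '*version_ids.txt' -print0)
}
```

It would be called as dedupe_tree /Users/dd/Desktop/images.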

Aaron Digulla
Thanks - I'll drop the .bak bit but I can see it's good practice. (I have an alternate backup already...)
Dycey
A: 

You can't do $TEMP > $FILENAME: that tries to run the temp file as a command. Use cat to copy its contents back instead:

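#!/usr/bin/env bash
cd /Users/dd/Desktop/images || exit 1
TEMP="/tmp/$(basename "$0").$RANDOM.txt"
# Quote the pattern so the shell passes it to find unexpanded.
for FILENAME in $(find . -name '*version_ids.txt' -print)
do
  # Sort into the temp file, then copy the result back over the original.
  <"$FILENAME" sort -u >"$TEMP"
  cat "$TEMP" >"$FILENAME"
done
rm -f "$TEMP"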
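
A variation on the same idea, as a rough sketch: let mktemp pick the temp file name rather than building one from $RANDOM, and only copy back when sort succeeded. Writing the result back with cat (rather than mv) keeps the original file's inode and permissions. The function name dedupe_via_temp and the dedupe.XXXXXX template are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: de-dupe each file given as an argument through one shared temp file.
# dedupe_via_temp is a hypothetical helper name.
dedupe_via_temp() {
  local tmp file status=0
  # mktemp creates the file with a unique name instead of a guessable one.
  tmp=$(mktemp "${TMPDIR:-/tmp}/dedupe.XXXXXX") || return 1
  for file in "$@"; do
    if sort -u -- "$file" > "$tmp"; then
      # Truncate and rewrite the original in place (same inode, same mode).
      cat -- "$tmp" > "$file"
    else
      status=1
    fi
  done
  rm -f -- "$tmp"
  return "$status"
}
```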
Douglas Leeder
+1  A: 

GNU sort is able to edit a file in place:

sort -u -o "$FILENAME" "$FILENAME"
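
Note that the loop in the question puts the backticks in double quotes, so it runs only once with every filename in a single string, which is probably why the -o attempt there had no visible effect. A minimal sketch that applies -o to one file at a time, letting find run the command (dedupe_in_place is an illustrative name only):

```shell
#!/usr/bin/env bash
# Sketch: run sort -u -o on each matching file separately.
# dedupe_in_place is a hypothetical helper name.
dedupe_in_place() {
  # find hands batches of file names to a small sh loop as arguments,
  # so names with spaces or newlines arrive intact.
  find "$1" -type f -name '*version_ids.txt' -exec sh -c '
    for f do
      sort -u -o "$f" -- "$f"
    done' sh {} +
}
```

Usage would be dedupe_in_place /Users/dd/Desktop/images.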
mouviciel