Hi there, I'm trying to write a bash script that looks at a directory full of files and categorises them as either plaintext or binary. A file is plaintext if it ONLY contains plaintext characters, otherwise it is binary. So far I have tried the following permutations of grep:

#!/bin/bash
FILES=`ls`
for i in $FILES
do
    ########GREP SYNTAX###########
    if grep -qv -e[:cntrl:] $i
    ########/GREP SYNTAX##########
    then
        mv $i $i-plaintext.txt
    else
        mv $i $i-binary.txt
    fi
done

In the grep syntax line, I have also tried the same without the -v flag and with the branches of the if statement swapped, as well as both of those combinations with [:alnum:] and [:print:]. All six of these variations label some files binary which consist solely of plaintext, and label some files plaintext which contain at least one non-printable character.

I need to find a way to identify files that only contain printable characters, i.e., A-Z, a-z, 0-9, punctuation, spaces and newlines. All files containing any character that is not in this set should be classified as binary.

I've been bashing my head against a wall trying to sort this for half a day. Help! Thanks in advance, Rik

A: 

You can use grep's -I option, which treats binary files as files without a match, together with a regex that always matches (such as the empty string):

if grep -qI -e '' "$i"
Bart Sas
Thanks for the response. That will only match files which are entirely binary, rather than files which are a mixture of binary and printable characters.
RikSaunderson
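The comment's point can be checked directly. A minimal sketch, assuming GNU grep, whose -I binary detection is a heuristic: in the C locale it essentially looks for NUL bytes, so a file that is ordinary text plus one stray control character such as ESC still counts as text. The helper name grep_says_text and the sample file names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeeds if grep -I treats the file as text.
# -I skips files that grep considers binary; '' matches every line.
grep_says_text() {
    LC_ALL=C grep -qI -e '' -- "$1"
}

dir=$(mktemp -d)
printf 'plain words\n' > "$dir/clean"
printf 'words with an escape \033 inside\n' > "$dir/escape"
printf 'words with a NUL \000 inside\n' > "$dir/nul"

for f in "$dir/clean" "$dir/escape" "$dir/nul"
do
    if grep_says_text "$f"
    then
        echo "$(basename "$f"): text"
    else
        echo "$(basename "$f"): binary"
    fi
done

rm -r "$dir"
```

Only the file with the NUL byte is reported as binary; the one containing the escape character passes as text, which is the mixed case the comment describes. An empty file has no lines for '' to match, so this test would also report it as binary.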
A: 

First you can/should do

for f in *

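instead of putting the output of ls in a variable. The chief reason for doing this is to handle filenames that include spaces: the unquoted $FILES is split on whitespace, so a file named "my notes" would be processed as two names, "my" and "notes".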
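A small sketch of the difference, run in a scratch directory containing one hypothetical file whose name has a space. It simply counts how many times each loop body runs.

```bash
#!/bin/bash
# Scratch directory holding one hypothetical file whose name has a space
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 'my notes' other

# Question's approach: the unquoted $FILES is split on whitespace,
# so 'my notes' turns into two separate words
count_ls=0
FILES=$(ls)
for i in $FILES
do
    count_ls=$((count_ls + 1))
done

# Glob approach: the shell produces exactly one word per file
count_glob=0
for f in *
do
    count_glob=$((count_glob + 1))
done

echo "ls loop: $count_ls iterations, glob loop: $count_glob iterations"

# Clean up
cd - > /dev/null || exit 1
rm -r "$dir"
```

With these two files the ls loop runs three times (my, notes, other) while the glob loop runs twice, once per actual file.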

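Second, you need to enclose the character class in an outer set of brackets, as in [[:print:]]; otherwise [:cntrl:] is treated as a set of the literal characters :, c, n, t, r and l. I would also put the pattern in single quotes to protect it from the shell. And don't use -v: with -q, grep -v succeeds as soon as any single line fails to match, which tells you nothing about the other lines. Instead, negate the print class so that grep looks for any one non-printable character, and see if that works for you.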

if grep -aq -e '[^[:print:]]' "$f"

And as shown in that line, always quote variables when they contain filenames.

mv "$f" "$f-plaintext.txt"

To keep grep from complaining about binary files, use -a, which makes grep process a binary file as if it were text.
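Putting the -a flag and the negated print class together, here is a minimal sketch of the test as a reusable function. It assumes GNU grep, and it sets LC_ALL=C (not part of this answer) so that [:print:] means exactly the ASCII range from space (0x20) to tilde (0x7E); in a UTF-8 locale the class also accepts non-ASCII printable characters. Because grep works line by line, newlines are never tested, but a tab or a carriage return (from CRLF line endings) falls outside [:print:], so such files come out as binary, which matches the character set listed in the question. The function name is_plaintext is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeeds only if no line of the file contains a
# character outside [:print:]. grep reads line by line, so the newline
# separators themselves are never tested.
is_plaintext() {
    # -a: scan binary data as if it were text; -q: stop at the first match.
    # LC_ALL=C limits [:print:] to the ASCII bytes 0x20 (space) to 0x7E (~).
    LC_ALL=C grep -aq -e '[^[:print:]]' -- "$1"
    # grep exits 0 if it found a non-printable character, 1 if it found
    # none, and 2 on an error (e.g. a missing file), so only 1 means plaintext.
    [ $? -eq 1 ]
}

tmp=$(mktemp)

printf 'Hello, world! 123\n' > "$tmp"
if is_plaintext "$tmp"; then echo "sample 1: plaintext"; else echo "sample 1: binary"; fi

printf 'a tab\there\n' > "$tmp"
if is_plaintext "$tmp"; then echo "sample 2: plaintext"; else echo "sample 2: binary"; fi

rm -f "$tmp"
```

Sample 1 should be reported as plaintext and sample 2, which contains a tab, as binary. Unlike a plain `if grep ... then binary else plaintext` test, this helper does not count a grep error as plaintext.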

The variable i is often used for an integer or an index. Use f or file.

Finally:

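#!/bin/bash
for f in *
do
    # Skip directories and anything else that isn't a regular file
    [ -f "$f" ] || continue
    # grep succeeds if some line contains a character outside [:print:]
    if grep -aq -e '[^[:print:]]' -- "$f"
    then
        mv -- "$f" "$f-binary.txt"
    else
        mv -- "$f" "$f-plaintext.txt"
    fi
done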
Dennis Williamson
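Because the script renames files in place, a dry run can be a cheap way to check the classification first. A minimal sketch, assuming GNU grep; the function name preview_names and the sample files are hypothetical, and LC_ALL=C is an addition that limits [:print:] to ASCII. It applies the same grep test as the script, skips anything that is not a regular file, and only prints the name each file would receive.

```bash
#!/bin/bash
# Hypothetical dry run: print the name each regular file in directory $1
# would get from the script, without renaming anything.
preview_names() {
    local f
    for f in "$1"/*
    do
        # Skip directories, and the literal pattern left by an empty glob
        [ -f "$f" ] || continue
        if LC_ALL=C grep -aq -e '[^[:print:]]' -- "$f"
        then
            echo "$f -> $f-binary.txt"
        else
            echo "$f -> $f-plaintext.txt"
        fi
    done
}

dir=$(mktemp -d)
printf 'just text\n' > "$dir/notes"
printf 'bell \007 here\n' > "$dir/beep"
mkdir "$dir/subdir"
preview_names "$dir"
rm -r "$dir"
```

Here notes would become notes-plaintext.txt, beep (which contains a BEL control character) would become beep-binary.txt, and subdir is left out.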