tags:
views: 482
answers: 5

How to remove files inside a directory that have more or fewer lines than specified (all files have a ".txt" suffix)?

+1  A: 

Try this bash script:

LINES=10
for f in *.txt; do 
  if [ `cat "$f" | wc -l` -ne $LINES ]; then 
     rm -f "$f"
  fi
done

(Not tested)

EDIT: Feed the file to wc through a pipe, since wc also prints the filename when the file is passed as an argument.

0x6adb015
Doesn't work here: "line 3: [: too many arguments"
schnaader
I also tried to do this: a=`wc -l "$f"`; if [ "$a" -ne $LINES ]; This would work, but wc -l outputs the count and the filename...
schnaader
+1, as this was the prototype to my answer :)
schnaader
Arg! do [ `cat "$f" | wc -l` -ne $LINES ];
0x6adb015
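The behavior the EDIT refers to is easy to see directly. A quick sketch (the name demo.txt is just an example file created for the demonstration):

```shell
printf 'a\nb\nc\n' > demo.txt
wc -l demo.txt     # count followed by the filename, e.g. "3 demo.txt"
wc -l < demo.txt   # count only - stdin has no filename to report
rm -f demo.txt
```

The stdin redirect achieves the same thing as the pipe without spawning cat.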
+3  A: 

Played a bit with the answer from 0x6adb015. This works for me:

LINES=10
for f in *.txt; do
  a=`cat "$f" | wc -l`;
  if [ "$a" -ne "$LINES" ]
  then
    rm -f "$f"
  fi
done
schnaader
Switched to "cat" the file, too.
schnaader
+6  A: 

This bash script should do the trick. Save as "rmlc.sh".

Sample usage:

rmlc.sh -more 20 *.txt   # Remove all .txt files with more than 20 lines
rmlc.sh -less 15 *       # Remove all files with fewer than 15 lines

Note that if the rmlc.sh script is in the current directory, it is protected against deletion.


#!/bin/sh

# rmlc.sh - Remove by line count

SCRIPTNAME="rmlc.sh"
IFS=""

# Parse arguments 
if [ $# -lt 3 ]; then
    echo "Usage:"
    echo "$SCRIPTNAME [-more|-less] [numlines] file1 file2..."
    exit 
fi

if [ "$1" = "-more" ]; then
    COMPARE="-gt"
elif [ "$1" = "-less" ]; then
    COMPARE="-lt"
else
    echo "First argument must be -more or -less"
    exit
fi

LINECOUNT=$2

# Discard non-filename arguments
shift 2

for filename in $*; do
    # Make sure we're dealing with a regular file first
    if [ ! -f "$filename" ]; then
        echo "Ignoring $filename"
        continue
    fi

    # We probably don't want to delete ourselves if script is in current dir
    if [ "$filename" = "$SCRIPTNAME" ]; then
        continue
    fi

    # Feed wc with stdin so that output doesn't include filename
    lines=`cat "$filename" | wc -l`

    # Check criteria and delete
    if [ "$lines" $COMPARE "$LINECOUNT" ]; then
        echo "Deleting $filename"
        rm "$filename"
    fi 
done
Kevin Ivarsen
+1 - Very good, complete and well documented script
schnaader
My only issue with this is the "gratuitous use of cat". wc -l can operate on a file all by itself: wc -l "$filename" is all you need.
Harper Shelby
Harper: I originally tried "wc -l" by itself. The problem is that the output includes the filename rather than just the line number. For example, "wc -l rmlc.sh" outputs "48 rmlc.sh", while "echo rmlc.sh | wc -l" simply outputs "48".
Kevin Ivarsen
this will fail on filenames containing spaces, and iirc on large directories. See my "find" based comment for one way around that.
simon
Kevin's script works great, and so does Simon's solution. No flaws, even though I deal with more than 4,000 files. If I could, I would accept both :) Thank you all for your answers, I greatly appreciate your help!
Daniel
`wc -l < "$filename"`, kill the cat
Hasturkun
A: 

My command-line mashing is pretty rusty, but I think something like this will work safely (change the "10" in the grep to whatever line count you need), even if your filenames have spaces in them. Adjust as needed. You'd need to tweak it if newlines in filenames are possible.

find . -name \*.txt -type f -exec wc -l {} \; | grep -v "^10 .*$" | cut --complement -f 1 -d " " | tr '\012' '\000' | xargs -0 rm -f
simon
Thank you Simon, both your command line and Kevin's script work perfectly, even though I have more than 4,000 files :)
Daniel
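For filenames with spaces or even embedded newlines, a per-file find -exec loop sidesteps word-splitting entirely, with no grep/cut/tr plumbing. A sketch of that approach, run from the target directory; the "keep files with exactly 10 lines" rule matches the scripts above, and the count is hardcoded here only for illustration:

```shell
# Delete .txt files whose line count is not 10; safe for any filename,
# because find hands each name to the inner shell as a positional argument.
find . -name '*.txt' -type f -exec sh -c '
  for f; do
    if [ "$(wc -l < "$f")" -ne 10 ]; then
      rm -f -- "$f"
    fi
  done
' sh {} +
```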
+1  A: 

This one-liner should also do the trick:

 find -name '*.txt' | xargs  wc -l | awk '{if($1 > 1000 && index($2, "txt")>0 ) print $2}' | xargs rm

In the example above, files greater than 1000 lines are deleted.

Choose > and < and the number of lines accordingly.

Sathya
Use find -print0 with xargs -0 if filenames can contain spaces.
Sathya
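A null-delimited sketch of that suggestion (GNU find and xargs assumed; the 1000-line threshold is kept from the answer above). Counting via a stdin redirect also avoids the "total" summary line that xargs wc -l produces, so no awk filtering is needed:

```shell
# Delete .txt files with more than 1000 lines; -print0 / -0 keeps names
# with spaces intact, and per-file "wc -l <" reports only the count.
find . -name '*.txt' -type f -print0 |
  xargs -0 -I{} sh -c 'if [ "$(wc -l < "$1")" -gt 1000 ]; then rm -f -- "$1"; fi' sh {}
```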