tags:
views: 482
answers: 5

How to remove files inside a directory that have more or fewer lines than specified (all files have a ".txt" suffix)?

+1  A: 

Try this bash script:

LINES=10
for f in *.txt; do 
  if [ `cat "$f" | wc -l` -ne $LINES ]; then 
     rm -f "$f"
  fi
done

(Not tested)

EDIT: Feed the file to wc through a pipe, since wc also prints the filename when the file is passed as an argument.

0x6adb015
Doesn't work here: "line 3: [: too many arguments"
schnaader
I also tried to do this: a=`wc -l "$f"`; if [ "$a" -ne $LINES ]; This would work, but wc -l outputs the count and the filename...
schnaader
+1, as this was the prototype to my answer :)
schnaader
Arg! do [ `cat "$f" | wc -l` -ne $LINES ];
0x6adb015
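The behavior the EDIT refers to is easy to see directly. A quick sketch (the name demo.txt is just an example file created for the demonstration):

```shell
printf 'a\nb\nc\n' > demo.txt
wc -l demo.txt     # count followed by the filename, e.g. "3 demo.txt"
wc -l < demo.txt   # count only - stdin has no filename to report
rm -f demo.txt
```

The stdin redirect achieves the same thing as the pipe without spawning cat.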
+3  A: 

Played a bit with the answer from 0x6adb015. This works for me:

LINES=10
for f in *.txt; do
  a=`cat "$f" | wc -l`;
  if [ "$a" -ne "$LINES" ]
  then
    rm -f "$f"
  fi
done
schnaader
Switched to "cat" the file, too.
schnaader
+6  A: 

This bash script should do the trick. Save as "rmlc.sh".

Sample usage:

rmlc.sh -more 20 *.txt   # Remove all .txt files with more than 20 lines
rmlc.sh -less 15 *       # Remove all files with fewer than 15 lines

Note that if the rmlc.sh script is in the current directory, it is protected against deletion.


#!/bin/sh

# rmlc.sh - Remove by line count

SCRIPTNAME="rmlc.sh"
IFS=""

# Parse arguments 
if [ $# -lt 3 ]; then
    echo "Usage:"
    echo "$SCRIPTNAME [-more|-less] [numlines] file1 file2..."
    exit 
fi

if [ "$1" = "-more" ]; then
    COMPARE="-gt"
elif [ "$1" = "-less" ]; then
    COMPARE="-lt"
else
    echo "First argument must be -more or -less"
    exit
fi

LINECOUNT=$2

# Discard non-filename arguments
shift 2

for filename in $*; do
    # Make sure we're dealing with a regular file first
    if [ ! -f "$filename" ]; then
        echo "Ignoring $filename"
        continue
    fi

    # We probably don't want to delete ourselves if script is in current dir
    if [ "$filename" = "$SCRIPTNAME" ]; then
        continue
    fi

    # Feed wc with stdin so that output doesn't include filename
    lines=`cat "$filename" | wc -l`

    # Check criteria and delete
    if [ "$lines" $COMPARE "$LINECOUNT" ]; then
        echo "Deleting $filename"
        rm "$filename"
    fi 
done
Kevin Ivarsen
+1 - Very good, complete and well documented script
schnaader
My only issue with this is the "gratuitous use of cat". wc -l can operate on a file all by itself: wc -l "$filename" is all you need.
Harper Shelby
Harper: I originally tried "wc -l" by itself. The problem is that the output includes the filename rather than just the line number. For example, "wc -l rmlc.sh" outputs "48 rmlc.sh", while "echo rmlc.sh | wc -l" simply outputs "48".
Kevin Ivarsen
this will fail on filenames containing spaces, and iirc on large directories. See my "find" based comment for one way around that.
simon
Kevin's script works great, and so does Simon's solution. No flaws, even though I deal with more than 4,000 files. If I could, I would accept both :) Thank you all for your answers, I greatly appreciate your help!
Daniel
`wc -l < "$filename"`, kill the cat
Hasturkun
A: 

My command-line mashing is pretty rusty, but I think something like this will work safely (change the "10" in the grep to whatever line count you need), even if your filenames have spaces in them. Adjust as needed. You'd need to tweak it if newlines in filenames are possible.

find . -name \*.txt -type f -exec wc -l {} \; | grep -v "^10 .*$" | cut --complement -f 1 -d " " | tr '\012' '\000' | xargs -0 rm -f
simon
Thank you Simon, both your command line and Kevin's script work perfectly, even though I have more than 4,000 files :)
Daniel
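For filenames with spaces or even embedded newlines, a per-file find -exec loop sidesteps word-splitting entirely, with no grep/cut/tr plumbing. A sketch of that approach, run from the target directory; the "keep files with exactly 10 lines" rule matches the scripts above, and the count is hardcoded here only for illustration:

```shell
# Delete .txt files whose line count is not 10; safe for any filename,
# because find hands each name to the inner shell as a positional argument.
find . -name '*.txt' -type f -exec sh -c '
  for f; do
    if [ "$(wc -l < "$f")" -ne 10 ]; then
      rm -f -- "$f"
    fi
  done
' sh {} +
```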
+1  A: 

This one-liner should also do the trick:

 find -name '*.txt' | xargs  wc -l | awk '{if($1 > 1000 && index($2, "txt")>0 ) print $2}' | xargs rm

In the example above, files greater than 1000 lines are deleted.

Choose > and < and the number of lines accordingly.

Sathya
Use find -print0 with xargs -0 if filenames can contain spaces.
Sathya
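A null-delimited sketch of that suggestion (GNU find and xargs assumed; the 1000-line threshold is kept from the answer above). Counting via a stdin redirect also avoids the "total" summary line that xargs wc -l produces, so no awk filtering is needed:

```shell
# Delete .txt files with more than 1000 lines; -print0 / -0 keeps names
# with spaces intact, and per-file "wc -l <" reports only the count.
find . -name '*.txt' -type f -print0 |
  xargs -0 -I{} sh -c 'if [ "$(wc -l < "$1")" -gt 1000 ]; then rm -f -- "$1"; fi' sh {}
```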