I run the following command:

md5deep find * | awk '{ print $1 }'

A sample of the output

    /Users/math/Documents/Articles/Number theory: Is a directory
    258fe6853b1bfb2d07f512ff6bec52b1
    /Users/math/Documents/Articles/Probability and statistics: Is a directory
    4811bfb2ad04b9f4318049c01ebb52ef
    8aae4ac3694658cf90005dbdea37b4d5
    258fe6853b1bfb2d07f512ff6bec52b1

I have tried, unsuccessfully, to filter out the rows that contain "Is a directory" with sed:

md5deep find * | awk '{ print $1 }' | sed s/\/*//g

Its sample output is

/Users/math/Documents/Articles/Number theory: Is a directory
/Users/math/Documents/Articles/Topology: Is a directory
/Users/math/Documents/Articles/useful: Is a directory

How can I filter out every row that contains "Is a directory" using sed or awk?

[clarification] I want to filter out the rows which contain "Is a directory".

+1  A: 

Why not use grep instead?

i.e.,

md5deep find * | grep "Is a directory" | awk '{ print $1 }'

Edit: I just re-read your question. If you want to remove the lines with "Is a directory", use grep's -v flag, i.e.:

md5deep find * | grep -v "Is a directory" | awk '{ print $1 }'
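As a quick check of the -v form, here is a minimal sketch on hypothetical sample text (the .pdf paths are made up for illustration). It assumes the "Is a directory" lines actually reach the pipe; if they are written to standard error, grep never sees them.

```shell
# Hypothetical sample mixing hash lines and "Is a directory" messages.
sample='/Users/math/Documents/Articles/Number theory: Is a directory
258fe6853b1bfb2d07f512ff6bec52b1  /Users/math/a.pdf
4811bfb2ad04b9f4318049c01ebb52ef  /Users/math/b.pdf'

# grep -v drops the matching lines; awk then keeps only the first field.
hashes=$(printf '%s\n' "$sample" | grep -v "Is a directory" | awk '{ print $1 }')
printf '%s\n' "$hashes"
```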
Josh W.
The last command seems to work. The first problem was that stdout and stderr were in the same output. I needed to append 1> hashes.txt to the end of your command. It seems that md5deep does not work as expected: I get only hashes of files in the current directory, not in subdirectories.
Masi
no need to use grep. >>> md5deep find * | awk '/Is a directory/{ print $1 }'
A: 

I'm not intimately familiar with md5deep, but this may do something like what you are trying to do.

find . -type f -exec md5sum {} +
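To see what -type f selects, here is a minimal sketch on a throwaway directory tree. The directory and file names are hypothetical, and the hashing step is left out so that the example does not depend on md5sum being installed.

```shell
# Build a throwaway tree (hypothetical names) with one nested file.
tmp=$(mktemp -d)
mkdir -p "$tmp/Topology"
echo "a" > "$tmp/top.txt"
echo "b" > "$tmp/Topology/nested.txt"

# -type f lists regular files only, at any depth; directories are skipped.
files=$(cd "$tmp" && find . -type f)
printf '%s\n' "$files"

# Clean up the temporary tree.
rm -rf "$tmp"
```

Because the Topology directory itself is never listed, a hashing tool fed this list has no directory to complain about.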
Kent Fredric
+1  A: 

The "Is a directory" bit is being sent on stderr.

So, to capture those messages you'll need to send standard error to a file. The command below does that and discards the md5 hashes on standard output by sending them to /dev/null.

md5deep find * > /dev/null 2> directories.txt

Like Josh W., I assume from your question you want the "Is a directory" lines.
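The split between the two streams can be seen without md5deep at all. The sketch below uses fake_md5deep, a hypothetical stand-in function that writes one hash line to standard output and one message to standard error. It is not the real tool.

```shell
# fake_md5deep is a hypothetical stand-in for md5deep: it writes one hash
# line to stdout and one "Is a directory" message to stderr.
fake_md5deep() {
    echo "258fe6853b1bfb2d07f512ff6bec52b1  ./paper.pdf"
    echo "./Topology: Is a directory" >&2
}

# A pipe carries stdout only, so awk never sees the stderr message.
piped=$(fake_md5deep 2>/dev/null | awk '{ print $1 }')

# Swap the roles: point stderr at the capture, then discard stdout.
errors=$(fake_md5deep 2>&1 >/dev/null)

echo "stdout via pipe: $piped"
echo "stderr only:     $errors"
```

The order of the redirections matters in the second capture: 2>&1 first duplicates the current stdout for stderr, and only then is stdout pointed at /dev/null.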

ashawley
md5deep find * > test gives me just the files in the current directory; it does not give files in subdirectories. Perhaps md5sum does not work as we expect.
Masi
+1  A: 

I have not used the md5deep tool, but I believe those lines are error messages; they would be going to standard error instead of standard out, and so they are going directly to your terminal instead of through the pipe. Thus, they won't be filtered by your sed command. You could filter them by merging your standard error and standard output streams.

It looks like (I'm not sure because you are missing the backquotes) you are trying to call

md5deep `find *`

and find is returning all of the files and directories.

Some notes on what you might want to do:

  • It looks like md5deep has a -r ("recursive") option. So, you may want to try:

    md5deep -r *
    

    instead of the find command.

  • If you do wish to use a find command, you can limit it to only files using -type f, instead of files and directories. Also, you don't need to pass * into a find command (which may confuse find if there are files whose names look like options that find understands); passing in . will search recursively through the current directory.

    find . -type f
    
  • In sed if you wish to use slashes in your pattern, it can be a pain to quote them correctly with \. You can instead choose a different character to delimit your regular expression; sed will use the first character after the s command as a delimiter. Your pattern is also lacking a .; in regular expressions, to indicate one instance of any character you use ., and to indicate "zero or more of the preceding expression" you use *, so .* indicates "zero or more of any character" (this is different from glob patterns, in which * alone means "zero or more of any character").

    sed "s|/.*||g"
    
  • If you really do want to include your standard error stream in your standard output, so that it passes through the pipe, then you can run:

    md5deep `find *` 2>&1 | awk ...
    
  • If you just want to ignore stderr, you can redirect that to /dev/null, which is a special file that just discards anything that goes into it:

    md5deep `find *` 2>/dev/null | awk ...
    
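The sed point in the list above can be checked on a single line. This is a minimal sketch on a hypothetical input line, not output captured from md5deep.

```shell
# Hypothetical input line in the shape of the messages shown in the question.
line='/Users/math/Documents/Articles/Topology: Is a directory'

# With | as the delimiter, / needs no escaping; /.* matches from the first
# slash to the end of the line, so this whole line is deleted.
stripped=$(printf '%s\n' "$line" | sed 's|/.*||')

# The same substitution leaves a bare hash untouched (no slash to match).
hash=$(printf '%s\n' '258fe6853b1bfb2d07f512ff6bec52b1' | sed 's|/.*||')

# The equivalent with / as the delimiter needs the slash escaped.
escaped=$(printf '%s\n' "$line" | sed 's/\/.*//')

echo "stripped=[$stripped] hash=[$hash] escaped=[$escaped]"
```

Note that the substitution empties the line rather than removing it, so a blank line is left behind in the output stream.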

In summary, I think the command below will help you with your immediate problem, and the other suggestions listed above may help you if I did not understand what you were looking for:

md5deep -r * | awk '{ print $1 }'
Brian Campbell
The last command keeps running; it never ends. It seems md5deep -r does not work.
Masi
Sorry, it looks like md5deep needs filename arguments; so md5deep -r * should work. If not given arguments, it is expecting standard input.
Brian Campbell
Your last command works! My mistake with the command md5deep `find *` was that I used the single quote (') instead of the backquote (`). The command puts stderr and stdout in the same output. Is there any way to ignore stderr, other than saving stdout and stderr to separate files?
Masi
You can ignore stderr by using 2>/dev/null, for example "md5deep `find *` 2>/dev/null | awk ..." in your original command.
Brian Campbell
@Brian: I am surprised: Your first command's running time is 0.759s while your last command needs minutes to complete (still running).
Masi
There are several examples I gave that were there for instructional purposes, not because you should do things that way. Using find in backquotes is almost never something you want to do. I hope you weren't taking all of my commands literally; "awk ..." is a shorthand, not a literal command to run.
Brian Campbell
@Brian: Thank you for your answers!
Masi
+1  A: 

To specifically answer the clarification: how to filter out lines using awk and sed:

awk '/Is a directory/ {next} {print}'
sed 'g/Is a directory/d'
glenn jackman
The first command works. However, the last one does not. I tried the command also without the quotes, since I have a feeling that you do not need those quotes in Mac/Linux.
Masi
Ah, right. I was thinking of the ex 'g/re/d' command. For sed, you want: sed '/Is a dir/d' (without the 'g'). And yes, you do need the quotes to pass the sed script as a single argument to the sed command.
glenn jackman
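For reference, the awk form and the corrected sed form (without the leading g) can be compared on the same input. This is a minimal sketch on hypothetical sample text.

```shell
# Hypothetical sample: two messages interleaved with two hashes.
sample='./Number theory: Is a directory
258fe6853b1bfb2d07f512ff6bec52b1
./Topology: Is a directory
4811bfb2ad04b9f4318049c01ebb52ef'

# awk: skip lines matching the pattern, print everything else.
with_awk=$(printf '%s\n' "$sample" | awk '/Is a directory/ {next} {print}')

# sed: delete lines matching the pattern; all other lines print by default.
with_sed=$(printf '%s\n' "$sample" | sed '/Is a directory/d')

printf '%s\n' "$with_awk"
```

Both commands should leave only the two hash lines.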