I have multiple files in a folder, and each of them has one email message. Each message has a header in the format

Subject: formatting fonts
To: [email protected]
From: sender name

message body

I want to get all the unique sender names from all the messages (there is only one message per file). How can I do that?

+2  A: 

Assuming there can't be stray header-like lines in the middle of the messages, this should do the trick:

cat * | grep '^From: ' | sort -u

If there may be other misleading "From:" lines in the middle of the messages, then you just need to make sure you are only getting the first matching line from each message, like so:

for f in * ; do grep '^From: ' "$f" | head -n 1 ; done | sort -u

Obviously you can replace the * in either command with a different glob or list of file names.
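
If the message bodies might contain their own "From: " lines, one way to be stricter is to stop reading each file at the first blank line, since that is where the header ends in the format shown in the question. The following is only a sketch: the function name unique_from_headers and the directory argument are made up for illustration, and it assumes every header block is terminated by an empty line.

```shell
# Hypothetical helper: print the unique "From: " header lines of all
# regular files in a directory (defaults to the current directory).
unique_from_headers() {
    dir=${1:-.}
    for f in "$dir"/*; do
        # Skip subdirectories and an unmatched glob.
        [ -f "$f" ] || continue
        # Quit at the first blank line (end of the header block);
        # otherwise print the first "From: " line and quit.
        sed -n -e '/^$/q' -e '/^From: /{p;q;}' "$f"
    done | sort -u
}
```

Calling unique_from_headers /path/to/messages prints one "From: " line per distinct sender, and a "From: " line that only appears in a body is never reached.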

John
I'd add ` | sort | uniq` after all that
kch
You are correct... I missed the 'unique' part of the original question. I've updated my answer to add '|sort -u'. ('|sort|uniq' would work as well).
John
A: 

Do you want to extract sender names or e-mail addresses? Usually you have both in "From" lines, such as

From: Lessie <[email protected]>

Then you can use sed to remove the e-mail address part:

sed 's/^From: //;s/ *<[^>]*> *//'

ending up with something like this:

ls | while read -r filename
do
    grep '^From: ' "$filename" | head -n1 | sed 's/^From: //;s/ *<[^>]*> *//;s/^"//;s/"$//'
done | sort -u
che
A: 

To tighten up some of the answers (I don't have enough reputation yet to comment), the following should be sufficient:

grep -h -m 1 '^From: ' * | sed 's/^From: *//' | sort -u

This will give you a list of unique From addresses for all the messages in the directory. The -h stops grep from prefixing each match with its file name, which would otherwise keep the sed pattern from matching. If you want to clean up the address portion, you can add more to the sed command, as in che's answer. There is no need for 'cat * | grep'.
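
For what it's worth, here is a rough sketch of that one-pass grep combined with the name clean-up from che's answer. The function name unique_sender_names and its directory argument are invented for the example, and it assumes a grep that supports -m and -h (GNU and BSD grep do, but neither option is in POSIX).

```shell
# Hypothetical helper: print the unique sender names (address part
# removed) for all message files in a directory (default: current one).
unique_sender_names() {
    dir=${1:-.}
    # -m 1 stops after the first match in each file; -h omits the
    # file-name prefix. Errors (e.g. subdirectories) are discarded.
    grep -h -m 1 '^From: ' "$dir"/* 2>/dev/null |
        # Drop the "From: " label, the <address> part, and any
        # double quotes surrounding the name.
        sed -e 's/^From: *//' -e 's/ *<[^>]*> *//' -e 's/^"//' -e 's/"$//' |
        sort -u
}
```

Running unique_sender_names /path/to/messages prints one line per distinct name. Two senders who share a display name but use different addresses collapse into a single line, so skip the second sed expression if the address matters.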

jabbie