I'm trying to extract email addresses from plain text transcripts of emails. I've cobbled together a bit of code to find the addresses themselves, but I don't know how to make it discriminate between them; right now it just spits out all email addresses in the file. I'd like to make it so it only spits out addresses that are preceeded by "From:" and a few wildcard characters, and ending with ">" (because the emails are set up as From [name]<[email]>).
Here's the code now:
import re #allows program to use regular expressions
foundemail = []
#this is an empty list
mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
#do not currently know exact meaning of this expression but assuming
#it means something like "[stuff]@[stuff][stuff1-4 letters]"
# "line" is a variable is set to a single line read from the file
# ("text.txt"):
for line in open("text.txt"):
foundemail.extend(mailsrch.findall(line))
# this extends the previously named list via the "mailsrch" variable
#which was named before
print foundemail