views:

7831

answers:

9

How do I connect to Gmail and determine which messages have attachments? I then want to download each attachment, printing out the Subject: and From: for each message as I process it.

I never found a nice solution to this problem so I'm going to start a bounty. The person who provides the best answer wins. Please provide the complete code. I will test it against my Gmail account. Answers in Perl, Python, and Java are acceptable.

+7  A: 

I'm not an expert on Perl, but what I do know is that GMail supports IMAP and POP3, 2 protocols that are completely standard and allow you to do just that.

Maybe that helps you to get started.

Jeroen Landheer
IMAP I would say being the more reliable of the two for backup purposes.
Kris Kumler
+3  A: 

Within gmail, you can filter on "has:attachment", use it to identify the messages you should be getting when testing. Note this appears to give both messages with attached files (paperclip icon shown), as well as inline attached images (no paperclip shown).

There is no Gmail API, so IMAP or POP are your only real options. The JavaMail API may be of some assistance as well as this very terse article on downloading attachments from IMAP using Perl. Some previous questions here on SO may also help.

This PHP example may help too. Unfortunately from what I can see, there is no attachment information contained within the imap_header, so downloading the body is required to be able to see the X-Attachment-Id field. (someone please prove me wrong).

Kevin Haines
A: 

Since Gmail supports the standard protocols POP and IMAP, any platform, tool, application, component, or API that provides the client side of either protocol should work.

I suggest doing a Google search for your favorite language/platform (e.g., "python"), plus "pop", plus "imap", plus perhaps "open source", plus perhaps "download" or "review", and see what you get for options.

There are numerous free applications and components, pick a few that seem worthy, check for reviews, then download and enjoy.

Rob Williams
A: 

You should be aware of the fact that you need SSL to connect to GMail (both for POP3 and IMAP - this is of course true also for their SMTP-servers apart from port 25 but that's another story).

moster67
A: 

Have you taken a look at the GMail 3rd party add-ons at wikipedia?

In particular, PhpGmailDrive is an open source add-on that you may be able to use as-is, or perhaps study for inspiration?

toolkit
+4  A: 
#!/usr/bin/env python
"""Save all attachments for given gmail account."""
import os, sys
from libgmail import GmailAccount

ga = GmailAccount("[email protected]", "pA$$w0Rd_")
ga.login()

# folders: inbox, starred, all, drafts, sent, spam
for thread in ga.getMessagesByFolder('all', allPages=True):
    for msg in thread:
        sys.stdout.write('.')
        if msg.attachments:
           print "\n", msg.id, msg.number, msg.subject, msg.sender
           for att in msg.attachments:
               if att.filename and att.content:
                  attdir = os.path.join(thread.id, msg.id)
                  if not os.path.isdir(attdir):
                     os.makedirs(attdir)                
                  with open(os.path.join(attdir, att.filename), 'wb') as f:
                       f.write(att.content)

untested

  1. Make sure TOS allows such scripts otherwise you account will be suspended
  2. There might be better options: GMail offline mode, Thunderbird + ExtractExtensions, GmailFS, Gmail Drive, etc.
J.F. Sebastian
http://libgmail.cvs.sourceforge.net/viewvc/libgmail/libgmail/libgmail.py?view=markup
J.F. Sebastian
A: 

For Java, you will find G4J of use. It's a set of APIs to communicate with Google Mail via Java (the screenshot on the homepage is a demonstration email client built around this)

Brian Agnew
+35  A: 

Hard one :-)

import email, getpass, imaplib, os

detach_dir = '.' # directory where to save attachments (default: current)
user = raw_input("Enter your GMail username:")
pwd = getpass.getpass("Enter your password: ")

# connecting to the gmail imap server
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user,pwd)
m.select("[Gmail]/All Mail") # here you a can choose a mail box like INBOX instead
# use m.list() to get all the mailboxes

resp, items = m.search(None, "ALL") # you could filter using the IMAP rules here (check http://www.example-code.com/csharp/imap-search-critera.asp)
items = items[0].split() # getting the mails id

for emailid in items:
    resp, data = m.fetch(emailid, "(RFC822)") # fetching the mail, "`(RFC822)`" means "get the whole stuff", but you can ask for headers only, etc
    email_body = data[0][1] # getting the mail content
    mail = email.message_from_string(email_body) # parsing the mail content to get a mail object

    #Check if any attachments at all
    if mail.get_content_maintype() != 'multipart':
        continue

    print "["+mail["From"]+"] :" + mail["Subject"]

    # we use walk to create a generator so we can iterate on the parts and forget about the recursive headach
    for part in mail.walk():
        # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue

        # is this part an attachment ?
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()
        counter = 1

        # if there is no filename, we create one with a counter to avoid duplicates
        if not filename:
            filename = 'part-%03d%s' % (counter, 'bin')
            counter += 1

        att_path = os.path.join(detach_dir, filename)

        #Check if its already there
        if not os.path.isfile(att_path) :
            # finally write the stuff
            fp = open(att_path, 'wb')
            fp.write(part.get_payload(decode=True))
            fp.close()

Wowww ! That's was something ;-) But try the same in Java, just for fun !

By the way, I tested that in a shell, so there must remains some errors.

Enjoy

e-satis
I've fixed the code. Feel free to rollback.
J.F. Sebastian
THat's awesome, guys! Well done! :D :D
Ivan Vučica
Thanks J.F. Wrote that wuick and dirty, you gave it value :-D
e-satis
This is a good answer. It dies with a malloc error on large attachments. Python(57780) malloc: *** mmap(size=9658368)
@melling: 9658368 looks suspiciously close to 11M limit (taking into account base64 encoding that inflates size) on gmail attachments size.
J.F. Sebastian
Where in the script does it dies ? I don't get this error, but we may find a workaround.
e-satis
imaplib.py lib: (*** error: can't allocate region) error resp, data = m.fetch(emailid, "(RFC822)") # fetching the mail File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/imaplib.py", line 437, in fetch typ, dat = self._simple_command(name, message_set, message_parts)
+3  A: 

Take a look at Mail::Webmail::Gmail:

GETTING ATTACHMENTS

There are two ways to get an attachment:

1 -> By sending a reference to a specific attachment returned by get_indv_email

# Creates an array of references to every attachment in your account
my $messages = $gmail->get_messages();
my @attachments;

foreach ( @{ $messages } ) {
    my $email = $gmail->get_indv_email( msg => $_ );
    if ( defined( $email->{ $_->{ 'id' } }->{ 'attachments' } ) ) {
        foreach ( @{ $email->{ $_->{ 'id' } }->{ 'attachments' } } ) {
            push( @attachments, $gmail->get_attachment( attachment => $_ ) );
            if ( $gmail->error() ) {
                print $gmail->error_msg();
            }
        }
    }
}

2 -> Or by sending the attachment ID and message ID

#retrieve specific attachment
my $msgid = 'F000000000';
my $attachid = '0.1';
my $attach_ref = $gmail->get_attachment( attid => $attachid, msgid => $msgid );

( Returns a reference to a scalar that holds the data from the attachment. )

JDrago