How could I export my e-mail database from Gmail (or Thunderbird) into R?

There are packages such as rgoogledocs and twitteR; is there a gmailR package, or a standard format for exporting emails into statistical packages?

Tal

+2  A: 

Standard email (on a Unix system) is either an mbox file (containing several messages) or a maildir setup where each mail is a file in a directory.
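If the mail ends up in an mbox file, reading it is mostly a matter of walking the messages and pulling out headers. Here is a minimal sketch in Python using the standard-library `mailbox` module; the function name `mbox_to_rows` and the choice of headers are just assumptions for illustration, not part of any existing R or Gmail tooling.

```python
import mailbox
from email.header import decode_header, make_header


def decode(value):
    """Turn a possibly RFC 2047-encoded header into plain text."""
    return str(make_header(decode_header(value))) if value else ""


def mbox_to_rows(path):
    """Return one dict of basic headers per message in the mbox file at `path`.

    `mbox_to_rows` is a hypothetical helper name; the header selection is arbitrary.
    """
    box = mailbox.mbox(path, create=False)  # do not create the file if it is missing
    try:
        rows = []
        for msg in box:
            rows.append({
                "from": decode(msg["From"]),
                "to": decode(msg["To"]),
                "date": msg["Date"] or "",
                "subject": decode(msg["Subject"]),
            })
        return rows
    finally:
        box.close()
```

For a maildir layout, `mailbox.Maildir` from the same module can be iterated in much the same way.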

Either way, it's plain ASCII text. That is why an MUA (mail user agent, i.e. your mail reader) is orthogonal to your MTA (mail transport agent, i.e. mail server software like exim, qmail, postfix, ...). The mail server may use a network protocol like POP3 or IMAP to serve the mail files to the client, in which case the client (which may be Gmail or Thunderbird) no longer sees the underlying files. So you may need to learn how to export your mail from whichever backend you employ, and then read it.

This has nothing to do with R or programming so far, unless you now feel you must extend R with POP3 or IMAP facilities to connect to a (remote) mail server.

Dirk Eddelbuettel
+2  A: 

Gmail and Thunderbird are not the same thing. You can set up your Gmail account in Thunderbird, export each email as an ASCII text file, and then write an R batch script that takes each file and imports it into R as an object... you get the point. =)

Usually I try to avoid "the pedestrian approach", but I get the impression that you're inclined to use R as a "general purpose" programming language. Python or Java, on the other hand, can be quite efficient, so you can write (or ask someone to write for you) a script that brings you the data in the desired format, and then crunch it in R. R has matured a lot, and it's no longer solely a tool for statistical analysis, but it's always a good idea to use a widely known programming language to prepare your data.
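To make that concrete, here is a rough sketch of such a script in Python. It assumes each message was saved as its own `.eml` file in one directory (the extension, the directory layout and the function name `eml_dir_to_csv` are assumptions for illustration) and writes a CSV that R could then load with `read.csv`.

```python
import csv
import email
from email import policy
from pathlib import Path


def eml_dir_to_csv(eml_dir, csv_path):
    """Write one CSV row per .eml file found in `eml_dir`; return the row count.

    `eml_dir_to_csv` is a hypothetical helper; only a few headers are kept.
    """
    fields = ["file", "from", "to", "date", "subject"]
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fields)
        writer.writeheader()
        for path in sorted(Path(eml_dir).glob("*.eml")):
            with open(path, "rb") as fh:
                # policy.default decodes encoded headers into readable text
                msg = email.message_from_binary_file(fh, policy=policy.default)
            writer.writerow({
                "file": path.name,
                "from": str(msg["From"] or ""),
                "to": str(msg["To"] or ""),
                "date": str(msg["Date"] or ""),
                "subject": str(msg["Subject"] or ""),
            })
            count += 1
    return count
```

On the R side, something like `read.csv("mail.csv", stringsAsFactors = FALSE)` should then give a data frame with one row per message; the file name here is only a placeholder.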

So there... Roll up your sleeves, and dive into Python (Java, C... whatever you feel like diving into)!

P.S. I reckon this has something to do with your previous post about the word cloud...

aL3xa