I'm modifying the ColdFusion-based interface for a listserv admin application to show snippets of recently posted messages on a page. The messages are all stored in a SQL Server 2005 database on the listserv's mail server, and in theory it should be easy enough to query the recent ones and display them. However, the "message" column of the table that contains the e-mail record seems to contain all of the "source code" of the e-mail, exactly as sent to the mail server. It contains control codes, e-mail headers and markup. For example, part of the message data returned in a query might look something like this:

This is a multi-part message in MIME format.  
------_=_NextPart_001_01CA9A9E.B2224293 Content-Type: text/plain;  
charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable  
All: =20 The correct time for Tuesday's call is 3 pm ET as noted on the agenda 

(line breaks added for readability, this was actually all on one line).

When I display the message on the site, I just want it to look like this:

All:
The correct time for Tuesday's call is 3 pm ET as noted on the agenda

There are actually much more complicated encodings than the one in the example I've given. Some messages include base64-encoded attachments and similar things. How can I strip away all the e-mail code and markup and just display the text of the message?

I imagine somebody must have written some public code or some custom tag that does this, but my Google-fu has failed me thus far. Thanks.

+5  A: 

You can use JavaMail to do this. If what you have in the DB is the full email, then you should be able to parse it into its component bits using JavaMail.

I'm not sure what your Java level is like, but the code below should return most of the elements you need. Handling complex messages gets a bit complicated: multipart messages can nest, so walking them requires a bit of recursion.

myEmailString is your column from the DB containing the entire raw email.

The result is a struct with properties:

.toRecipients = array of email addresses
.ccRecipients = array of email addresses
.from = array of email addresses
.subject = string
.sentDate = date object
.receivedDate = date object
.attachments = array of {.mimeType: string, .fileName: string}
.bodyParts = {.html: array of strings, .text: array of strings}

The code (should work under CF8+ Java 1.6):

<cfscript>
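    // Trim first: the MIME parser is sensitive to leading whitespace.
    local.rawEmail = trim(myEmailString);
    // Wrap the raw message in an input stream (getBytes() uses the platform default charset).
    myStream = createObject("java","java.io.ByteArrayInputStream").init(local.rawEmail.getBytes());
    // Set up a fake mail session so a Java MimeMessage can be built from our input stream
    // instead of being fetched from a mail server.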
    local.props = createObject("java", "java.util.Properties").init();
    local.props.put( javacast("string", "mail.host"), javacast("string", "smtp.somedomain.com"));
    local.props.put( javacast("string", "mail.transport.protocol"),  javacast("string", "smtp"));
    local.mailSession = createObject("java", "javax.mail.Session").getDefaultInstance(local.props,javacast("null", ""));

    local.message = createObject("java", "javax.mail.internet.MimeMessage").init(local.mailSession, myStream);

    local.recipientObj = createObject("java", "javax.mail.Message$RecipientType");

    // Set up our data structure to hold all the elements of the mail object.
    local.mailStruct = structNew();
    local.mailStruct.subject = "";
    local.mailStruct.from = "";
    local.mailStruct.toRecipients = "";
    local.mailStruct.ccRecipients = "";
    local.mailStruct.receivedDate = "";
    local.mailStruct.sentDate = "";
    local.mailStruct.attachments = arrayNew(1);
    local.mailStruct.bodyParts = structNew();
    local.mailStruct.bodyParts.html = arrayNew(1);
    local.mailStruct.bodyParts.text = arrayNew(1);


    // Handle all the header stuff.  Mostly just to: and from: at this point.
    local.mailStruct.subject = fixNull(local.message.getSubject());
    local.mailStruct.from = parseAddress(fixNull(local.message.getFrom()));
    local.mailStruct.toRecipients = parseAddress(fixNull(local.message.getRecipients(local.recipientObj.TO)));
    local.mailStruct.ccRecipients = parseAddress(fixNull(local.message.getRecipients(local.recipientObj.CC)));
    local.mailStruct.receivedDate = fixNull(local.message.getReceivedDate());
    local.mailStruct.sentDate = fixNull(local.message.getSentDate());

    // Handle the body stuff.
    parseEmailBody(local.message, local.mailStruct, getTempDirectory());
</cfscript>
<cfdump var="#local.mailStruct#">

<cffunction name="parseEmailBody" output="false">
    <cfargument name="messagePart" required="true" />
    <cfargument name="mailStruct" required="true" />
    <cfargument name="attachDir" required="true" />
    <cfset var local=structNew()>
    <cfscript>
        if (arguments.messagePart.isMimeType("text/plain")) {
            // Text Body Part
            arrayAppend(arguments.mailStruct.bodyParts.text,arguments.messagePart.getContent());
        } else if (arguments.messagePart.isMimeType("text/html")) {
            // HTML Body Part
            arrayAppend(arguments.mailStruct.bodyParts.html,arguments.messagePart.getContent());
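        } else if (arguments.messagePart.isMimeType("multipart/*")) {
            // This is a multipart container; walk its child parts.
            // Any other non-multipart type (e.g. an image with no disposition) is skipped.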
            local.mp = arguments.messagePart.getContent();
            for(local.i=0; local.i < local.mp.getCount(); local.i++) {
                try {
                    local.part = local.mp.getBodyPart(javacast("int",local.i));
                    local.disp = local.part.getDisposition();
                    if (  isDefined("local.disp") && (UCase(local.disp)=="ATTACHMENT" || UCase(local.disp)=="INLINE") ) {
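                        /* This is an attachment. A minimal sketch: record its metadata only;
                            saving the file to arguments.attachDir is not implemented here. */
                        local.att = structNew();
                        local.att.mimeType = local.part.getContentType();
                        local.att.fileName = fixNull(local.part.getFileName());
                        arrayAppend(arguments.mailStruct.attachments, local.att);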
                    } else {
                        /* This part is not a binary attachment - could be another multipart bit,
                            or could be a single part.  Either way, we need to run it through again
                            to see what it is and handle it properly.
                        */
                        parseEmailBody(local.part,arguments.mailStruct,arguments.attachDir);
                    }
                } catch (Any e) {
                    // Some error happened trying to parse part of the message.
                }
            }
        }
    </cfscript>
</cffunction>

<cffunction name="parseAddress" output="false">
    <cfargument name="addressObj" required="true" />
    <cfset var local=structNew()>
    <cfscript>
        local.addressArray = ArrayNew(1);
        // fixNull() turns a missing address list into "", so only loop over real arrays.
        if (NOT isSimpleValue(arguments.addressObj)) {
            for (local.i=1; local.i lte arrayLen(arguments.addressObj); local.i++) {
                local.addressArray[local.i] = arguments.addressObj[local.i].getAddress();
            }
        }
        return local.addressArray;
    </cfscript>
</cffunction>

<cffunction name="fixNull" access="private" output="false">
    <cfargument name="valueToFix" default="" />
    <cfset var rStr = "" />
    <cfif isDefined("arguments.valueToFix")>
        <cfset rStr = arguments.valueToFix />
    </cfif>
    <cfreturn rStr />
</cffunction>
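
To get from the struct to the short plain-text snippet the question asks for, something like the following might do. This is only a sketch: it assumes `local.mailStruct` has already been filled in by the code above, and `snippet` and `maxLen` are hypothetical names that are not part of the original code. It prefers the first text/plain part and falls back to the first HTML part with tags crudely stripped by a regex, which will not cope with every HTML message.

```cfml
<cfscript>
    // Hypothetical names: snippet, maxLen. Assumes local.mailStruct is already populated.
    maxLen = 200;
    snippet = "";
    if (arrayLen(local.mailStruct.bodyParts.text) GT 0) {
        // Prefer the plain-text body; getContent() has already decoded the transfer encoding.
        snippet = trim(local.mailStruct.bodyParts.text[1]);
    } else if (arrayLen(local.mailStruct.bodyParts.html) GT 0) {
        // Fall back to the HTML body with tags removed (crude regex, not a real HTML parser).
        snippet = trim(reReplace(local.mailStruct.bodyParts.html[1], "<[^>]*>", "", "all"));
    }
    if (len(snippet) GT maxLen) {
        // Cut long messages down to a preview.
        snippet = left(snippet, maxLen) & "...";
    }
</cfscript>
<cfoutput>#htmlEditFormat(snippet)#</cfoutput>
```

The `htmlEditFormat()` call escapes anything in the message text that would otherwise be rendered as markup on the page.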
Edward M Smith
@Edward - Thanks for your very helpful reply. I appreciate you taking the time to type all that out. I never thought of using the Java mail routines. Unfortunately, I can't get your code to work quite right. After correcting some minor issues and setting it up in my application, whenever I give it a message to parse it always returns the input unchanged, exactly as it was sent. I even tried feeding it sample MIME messages I found, such as the one at http://msdn.microsoft.com/en-us/library/ms526560%28EXCHG.10%29.aspx , it still doesn't parse them. I wonder what's wrong?
Joshua Carmody
@Edward - Ok, I've gotten it to work with the sample MIME message from MSDN. Turns out there was some extra white space in the data I was feeding it that it didn't like. Haven't figured out how to make it work with my data yet, but it looks promising.
Joshua Carmody
@Edward - Got it to work with my data. AWESOME. You're a lifesaver.
Joshua Carmody
In CF8+, you can use <,>, etc in script, but not in tags: http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=CFScript_03.html
Edward M Smith
@Joshua - yes, the parser is sensitive to leading whitespace. I tend to trim() things before feeding them in.
Edward M Smith
Hmm, maybe it's just 8.1, and not 8.0. Strange! Anyway, I'm pleased that worked out for you!
Edward M Smith