I'm currently working on a system that allows users to reply to notification emails that are sent out (sigh).

I need to strip out the replies and signatures, so that I'm left with the actual content of the reply, without all the noise.

Does anyone have any suggestions about the best way to do this?

+1  A: 

I don't believe you can do this reliably (signatures used to begin with '--', but I don't see that anymore). Perhaps you're better off asking people to reply in between text headers and then simply extracting the reply from that? It's not elegant, but perhaps more reliable.

e.g.

REPLY BETWEEN HERE -->

AND HERE -->

So you'd simply look for the required headers above and take what's in between.
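
A minimal sketch of that idea in Python, assuming the two marker strings shown above appear verbatim in the reply body. The function name `extract_between_markers` is made up for illustration, and the marker constants should match whatever your notification template actually emits.

```python
from typing import Optional

# Assumed marker strings, copied from the example above.
START_MARKER = "REPLY BETWEEN HERE -->"
END_MARKER = "AND HERE -->"


def extract_between_markers(body: str) -> Optional[str]:
    """Return the text between the two markers, or None if either is missing."""
    start = body.find(START_MARKER)
    if start == -1:
        return None
    start += len(START_MARKER)
    # Search for the closing marker only after the opening one.
    end = body.find(END_MARKER, start)
    if end == -1:
        return None
    return body[start:end].strip()
```

Returning `None` when a marker is missing lets the caller decide what to do with replies where the user deleted or mangled the markers, which is probably the main failure mode of this approach.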

Brian Agnew
Or even just: REPLY ABOVE THIS LINE ------------------------------------
andyjeffries
Indeed. Whatever is the most usable and least ambiguous.
Brian Agnew
A: 

The recommended signature delimiter is "-- \n". If people follow this recommendation, stripping signatures should be easy.

I don't see people using that any more, more's the pity.
Brian Agnew
Big IF... Recommended is certainly not standard in this case.
Tom Juergens
A: 

If you can assume that these emails are in plain text, just strip lines that begin with ">" as quoted replies, and treat a "-- " line as the signature delimiter. But those assumptions might not hold, as not everyone on the internet uses software that complies with the rules.
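
Roughly, under those same assumptions (plain text, ">" quoting, a "-- " signature line), it could look like the sketch below. The function name `strip_quotes_and_signature` is just illustrative.

```python
def strip_quotes_and_signature(body: str) -> str:
    """Drop '>'-quoted lines and everything from the '-- ' delimiter onward."""
    kept = []
    for line in body.splitlines():
        # splitlines() removes the line ending, so "-- \n" becomes "-- ".
        if line == "-- ":
            break
        if line.startswith(">"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()
```

This will also drop any line the user deliberately starts with ">", so it is only as reliable as the assumptions above.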

samuil
This is the problem. I don't believe you can automate this reliably.
Brian Agnew
+4  A: 

If your system is in-house and/or you have a limited number of reply formats, it's possible to do a pretty good job. Here are the filters we have set up for email responses to Trac tickets:

Drop all text after and including:

  1. Lines that equal '-- \n' (standard email sig delimiter)
  2. Lines that equal '--\n' (people often forget the space in sig delimiter; and this is not that common outside sigs)
  3. Lines that begin with '-----Original Message-----' (MS Outlook default)
  4. Lines that begin with '________________________________' (32 underscores, Outlook again)
  5. Lines that begin with 'On ' and end with ' wrote:\n' (OS X Mail.app default)
  6. Lines that begin with 'From: ' (failsafe for Outlook and some other reply formats)
  7. Lines that begin with 'Sent from my iPhone'
  8. Lines that begin with 'Sent from my BlackBerry'

Numbers 3 and 4 are 'begin with' instead of 'equals' because sometimes users will squash lines together by accident.

We try to be more liberal about stripping out replies, since it's much more of an annoyance (to us) to have reply garbage than it is to correct missing text.
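
For illustration, the eight rules above could be sketched in Python roughly like this. This is not the actual Trac filter code; the names `is_cutoff` and `strip_reply` are made up, and it assumes the body is already decoded plain text.

```python
# Rules 1 and 2: the whole line must equal the delimiter
# (splitlines() has already removed the trailing newline).
EXACT_LINES = {"-- ", "--"}

# Rules 3, 4, 6, 7, 8: the line only has to begin with these.
PREFIXES = (
    "-----Original Message-----",
    "_" * 32,
    "From: ",
    "Sent from my iPhone",
    "Sent from my BlackBerry",
)


def is_cutoff(line: str) -> bool:
    """True if this line marks the start of a signature or quoted reply."""
    if line in EXACT_LINES or line.startswith(PREFIXES):
        return True
    # Rule 5: "On <date>, <someone> wrote:"
    return line.startswith("On ") and line.endswith(" wrote:")


def strip_reply(body: str) -> str:
    """Keep everything before the first cutoff line."""
    kept = []
    for line in body.splitlines():
        if is_cutoff(line):
            break
        kept.append(line)
    return "\n".join(kept).rstrip()
```

Because the scan stops at the first match, an ordinary sentence that happens to start with 'From: ' will truncate the reply, which is the liberal trade-off described above.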

Anybody have other formats from the wild that they want to share?

onecreativenerd
A: 

You should also check http://pushreply.com. It collects email replies, extracts the relevant content and notifies your application over HTTP.

Disclaimer: I'm the creator of this service. Let me know what you think.

Andrei Savu