If your system is in-house and/or you have a limited number of reply formats, it's possible to do a pretty good job. Here are the filters we have set up for email responses to Trac tickets:
Drop all text after and including:
- Lines that equal
'-- \n'
(standard email sig delimiter)
- Lines that equal
'--\n'
(people often forget the space in sig delimiter; and this is not that common outside sigs)
- Lines that begin with
'-----Original Message-----'
(MS Outlook default)
- Lines that begin with
'________________________________'
(32 underscores, Outlook again)
- Lines that begin with
'On '
and end with ' wrote:\n'
(OS X Mail.app default)
- Lines that begin with
'From: '
(failsafe for Outlook and some other reply formats)
- Lines that begin with
'Sent from my iPhone'
- Lines that begin with
'Sent from my BlackBerry'
Numbers 3 and 4 are 'begin with' instead of 'equals' because sometimes users will squash lines together by accident.
We try to be liberal about stripping out replies, since it's much more of an annoyance (to us) to have reply garbage left in a ticket than it is to correct missing text.
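For anyone who wants to try this, here is a minimal Python sketch of the rules listed above. It is not our actual filter code: the function name strip_reply and the constant names are made up for illustration, and trimming trailing blank lines from the result is an extra assumption on my part.

```python
import re

# Lines that must match exactly (rules 1 and 2: sig delimiter, with or without the space).
EXACT = {"-- ", "--"}

# Lines that only need to start with one of these (the remaining prefix rules).
PREFIXES = (
    "-----Original Message-----",  # MS Outlook default
    "_" * 32,                      # 32 underscores, Outlook again
    "From: ",                      # failsafe for Outlook and others
    "Sent from my iPhone",
    "Sent from my BlackBerry",
)

# Lines that begin with 'On ' and end with ' wrote:' (OS X Mail.app default).
ON_WROTE = re.compile(r"^On .* wrote:$")


def strip_reply(body):
    """Return body with everything from the first matching line onward dropped."""
    kept = []
    for line in body.splitlines():
        if line in EXACT or line.startswith(PREFIXES) or ON_WROTE.match(line):
            break
        kept.append(line)
    # Assumption: trailing blank lines before the cut are not worth keeping.
    return "\n".join(kept).rstrip()


if __name__ == "__main__":
    sample = "Thanks, fixed.\n\nOn Mon, Jan 5, 2009, Bob wrote:\n> old text\n"
    print(strip_reply(sample))
```

Since the match is done line by line after splitlines(), the '\n' in the rules above is implicit, and CRLF mail bodies are handled the same way.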
Anybody have other formats from the wild that they want to share?