If you can associate a reply (RE:) message with the message it is a reply to, then I would think you could grab the body text of the original message from your database and remove that text from the body of the reply. However, this method will not be 100% accurate, because clients can convert an HTML/Rich Text email into plain text, or vice versa. In that case the quoted text no longer matches the stored original, so the method probably wouldn't work. Even so, the technique is generic and would probably work the majority of the time.
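As a rough illustration of that first idea, here is a minimal Python sketch. The function name `strip_original` is made up for this example, and it assumes both bodies are already plain text and that the original is quoted verbatim, which, as noted, won't always hold.

```python
def strip_original(reply_body: str, original_body: str) -> str:
    """Remove the original message text from a reply when it appears verbatim.

    Hypothetical helper: assumes both bodies are plain text.
    """
    # Normalize line endings so CRLF vs LF differences don't prevent a match.
    reply = reply_body.replace("\r\n", "\n")
    original = original_body.replace("\r\n", "\n").strip()

    if not original:
        return reply.strip()

    index = reply.find(original)
    if index == -1:
        # The original isn't quoted verbatim (for example, HTML was converted
        # to plain text, or each line was prefixed with '>'), so give up and
        # return the reply unchanged.
        return reply.strip()

    # Cut the quoted original out and keep whatever surrounds it.
    return (reply[:index] + reply[index + len(original):]).strip()
```

Note that anything the client adds around the quote, such as an "On ... wrote:" line, is left behind, so this would only be a first pass.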
In addition, the email client or provider may add certain header fields or preambles to the beginning of a quoted message in a reply, and these differ from one provider to the next. For that case, I don't think there is any "catch all" solution.
My recommendation would be to target a few of the really huge web-mail providers (Gmail, Yahoo, Microsoft, etc.), learn the formats they use for their replies, and parse the messages accordingly. In addition, you could likely handle a few generic formats as well. For instance, the '>' character is commonly used at the beginning of each line of quoted text in a reply.
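For the generic '>' case, something along these lines might be a starting point (Python). The function name and the `ATTRIBUTION` pattern are my own assumptions for illustration: the pattern is a guess at a common "On <date>, <sender> wrote:" preamble, and real providers vary.

```python
import re

# Assumed attribution line, e.g. "On Mon, Jan 1, 2024, Bob wrote:".
# This is an illustrative guess, not any provider's documented format.
ATTRIBUTION = re.compile(r"^On\b.*\bwrote:\s*$")


def strip_quoted_lines(body: str) -> str:
    """Drop lines quoted with '>' and a simple attribution preamble."""
    kept = []
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            continue  # quoted text from the earlier message
        if ATTRIBUTION.match(line.strip()):
            continue  # preamble introducing the quote
        kept.append(line)
    return "\n".join(kept).strip()
```

Keep in mind this would also drop any line the sender typed themselves that happens to start with '>', so it is a heuristic rather than a real parser.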
If you're going to be developing in a language like C#, create yourself an interface like IReplyFormat, with a corresponding implementation for each provider, and possibly some for the generic formats.
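To show the shape of that design, here is a sketch in Python rather than C#, using an abstract base class in place of the interface. Every name here (`ReplyFormat`, `matches`, `strip_quoted`, `extract_reply`, and the two implementations) is hypothetical, and the "-----Original Message-----" separator is just one assumed example of a provider-specific marker.

```python
from abc import ABC, abstractmethod


class ReplyFormat(ABC):
    """Rough Python counterpart of the IReplyFormat idea (hypothetical API)."""

    @abstractmethod
    def matches(self, body: str) -> bool:
        """Return True if this format appears to be used in the body."""

    @abstractmethod
    def strip_quoted(self, body: str) -> str:
        """Return the body with the quoted message removed."""


class AngleBracketFormat(ReplyFormat):
    """Generic format: each quoted line starts with '>'."""

    def matches(self, body: str) -> bool:
        return any(l.lstrip().startswith(">") for l in body.splitlines())

    def strip_quoted(self, body: str) -> str:
        kept = [l for l in body.splitlines() if not l.lstrip().startswith(">")]
        return "\n".join(kept).strip()


class SeparatorFormat(ReplyFormat):
    """Format where everything after a separator line is the quoted message."""

    def __init__(self, separator: str) -> None:
        self.separator = separator

    def matches(self, body: str) -> bool:
        return self.separator in body

    def strip_quoted(self, body: str) -> str:
        return body.split(self.separator, 1)[0].strip()


# Order matters: more specific formats first, generic fallbacks last.
FORMATS = [
    SeparatorFormat("-----Original Message-----"),  # assumed example marker
    AngleBracketFormat(),
]


def extract_reply(body: str, formats=FORMATS) -> str:
    """Use the first format that recognizes the body; else return it as-is."""
    for fmt in formats:
        if fmt.matches(body):
            return fmt.strip_quoted(body)
    return body.strip()
```

Supporting another provider would then mean adding one more class to the list, without touching the code that calls `extract_reply`.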
I don't think you will find any catch-all/perfect solution to this problem, as there are simply too many mail providers with too many different formats. However, I think you can at the very least find some techniques, like the ones mentioned above, that will work for you more often than not, which is about the best you can hope for at this point.