I am looking for a way to identify quoted text in emails. The goal is to add something along the lines of Gmail's "show quoted text" feature to my web app, which involves a mail handler bot.
There are similar questions on Stack Overflow, but they are asking for an algorithm. I could implement this if I have to, but I would greatly prefer a tried and true solution.
Requirements:
1) Support both HTML and plain text emails
2) Operate on the full thread (that is, it has the original text to compare the quoted text against; no need to guess)
3) Handle common quote-related additions such as "On May 10th, 2008 at 6:35 PM Brandon wrote:"
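
To make requirements 2 and 3 concrete, here is a rough sketch of the behaviour I have in mind, for plain text only. It is not a tried and true solution: the function names (`normalize`, `split_quoted`) and both regular expressions are my own assumptions, it only matches lines that appear verbatim in the original message, and it does not handle HTML or re-wrapped lines.

```python
import re

# Assumed pattern for attribution lines such as
# "On May 10th, 2008 at 6:35 PM Brandon wrote:" (English-only guess).
ATTRIBUTION_RE = re.compile(r"^On\s.+\swrote:\s*$")
# Assumed pattern for one or more leading ">" quote markers.
QUOTE_PREFIX_RE = re.compile(r"^(\s*>)+\s?")


def normalize(line):
    """Strip quote markers and collapse whitespace so lines can be compared."""
    return " ".join(QUOTE_PREFIX_RE.sub("", line).split())


def split_quoted(reply, original):
    """Return (new_lines, quoted_lines) for a plain-text reply.

    A reply line counts as quoted if, after normalization, it appears in
    the original message or looks like an attribution line.
    """
    known = {normalize(l) for l in original.splitlines() if l.strip()}
    new_lines, quoted_lines = [], []
    for line in reply.splitlines():
        text = normalize(line)
        if text and (text in known or ATTRIBUTION_RE.match(text)):
            quoted_lines.append(line)
        else:
            new_lines.append(line)
    return new_lines, quoted_lines
```

Calling `split_quoted(reply_body, original_body)` would give the lines to show by default and the lines to hide behind a "show quoted text" link. A real implementation would also need to cope with line re-wrapping, signatures, localized attribution lines, and HTML blockquotes, which is why I would prefer an existing library.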
A Python library would be super magically awesome ideal, but I don't expect to get that lucky. A simple command-line tool that can do this would be pretty close to ideal, but I don't expect to get that lucky either. I'd gladly settle for a well-known, good implementation from an open source mail client that could reasonably be extracted into a tool.
Does anyone have a suggestion for what my best bet would be?
I'm kind of surprised that there is no such thing as an "email handler bot construction kit".