For instance, if the original message (message 1) is...

Hey Jon,
Want to go get some pizza?
-Bill

And the reply (message 2) is...

Bill,
Sorry, I can't make lunch today.
Jonathon Parks, CTO Acme Systems

On Wed, Feb 24, 2010 at 4:43 PM, Bill Waters wrote:

> Hey Jon,
> Want to go get some pizza?
> -Bill

In Gmail, the system (a) detects that message 2 is a reply to message 1 and turns this into a 'thread' of sorts and (b) detects where the replied portion of the message actually is and hides it from the user. (In this case the hidden portion would start at "On Wed, Feb..." and continue to the end of the message.)

Obviously, in this simple example it would be easy to detect the "On <Date>, <Name> wrote:" line or the ">" character prefixes. But different email systems mark replies in many different styles (not to mention HTML emails). I get the feeling that you would need some damn smart string parsing algorithms to get anywhere near how good Gmail's is.

Does this technology already exist in an open source project somewhere? Either in some library devoted to this exclusively or perhaps in some open source email client that does similar message threading?

Thanks.

A: 

I believe Gmail works by subject title. I can't check it at the moment, but a quick change to the title might break the threading.

The following is difficult to predict, as you mention:

On Wed, Feb 24, 2010 at 4:43 PM, Bill Waters wrote:

but taking the email subject, say "Pizza tomorrow", and assuming that replies carry it with a prefix, as in "Re: Pizza tomorrow", is considerably more predictable. You could also handle the FW: and RE: (in caps) variants.
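
To make the subject idea concrete, here is a minimal sketch in Python. It is not a description of how Gmail actually works; the prefix list and the function names `normalize_subject` and `group_by_subject` are assumptions made for illustration, and real mail also uses localized prefixes that this ignores.

```python
import re

# Assumed prefix list: "Re:", "RE:", "Fw:", "FW:", "Fwd:", optionally with a
# counter such as "Re[2]:". Localized forms (e.g. "AW:", "SV:") are not handled.
_PREFIX = re.compile(r"^\s*(?:(?:re|fw|fwd)\s*(?:\[\d+\])?\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip leading reply/forward markers, collapse whitespace, lowercase."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.split()).lower()


def group_by_subject(subjects):
    """Group message indexes by normalized subject (hypothetical helper)."""
    groups = {}
    for index, subject in enumerate(subjects):
        groups.setdefault(normalize_subject(subject), []).append(index)
    return groups
```

Grouping on the normalized subject alone will merge unrelated conversations that happen to share a title, so it is probably best treated as a fallback.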

Neil McKeown
A: 

Do you mean to solve problems where the correspondent doesn't set In-Reply-To: or References: header fields?

Otherwise, you might use mutt and configure it to not show quotes by default.

(Any other mail tool on earth should be able to do this too. Well, I never got a threaded tree view in Outlook.)

[edited below in reaction to comment]

If you are trying to build your own software, then this question is obviously well suited here. But then, I can only give you my 2c on this. If you cannot rely on the explicit headers, then the only thing to do is take a bunch of mails and learn the most common phrases used to indicate quotes. (Luckily there are some conventions, and date formats and names/emails are not completely arbitrary.)
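
As a rough illustration of that phrase-based approach, the following Python sketch splits a plain-text body at the first line that looks like a quote marker. The two patterns and the function name `split_reply` are assumptions; they only cover the English "On ... wrote:" attribution line and ">" prefixes from the question, and a real tool would need many more patterns learned from actual mail.

```python
import re

# Assumed patterns, illustrative only: an English attribution line and
# the conventional ">" quote prefix. HTML mail is not handled at all.
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")
QUOTED = re.compile(r"^\s*>")


def split_reply(body: str):
    """Return (new_text, quoted_text), splitting at the first quote marker."""
    lines = body.splitlines()
    for i, line in enumerate(lines):
        if ATTRIBUTION.match(line) or QUOTED.match(line):
            return "\n".join(lines[:i]).rstrip(), "\n".join(lines[i:])
    # No marker found: treat the whole body as new text.
    return body.rstrip(), ""
```

Cutting at the first marker hides everything after it, so replies written inline between quoted lines would be lost; handling those would need a per-line classification instead.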

If you do this to analyze communication threads, you probably want to indicate the likelihood of each relation. If you only do it for the convenience of the user... well... my personal opinion? Don't sweat over people who can't use a decent mail tool.

Don Johe
I'm developing a software tool which (among other things) is going to need to take a bunch of raw email messages and, using whatever information is available, build a 'tree' or 'thread' structure out of them. I'm only just learning about the In-Reply-To: and References: headers in RFC 822 emails. It looks like I'll be using that data, but the headers might be missing in some cases, so I'm also looking for a heuristic-driven approach to determining "what is a response to what".
Chris W.
A: 

What kind of Mail Delivery Agent are you using?

Are you developing your own? In that case, are you planning to implement the IMAP protocol?

If you're using Cyrus (or any other product that handles IMAP) with SORT and THREAD extensions, then it's already built in.

In both cases, you should take a look at RFC 5256.

Brian Clozel
A: 

There's a good article written by Zawinski here:

http://www.jwz.org/doc/threading.html
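
For a feel of the header-based part, here is a minimal Python sketch that links each message to its parent using the References: and In-Reply-To: headers mentioned elsewhere in this thread. It is far simpler than what that article describes and should not be read as its algorithm; the function names `build_parent_map` and `thread_root` are made up for this example, and messages without a Message-ID are simply skipped.

```python
from email import message_from_string


def build_parent_map(raw_messages):
    """Map each Message-ID to its parent's Message-ID (None for thread roots)."""
    parents = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = (msg.get("Message-ID") or "").strip()
        if not msg_id:
            continue  # nothing to key on; a fuller tool would need a fallback
        refs = (msg.get("References") or "").split()
        reply_to = (msg.get("In-Reply-To") or "").split()
        if refs:
            parent = refs[-1]  # last reference is the direct parent
        elif reply_to:
            parent = reply_to[0]
        else:
            parent = None
        parents[msg_id] = parent
    return parents


def thread_root(msg_id, parents):
    """Follow parent links upward; the seen-set guards against header loops."""
    seen = set()
    while parents.get(msg_id) and msg_id not in seen:
        seen.add(msg_id)
        msg_id = parents[msg_id]
    return msg_id
```

Messages that share a root can then be shown as one thread. When the headers are missing, heuristics such as subject matching would have to fill the gap.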

Daniel
A: 

You could have a look at sup (http://freshmeat.net/articles/sup-gmail-meets-the-console), as it does almost what you want.

Bricololo