Is there a Java library that can take an email, compare it against a database of emails, and find other emails that are probably from the same "thread", similar to how mailing lists group messages?

A:

I don't know of any library for this, but you can do it by looking at the header values in the email. Several headers are added when someone replies to a message. Here are the relevant ones:

Message-ID: Every email carries a Message-ID header, which is a globally unique string of junk. Sometimes it's a GUID, but most of the time it's some combination of a GUID and a domain. The format doesn't matter; it's just a unique string.

In-Reply-To: Holds the Message-ID of the message that this email is a reply to.

References: May contain a list of the Message-IDs of all the messages in the chain from the current message back to the start of the thread. If the thread is very long, this list may be abbreviated in the middle, but the first and the last message should always be present. (Older mail software uses this field to identify other messages that the current message refers to.)

Thread-Index: Outlook uses this header, and every email that is part of a single thread carries it.
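To make the References / In-Reply-To relationship concrete, here is a minimal Java sketch that pulls the Message-IDs out of those two header values and guesses a message's direct parent. It assumes the IDs are written in angle brackets (as in `<abc123@example.com>`); the names `ThreadHeaders`, `extractIds` and `parentId` are made up for illustration and are not from any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: read Message-IDs out of References / In-Reply-To header values. */
public class ThreadHeaders {
    // One bracketed id such as <abc123@example.com>; folding whitespace between ids is skipped.
    private static final Pattern MSG_ID = Pattern.compile("<([^<>\\s]+)>");

    /** Ids in the order they appear; for References that is oldest ancestor first. */
    public static List<String> extractIds(String headerValue) {
        List<String> ids = new ArrayList<>();
        if (headerValue == null) {
            return ids; // header absent
        }
        Matcher m = MSG_ID.matcher(headerValue);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    /**
     * Best guess at the direct parent: the last id in References, else the
     * first id in In-Reply-To. Returns null when the message starts a thread.
     */
    public static String parentId(String references, String inReplyTo) {
        List<String> refs = extractIds(references);
        if (!refs.isEmpty()) {
            return refs.get(refs.size() - 1);
        }
        List<String> replyTo = extractIds(inReplyTo);
        return replyTo.isEmpty() ? null : replyTo.get(0);
    }

    public static void main(String[] args) {
        String refs = "<root@example.com>\r\n <reply1@example.com>"; // folded header value
        System.out.println(extractIds(refs));                     // [root@example.com, reply1@example.com]
        System.out.println(parentId(refs, null));                 // reply1@example.com
        System.out.println(parentId(null, "<root@example.com>")); // root@example.com
    }
}
```

Preferring References over In-Reply-To is a judgment call: References carries the whole chain, while some older clients put extra text into In-Reply-To, which is why only the bracketed part is taken.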

You can get at these headers using good old JavaMail, so it shouldn't be too hard to reconstruct threads this way. Unfortunately, there isn't a standard header that works like Thread-Index, so you have to piece threads together from Message-ID, In-Reply-To and References.

http://people.dsv.su.se/~jpalme/ietf/message-threading.html
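Here is a minimal sketch of the "reconstruct threads from headers" idea, assuming you have already read each message's Message-ID, References and In-Reply-To values as strings (with JavaMail that would be something like `MimeMessage.getMessageID()` and `getHeader("References", " ")`). It is a deliberately simple union-find grouping: any two messages that mention a common Message-ID end up in the same thread, even if the message they share was never received. `ThreadGrouper` and `Mail` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: group messages into threads by linking every Message-ID each message mentions (union-find). */
public class ThreadGrouper {
    /** Hypothetical holder for the three raw header values of one message. */
    public static final class Mail {
        final String messageId;
        final String references;
        final String inReplyTo;

        public Mail(String messageId, String references, String inReplyTo) {
            this.messageId = messageId;
            this.references = references;
            this.inReplyTo = inReplyTo;
        }
    }

    private static final Pattern MSG_ID = Pattern.compile("<([^<>\\s]+)>");
    private final Map<String, String> parent = new HashMap<>();

    /** Union-find lookup with path halving; an unseen id becomes its own set. */
    private String find(String id) {
        parent.putIfAbsent(id, id);
        while (!parent.get(id).equals(id)) {
            String grandparent = parent.get(parent.get(id));
            parent.put(id, grandparent);
            id = grandparent;
        }
        return id;
    }

    private void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) {
            parent.put(ra, rb);
        }
    }

    /** All bracketed ids in a header value, in order; empty for a missing header. */
    static List<String> ids(String header) {
        List<String> out = new ArrayList<>();
        if (header != null) {
            Matcher m = MSG_ID.matcher(header);
            while (m.find()) {
                out.add(m.group(1));
            }
        }
        return out;
    }

    /** Message-IDs of the given mails grouped into threads, keeping input order. */
    public static List<List<String>> group(List<Mail> mails) {
        ThreadGrouper uf = new ThreadGrouper();
        List<String> selves = new ArrayList<>();
        for (Mail mail : mails) {
            List<String> own = ids(mail.messageId);
            // Fall back to the raw value if the id isn't bracketed (assumes it is non-null).
            String self = own.isEmpty() ? mail.messageId : own.get(0);
            selves.add(self);
            uf.find(self);
            // Referenced ids that never arrive (e.g. deleted mail) still act as links.
            for (String ref : ids(mail.references)) uf.union(self, ref);
            for (String ref : ids(mail.inReplyTo)) uf.union(self, ref);
        }
        Map<String, List<String>> threads = new LinkedHashMap<>();
        for (String self : selves) {
            threads.computeIfAbsent(uf.find(self), k -> new ArrayList<>()).add(self);
        }
        return new ArrayList<>(threads.values());
    }

    public static void main(String[] args) {
        List<Mail> mails = Arrays.asList(
            new Mail("<a@x>", null, null),
            new Mail("<b@x>", "<a@x>", "<a@x>"),
            new Mail("<c@x>", "<a@x> <b@x>", "<b@x>"),
            new Mail("<d@x>", null, null),
            new Mail("<e@x>", null, "<gone@x>"),
            new Mail("<f@x>", "<gone@x>", null));
        System.out.println(group(mails)); // [[a@x, b@x, c@x], [d@x], [e@x, f@x]]
    }
}
```

Union-find copes with messages arriving out of order and with missing parents, but it only answers "which messages belong together", not who replied to whom, and a client that puts a wrong ID into References will merge two threads.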

Stack Overflow post on Thread-Index:

http://stackoverflow.com/questions/2278314/how-does-the-email-header-field-thread-index-work
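If you do want to use Thread-Index for Outlook mail, something along these lines could derive a grouping key. It assumes the conversation-index layout Microsoft documents (Base64 in the header; a 22-byte header block of 1 reserved byte, 5 timestamp bytes and a 16-byte GUID, then one 5-byte block per reply), so check it against the post above and real messages before relying on it. `ThreadIndexKey` is a hypothetical name and the bytes in `main` are synthetic, not captured from real mail.

```java
import java.util.Arrays;
import java.util.Base64;

/**
 * Sketch: derive a grouping key from Outlook's Thread-Index header.
 * ASSUMED layout: 22-byte header (1 reserved, 5 timestamp, 16 GUID),
 * followed by one 5-byte block per reply.
 */
public class ThreadIndexKey {
    private static final int HEADER_LEN = 22;
    private static final int CHILD_LEN = 5;
    private static final int GUID_OFFSET = 6;

    /** Decodes the header value, or returns null if it doesn't fit the assumed layout. */
    private static byte[] decode(String threadIndex) {
        if (threadIndex == null) return null;
        byte[] raw;
        try {
            // The MIME decoder tolerates line breaks left by folded headers.
            raw = Base64.getMimeDecoder().decode(threadIndex.trim());
        } catch (IllegalArgumentException e) {
            return null;
        }
        boolean wellFormed = raw.length >= HEADER_LEN && (raw.length - HEADER_LEN) % CHILD_LEN == 0;
        return wellFormed ? raw : null;
    }

    /** Hex of the GUID that replies copy from the first message, or null if invalid. */
    public static String conversationKey(String threadIndex) {
        byte[] raw = decode(threadIndex);
        if (raw == null) return null;
        StringBuilder hex = new StringBuilder();
        for (int i = GUID_OFFSET; i < HEADER_LEN; i++) {
            hex.append(String.format("%02x", raw[i] & 0xff));
        }
        return hex.toString();
    }

    /** Number of reply blocks: 0 for the first message, 1 for a direct reply; -1 if invalid. */
    public static int replyDepth(String threadIndex) {
        byte[] raw = decode(threadIndex);
        return raw == null ? -1 : (raw.length - HEADER_LEN) / CHILD_LEN;
    }

    public static void main(String[] args) {
        byte[] first = new byte[HEADER_LEN];
        first[0] = 1;
        for (int i = GUID_OFFSET; i < HEADER_LEN; i++) first[i] = (byte) (i * 7);
        byte[] reply = Arrays.copyOf(first, HEADER_LEN + CHILD_LEN); // reply = parent + 5 bytes
        reply[HEADER_LEN] = 42;

        String a = Base64.getEncoder().encodeToString(first);
        String b = Base64.getEncoder().encodeToString(reply);
        System.out.println(conversationKey(a).equals(conversationKey(b))); // true
        System.out.println(replyDepth(a) + " " + replyDepth(b));           // 0 1
    }
}
```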

chubbard
A: 

Actually, I stand corrected: there might be an implementation of an algorithm you can use. It depends on what sort of API you're using to read your email.

http://www.jwz.org/doc/threading.html

This page describes an algorithm (Jamie Zawinski's message threading algorithm) you could use to reconstruct the threads. Email is tricky, and lots of clients don't implement the standards correctly, so it becomes a pain.
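For a feel of how that algorithm works, here is a partial Java sketch of its first two steps only: file each message in an ID table of "containers", link the References chain parent to child (without overriding existing links or creating loops), make the last reference the message's parent, then collect every parentless container as a thread root. The later steps on that page (pruning empty containers, grouping roots by subject) are left out, and the class and method names are my own, not from any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of the linking and root-set steps of jwz threading; pruning and subject grouping omitted. */
public class JwzThreader {
    /** A tree node; message stays null for ids seen only in References. */
    static final class Container {
        final String id;
        String message; // hypothetical payload, here just the subject line
        Container parent;
        final List<Container> children = new ArrayList<>();
        Container(String id) { this.id = id; }
    }

    private static final Pattern MSG_ID = Pattern.compile("<([^<>\\s]+)>");
    private final Map<String, Container> idTable = new LinkedHashMap<>();

    private Container get(String id) {
        return idTable.computeIfAbsent(id, Container::new);
    }

    private static List<String> ids(String header) {
        List<String> out = new ArrayList<>();
        if (header != null) {
            Matcher m = MSG_ID.matcher(header);
            while (m.find()) out.add(m.group(1));
        }
        return out;
    }

    /** True if candidate is node itself or one of node's ancestors. */
    private static boolean isSelfOrAncestor(Container candidate, Container node) {
        for (Container c = node; c != null; c = c.parent) {
            if (c == candidate) return true;
        }
        return false;
    }

    /** Re-parents a container, keeping parent and children lists consistent. */
    private static void setParent(Container child, Container newParent) {
        if (child.parent != null) child.parent.children.remove(child);
        child.parent = newParent;
        if (newParent != null) newParent.children.add(child);
    }

    /** Step 1: file one message under its Message-ID and link its References chain. */
    public void add(String messageId, String references, String inReplyTo, String subject) {
        List<String> own = ids(messageId);
        Container self = get(own.isEmpty() ? messageId : own.get(0));
        self.message = subject;
        List<String> refs = ids(references);
        if (refs.isEmpty() && !ids(inReplyTo).isEmpty()) {
            refs = ids(inReplyTo).subList(0, 1); // fall back to the first In-Reply-To id
        }
        // Link each reference to the next; never override an existing link or create a loop.
        Container prev = null;
        for (String ref : refs) {
            Container c = get(ref);
            if (prev != null && c.parent == null && !isSelfOrAncestor(c, prev)) {
                setParent(c, prev);
            }
            prev = c;
        }
        // The last reference becomes this message's parent, replacing any earlier guess.
        if (prev == null) {
            setParent(self, null);
        } else if (!isSelfOrAncestor(self, prev)) {
            setParent(self, prev);
        }
    }

    /** Step 2: every container without a parent starts a thread. */
    public List<Container> rootSet() {
        List<Container> roots = new ArrayList<>();
        for (Container c : idTable.values()) {
            if (c.parent == null) roots.add(c);
        }
        return roots;
    }

    static void print(Container c, String indent) {
        System.out.println(indent + (c.message != null ? c.message : "(missing " + c.id + ")"));
        for (Container child : c.children) print(child, indent + "  ");
    }

    public static void main(String[] args) {
        JwzThreader t = new JwzThreader();
        t.add("<c@x>", "<a@x> <b@x>", "<b@x>", "Re: Re: Hello"); // arrives before its ancestors
        t.add("<a@x>", null, null, "Hello");
        t.add("<b@x>", "<a@x>", "<a@x>", "Re: Hello");
        t.add("<d@x>", null, null, "Unrelated");
        for (Container root : t.rootSet()) print(root, ""); // Hello / Re: Hello / Re: Re: Hello, then Unrelated
    }
}
```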

chubbard
Fantastic, Chubbard! I think I read this article years ago and completely forgot about it; I remember the whinging about Netscape ;)
Royce
A: 

As an aside, I've just found that searching Google for "threading" rather than "thread" is far more productive.

Royce