I have a service which processes emails in a mailbox and, once an email is processed, stores some information from it in the database. At the minute the schema looks something like:

  • ID
  • Sender
  • Subject
  • Body (result of being parsed/stripped to plain text)
  • DateReceived

I am building a web front-end for the database and the main purpose of storing the emails is to provide the facility for users to look back and see what they have sent. However, another reason is for auditing purposes on my end.

At the moment the emails are moved to specific mailbox folders. What I plan to do instead is, once an email is processed, record it in the database and delete it from the mailbox rather than just moving it.

So a couple of questions...

1) Is it a good idea to delete the actual email from Exchange? Is it better to hold onto it just in case?
2) To keep the size of the fields down I was stripping the HTML out of the emails. Is this a bad idea? Should I just store the email as it is received?

Any other advice/suggestions would be great.

+2  A: 

In both cases I think you should hold onto the original emails. Storage is cheap, but if disk space is really an issue look to compression rather than excision to solve it.
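As a rough illustration of "compression rather than excision", here is a minimal Python sketch that compresses the complete message bytes before storage and restores them unchanged. The function names `pack_message` and `unpack_message` and the sample message are hypothetical; only the standard-library `zlib` module is assumed.

```python
import zlib


def pack_message(raw: bytes) -> bytes:
    """Compress the full, unmodified message bytes for storage."""
    return zlib.compress(raw, 9)  # 9 = strongest (slowest) compression level


def unpack_message(blob: bytes) -> bytes:
    """Restore the original message bytes exactly as they were received."""
    return zlib.decompress(blob)


# Hypothetical sample: a small HTML email with repetitive markup.
sample = (
    b"From: user@example.com\r\n"
    b"Subject: Weekly report\r\n"
    b"Content-Type: text/html\r\n\r\n"
    + b"<p>Nothing to report.</p>\r\n" * 50
)

packed = pack_message(sample)
restored = unpack_message(packed)
```

Because the round trip is lossless, the HTML formatting survives and nothing has to be cut from the message to save space.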

Both of your use cases (historical record and audit) will be better served by storing the complete unabridged email in the database. Once you start tampering with the data, albeit "just" removing formatting, it becomes difficult to prove that you haven't edited it in other, more significant ways. This is especially true if you have deleted the original email instead of archiving it.
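One way to sketch this, under stated assumptions: keep the raw message bytes alongside the searchable fields and record a SHA-256 digest at the time of receipt, so a later check can show the stored copy is unchanged. The table layout and the names `store_email` and `verify_email` are hypothetical, and SQLite stands in for whatever database is actually in use. A digest kept in the same database only catches accidental changes; for audit purposes it would need to be recorded somewhere the application cannot rewrite.

```python
import hashlib
import sqlite3
from email import message_from_bytes, policy

# Hypothetical schema: searchable headers plus the untouched original.
SCHEMA = """
CREATE TABLE IF NOT EXISTS emails (
    id      INTEGER PRIMARY KEY,
    sender  TEXT,
    subject TEXT,
    raw     BLOB NOT NULL,
    sha256  TEXT NOT NULL
)
"""


def store_email(conn, raw):
    """Store the complete message bytes and a digest taken at receipt."""
    msg = message_from_bytes(raw, policy=policy.default)
    digest = hashlib.sha256(raw).hexdigest()
    cur = conn.execute(
        "INSERT INTO emails (sender, subject, raw, sha256) VALUES (?, ?, ?, ?)",
        (str(msg.get("From", "")), str(msg.get("Subject", "")), raw, digest),
    )
    conn.commit()
    return cur.lastrowid


def verify_email(conn, email_id):
    """Return True if the stored bytes still match the recorded digest."""
    row = conn.execute(
        "SELECT raw, sha256 FROM emails WHERE id = ?", (email_id,)
    ).fetchone()
    if row is None:
        return False
    raw, digest = row
    return hashlib.sha256(raw).hexdigest() == digest


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
email_id = store_email(
    conn, b"From: user@example.com\r\nSubject: Hi\r\n\r\n<p>Hello</p>\r\n"
)
is_intact = verify_email(conn, email_id)
```

A plain-text version for display can always be derived from the `raw` column on demand, so stripping the HTML never has to be a destructive step.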

You don't say what business you're in, but the other thing to remember is whether there are any data retention policies active within your organisation or in the wider jurisdiction. Compliance is becoming gnarlier all the time.

APC
Thanks for the advice on the DRP! Never actually knew about that one. So you reckon maintaining the emails in Exchange as well as the DB would be my best bet? Yeah, I could smell something was wrong with editing the actual original email and storing it. I will definitely change this.
James
@APC: +1 for maintaining the original message.
Alfred Myers
+1  A: 

I would keep the messages in a specific folder of the mailbox, as you are doing, and probably wouldn't save anything to a database at all, given that you can access the mailbox from within your application.

The Exchange team over the years has developed several APIs for accessing the Mailbox's contents.

With Exchange Server 2007 and 2010, the recommended API is Exchange Web Services, which can be used from any language or environment capable of calling web services.

If you are developing with a .NET language (C# or VB.NET, for instance), your best bet would be the EWS Managed API.

If you are really going to do something meaningful with the body, you can save the results as named properties (extended properties in EWS parlance) on the message itself.

There are other APIs with corresponding functionality for previous versions of Exchange.

Alfred Myers
Interesting, never thought about that sort of approach. Just retrieving the emails on demand and displaying them. Would you say this is just as reliable as storing it in the db?
James
Yes. As can be seen at http://blogs.msdn.com/exchangedev/archive/2008/05/22/exchange-developer-roadmap.aspx, they've removed several of the previous APIs because of architectural changes they've made to improve reliability.
Alfred Myers
The problem is that I am restricted to Exchange 2003, so I think in my case my safest bet would be to store the emails in the DB.
James