I have to store e-mail messages for use with our application. I keep "metadata" for all messages in a relational database, but I don't feel comfortable keeping message content (gigabytes to terabytes of email data) in a database. I'm currently using IMAP as the storage, but I have doubts about whether I chose correctly. First, there is the problem of UIDVALIDITY and how to keep a permanent reference to a message inside IMAP. Second, I'm not sure this is the most robust solution in terms of backup/restore strategies, store corruption, replication, and so on. On the positive side, I can query IMAP by headers because the data is mostly indexed.

I don't know whether key-value stores (Cassandra, Tokyo Cabinet, Redis) are a better approach. How do they handle storing values ranging from 1 KB to 50 MB? How do they prevent corruption, and when corruption or device failure does happen, how can I repair the store?

A: 

What about using the filesystem? Your metadata in the database could point to the pathname of the actual message (one message per file).

If your message volume is high enough that you're worried about performance based on the number of files per directory, you can split things up into a two-level (or deeper) directory hierarchy (e.g., by hashing on the message ID).
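As a rough illustration of the hashed layout, here is a minimal Python sketch. The function names (message_path, store_message), the choice of SHA-256, the two-hex-character directory names, and the .eml suffix are all assumptions made for the example, not something prescribed above. The path it returns is what you would record in the metadata database.

```python
import hashlib
import os
from pathlib import Path


def message_path(root: Path, message_id: str, levels: int = 2) -> Path:
    """Map a message ID to a file path under a hashed directory tree.

    Each directory level takes two hex characters of the digest, so every
    level fans out into at most 256 subdirectories.
    """
    digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()
    parts = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return root.joinpath(*parts, digest + ".eml")


def store_message(root: Path, message_id: str, raw: bytes) -> Path:
    """Write one message per file and return the path to record in the DB."""
    path = message_path(root, message_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first, then rename into place, so a reader
    # never sees a partially written message file under the final name.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(raw)
    os.replace(tmp, path)
    return path
```

With two levels of two hex characters each, messages spread over up to 65,536 leaf directories. Because the path is derived from the message ID alone, it can also be recomputed if the stored pathname is ever lost.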

Using a standard filesystem means you can use all the existing technology for journaling, replication, backups, etc.

David Gelhar