I'm building a messaging system as a pet project, and it will support file attachments. It will serve as the internal messaging system for a website of mine.

One feature I want is to store the MD5 checksum of every uploaded file, so that if the same file is uploaded more than once, all of the links reference a single stored file.

I've come up with the following so far:

Message
----------
MessageID (PK)  
SenderID (FK)  
RecipientsID (FK)  
AttachmentsID (FK)  
Subject  
MessageText  
DateSent  

Recipient  
----------  
UserID (FK)  
MessageID (FK)  

Attachment  
----------  
ID
Name
MessageID (FK)
FileID (FK)

File  
----------
ID
Checksum
LastAccessDate
AccessCount

So, you will be able to have several messages, each of which can have multiple attachments. To save space on our server (in my use case users will upload the same file many times), different attachments can also reference the same file.
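To make the deduplication idea concrete, here is a minimal sketch, assuming SQLite through Python's `sqlite3` module and a `File` table shaped like the one above. The helper name `get_or_create_file` and the `UNIQUE` constraint on `Checksum` are assumptions for illustration, not part of the schema as posted.

```python
import hashlib
import sqlite3


def get_or_create_file(conn: sqlite3.Connection, data: bytes) -> int:
    """Return the File.ID for these bytes, inserting a row only if needed."""
    checksum = hashlib.md5(data).hexdigest()
    row = conn.execute(
        "SELECT ID FROM File WHERE Checksum = ?", (checksum,)
    ).fetchone()
    if row is not None:
        return row[0]  # same checksum seen before: reuse the existing file
    cur = conn.execute(
        "INSERT INTO File (Checksum, AccessCount) VALUES (?, 0)", (checksum,)
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE File (
        ID             INTEGER PRIMARY KEY,
        Checksum       TEXT NOT NULL UNIQUE,  -- assumption: one row per checksum
        LastAccessDate TEXT,
        AccessCount    INTEGER NOT NULL DEFAULT 0
    )
    """
)

first = get_or_create_file(conn, b"quarterly report")
second = get_or_create_file(conn, b"quarterly report")  # duplicate upload
other = get_or_create_file(conn, b"different bytes")
file_rows = conn.execute("SELECT COUNT(*) FROM File").fetchone()[0]
```

Two uploads with identical bytes end up pointing at the same `File` row, so each `Attachment` row only needs to carry its own `Name` and a `FileID`.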

My question is, should the Message table contain some kind of RecipientsID? Or is it enough to have my Recipient table reference MessageID?

The same question applies to AttachmentsID on the Message table. Should I have some sort of AttachmentsID? Or is it enough that the Attachment table references the MessageID?

Is it OK for Message not to have any reference to its Attachments or Recipients, if both Attachments and Recipients know which Message they belong to? Or should I be doing it another way?

I'm curious to hear how some veteran SQL guys would lay this schema out.


Edit: I'm looking to have multiple recipients and multiple attachments per message. I'm sorry if that wasn't clear.

It is these one-to-many relationships that I'm struggling with: I'm not sure whether I'm modeling them the best way.

A: 

All of your questions depend on your specific business rules. Can a message have more than one recipient? If so, then you can't store the recipient ID in the messages table because that would only allow you to store one recipient per message. Think through this logic for each of your situations and it will hopefully become clearer.

The standard ways to model relationships in an RDBMS are:

1-to-many : The "many" table has the PK of the "1" table in it as a foreign key. For example, one order can have many order lines, so each order line row will have an order_id.

many-to-many : A "linking" table exists between the two main tables, which contains the PKs for both of the main tables. These combined PKs often make up the PK for the linking table. For example, in most situations a message can be sent to multiple users and a user could have more than one message sent to them. In this case you have a many-to-many relationship, so you would have a Users table (user_id, name, etc.), a Message table (message_id, message_body, etc.) and a Message_Recipients table (message_id, user_id).

1-to-1 : This is similar to subclassing from an OO perspective. I might have a Buildings table in my database that tracks certain data, and some buildings might also be Houses, which track additional data. In this case, the two tables share the same PK.

I'm not going to go into hierarchies here, since they can be modeled several different ways and the best model often depends on specific factors of the system.
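As a rough sketch of the many-to-many case described above, assuming SQLite through Python's `sqlite3` module (no particular database engine is named in this answer), the linking table could look like this. The column types and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    CREATE TABLE Users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE Message (
        message_id   INTEGER PRIMARY KEY,
        sender_id    INTEGER NOT NULL REFERENCES Users(user_id),
        message_body TEXT
    );
    -- Linking table: the two foreign keys together form the primary key.
    CREATE TABLE Message_Recipients (
        message_id INTEGER NOT NULL REFERENCES Message(message_id),
        user_id    INTEGER NOT NULL REFERENCES Users(user_id),
        PRIMARY KEY (message_id, user_id)
    );
    """
)

conn.executemany(
    "INSERT INTO Users VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")]
)
conn.execute("INSERT INTO Message VALUES (1, 1, 'hello')")
conn.executemany("INSERT INTO Message_Recipients VALUES (?, ?)", [(1, 2), (1, 3)])

# All recipients of message 1, found through the linking table.
recipients = [
    row[0]
    for row in conn.execute(
        """
        SELECT u.name
        FROM Message_Recipients AS mr
        JOIN Users AS u ON u.user_id = mr.user_id
        WHERE mr.message_id = ?
        ORDER BY u.name
        """,
        (1,),
    )
]

# The composite primary key rejects listing the same recipient twice.
try:
    conn.execute("INSERT INTO Message_Recipients VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Note that the Message table carries no recipient column at all; every recipient is a row in the linking table.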

Tom H.
Yes, you can have multiple recipients and multiple attachments. Those are the 2 cases that my questions revolve around and that I'm trying to get answers for. When I have a one-to-many relationship, how should I handle it?
KingNestor
And I realize you can't have the RecipientID in the Message table in that case, but I've often seen people use some sort of mapping table and use the ID from that table as a "Recipient(s)ID", which themselves point to the various recipients.
KingNestor
I've added a bit to my answer which I hope will help
Tom H.
A: 

You can just drop the Recipient table altogether. It's redundant, because the RecipientsID in the Message table already holds that value. If you want more than one recipient per message, though, you need to do it the other way and keep the separate table.

As for the attachments, it's best that the Attachment table refers to the Message table, and not the other way around. If the Message table holds an attachment ID, that limits it to one attachment per message, which is probably fine for now, but it will get in the way if you later want to allow several attachments.

On the other hand, having only one attachment lets you get the attachment ID together with the message, and you can join the query rows so you get it all in one query. Saves some lines of code.

So to summarize, the 'minimalistic' way is to have one recipient and one attachment, in which case you drop the Recipient table and the messageId in the attachment table. The most expansive way is to have multiple recipients and attachments, in which case you drop the recipientId and attachmentId in the message table.

Tor Valamo
I'm looking to have multiple recipients and multiple attachments. I'll try to clarify that in my question.
KingNestor
I edited the text just when you posted. :)
Tor Valamo
A: 

Depending on how you're planning on storing the files, you will need to account for the uploaded file names. I know on Windows there's a maximum path length for accessing a file (where the path includes the full file name and extension). So you may want to do something like give the files an arbitrary name and store the actual file name in the File table. You may also want to account for the MIME type of the uploaded file so you can supply it back through the website when the user wants to view the document. Either read it based on extension or something similar and store it, or just look it up when the website presents the file to the user for download.
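A small sketch of that bookkeeping, using only the Python standard library. The function name `describe_upload`, the use of a random UUID as the arbitrary on-disk name, and the `application/octet-stream` fallback are assumptions for illustration.

```python
import mimetypes
import uuid


def describe_upload(original_name: str) -> dict:
    """Build the metadata to store for an upload (hypothetical helper)."""
    # Arbitrary, fixed-length name for the file on disk, so path length
    # does not depend on whatever the user called the file.
    stored_name = uuid.uuid4().hex
    # Guess the MIME type from the extension; returns (None, None) if unknown.
    mime_type, _encoding = mimetypes.guess_type(original_name)
    return {
        "original_name": original_name,  # shown to the user on download
        "stored_name": stored_name,      # used on the file system
        "mime_type": mime_type or "application/octet-stream",
    }


info = describe_upload("report.pdf")
unknown = describe_upload("noextension")
```

Guessing from the extension is only a hint; it says nothing about what the bytes actually contain.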

Agent_9191
A: 
Message
---------
msg_id (PK)
sender_id (FK -> Users)
metadata...

Users
---------
user_id (PK)
address (How to locate the user for routing purposes)
metadata....

Attachments
----------
attachment_id (PK)
md5 (possibly UNIQUE, but beware of collisions)
file_sys_ref (a way to find the attachment file in the file system)
meta_data...

Recipients
----------
message_id (FK -> Message)
user_id (FK -> Users)
meta_data...
PRIMARY KEY (message_id, user_id)

I'd store the file in the filesystem, rather than as a BLOB in the database, but that's just me. I find that it's easier to transfer it through an external file transfer mechanism (ftp, scp, HTTP POST, etc.) than to write your own code to chunk it into the db.
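One possible way to combine file system storage with the md5 column, sketched in Python. The function name `store_upload` and the directory layout (first two hex characters of the checksum as a subdirectory) are assumptions, and the byte comparison is one simple guard against the collisions mentioned above.

```python
import hashlib
import tempfile
from pathlib import Path


def store_upload(root: Path, data: bytes) -> Path:
    """Write data under a path derived from its MD5, reusing existing files."""
    digest = hashlib.md5(data).hexdigest()
    target = root / digest[:2] / digest  # assumed layout, e.g. ab/ab12...
    if target.exists():
        # Same checksum already on disk: confirm the bytes really match.
        if target.read_bytes() != data:
            raise ValueError("MD5 collision: same checksum, different content")
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    a = store_upload(root, b"same bytes")
    b = store_upload(root, b"same bytes")  # duplicate upload, nothing new written
    same_path = a == b
    file_count = len([p for p in root.rglob("*") if p.is_file()])
```

The returned path (or just the digest) is the kind of value that could go in file_sys_ref.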

James
A: 

Bunch of good answers here, but let me be more direct:

should the Message table contain some kind of RecipientsID?

No.

Or is it enough to have my Recipient table reference MessageID?

Yes.

The same question for AttachmentsID on the Message table. Should I have some sort of AttachmentsID?

No.

Or is it enough that the Attachment table references the MessageID?

Yes.

Is it OK for Message not to have any reference to its Attachments or Recipients, if both Attachments and Recipients know which Message they belong to?

Yes.
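To show why the child-side references are enough, here is a hedged sketch using the table and column names from the question, assuming SQLite through Python's `sqlite3` module. The sample rows and the checksum value `'abc123'` are placeholders, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Message (
        MessageID INTEGER PRIMARY KEY,
        SenderID  INTEGER,
        Subject   TEXT
    );
    CREATE TABLE Recipient (
        UserID    INTEGER,
        MessageID INTEGER REFERENCES Message(MessageID),
        PRIMARY KEY (UserID, MessageID)
    );
    CREATE TABLE File (
        ID       INTEGER PRIMARY KEY,
        Checksum TEXT UNIQUE
    );
    CREATE TABLE Attachment (
        ID        INTEGER PRIMARY KEY,
        Name      TEXT,
        MessageID INTEGER REFERENCES Message(MessageID),
        FileID    INTEGER REFERENCES File(ID)
    );
    INSERT INTO Message VALUES (1, 10, 'Minutes');
    INSERT INTO Recipient VALUES (20, 1), (30, 1);
    INSERT INTO File VALUES (1, 'abc123');  -- placeholder checksum
    INSERT INTO Attachment VALUES
        (1, 'minutes.pdf', 1, 1),
        (2, 'minutes-copy.pdf', 1, 1);      -- second name, same stored file
    """
)

# Message has no RecipientsID or AttachmentsID column; the child tables
# are simply filtered by MessageID.
recipient_ids = [
    row[0]
    for row in conn.execute(
        "SELECT UserID FROM Recipient WHERE MessageID = ? ORDER BY UserID", (1,)
    )
]
attachments = conn.execute(
    """
    SELECT a.Name, f.Checksum
    FROM Attachment AS a
    JOIN File AS f ON f.ID = a.FileID
    WHERE a.MessageID = ?
    ORDER BY a.ID
    """,
    (1,),
).fetchall()
```

Everything belonging to a message is reachable by filtering on MessageID, which is why the parent table needs no columns pointing downward.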

filiprem