views:

371

answers:

2

What datatype is the best to store html content and documents like pdf's, excel files, and word documents. I am currently using ntext, but I am not sure if this is the best datatype to store html content.

Also, I currently use FCKEditor to save html content to an ntext field and it stores the markup along with the styles. If I happen to read the part of the description into an ItemTemplate of a ListView with the following:

<%# Eval("content") %>

It displays the content, but it displays the markup as well. For example, instead of displaying:

"This is an html string", it would display <p>This is and html <b>string</b></p>

A: 
  • Store Unicode HTML as ntext
  • If you're 110% sure you'll only need ascii, you can use text or varchar.
  • Store binaries (pdf, word, excel) as binary blobs
Chris Ballance
When you say text, do you mean as opposed to ntext? What if I the html will use multiple languages?
Xaisoft
Use ntext or nvarchar if you ever anticipate the need for Unicode support. Otherwise, ASCII will take up significantly less space.
Chris Ballance
This is true, but remember that HTML can often require Unicode. In SQL Server 2008, Unicode compression is very smart about reducing storage space for the characters in there that *aren't* outside the ASCII set. I did some evaluations here (should be the top 3 hits):http://is.gd/4rJ3cI am really looking forward to this.
Aaron Bertrand
+4  A: 

I would store the HTML in NVARCHAR(MAX) (or less if you don't need more than 4000 characters). Don't use the TEXT/NTEXT data type unless you are stuck in SQL Server 2000. They're deprecated and the benefits of the MAX types make their usage foolhardy IMHO.

I would investigate storing the files as FILESTREAM in SQL Server 2008. If you are < 2008, or you have investigated the trade-offs and FILESTREAM is not an option, then my personal preference is to store files on the file system, and a local and/or http reference to the file in the database.

The benefit of storing the files in the database is that you get transactional consistency. The downside is that you use much more database space, which is usually on more expensive storage. You also have a much harder time debugging (you can't just say "SELECT PDFFile FROM Table" in Management Studio, and expect Acrobat Reader to pop up and show your column's contents).

Since at work I have to beg and sign over the rights to my unborn children for more SAN allocation, and at play I get charged a lot more for my database space usage than my file space usage, I opt for file-based storage every time. At work our file system is backed up and, while it is not 100% in sync with the database backups, we have seen so few file corruption issues that weren't the fault of the end user (and easily corrected without our involvement) - zero, in fact - that it doesn't seem worth it to investigate.

Like I said, personal preference. But ask the right questions and understand your goals and limitations.

Aaron Bertrand
Blast, Aaron! I was just typing a similar response :) Stu
Stuart Ainsworth
I definitely need more than 400 characters and I don't have sql server 2005 although that might change soon. Storing them on the hard drive and then just storing the link to the html document in a data field is not a bad option.
Xaisoft
With the FileStream object, can I store pdf, excel, word, etc documents as well. I could also store these on the file system as well, correct?
Xaisoft
Sorry, that was supposed to say 4000, not 400. My typewriter thinks that it doesn't have to work so hard late at night.
Aaron Bertrand
FileStream is a very large topic and it requires extensive understanding and planning. In short, yes, you can store any type of document you can store in the file system, and you can allocate your filestream storage space anywhere on the file system (there are restrictions, as you can imagine). I would read up some more on FileStream if you are using SQL Server 2008 and really think this might be a benefit: http://msdn.microsoft.com/en-us/library/bb895234.aspx / http://msdn.microsoft.com/en-us/library/bb895234.aspx
Aaron Bertrand
lol, it is ok, I appreciate the help. You have given me yet another reason to upgrade to sql server 2008 now with the addition of the filestream object.
Xaisoft
So as it stands without filestream, I could allow the user to store html, pdf, word, excel, powerpoint, etc in a folder on the server called documents and then have a field in the database point to the location of these files. Does that sound about right?
Xaisoft
Well, I would ensure that your application is responsible for accepting a file upload, and placing it where you tell the database is stored. If you let users have direct access to the storage location, you are *guaranteeing* that the files will quickly fall out of sync with what the database believes.
Aaron Bertrand
Can you elaborate on what you mean by "fall out of sync of what the database believes"
Xaisoft
Sure, if the user tells the database, "I am going to put foo.pdf in the documents folder" and then you rely on the user to actually do that, they could put it somewhere else, or forget, or name it differently, or move it later...
Aaron Bertrand
I see what you mean, so I always have to keep it in sync.
Xaisoft
You don't necessarily have to "keep it in sync" manually. Just have the application written so that it controls any operations that are required - upload/download/delete/rename operations.
Aaron Bertrand
(Then the users don't need access to the folder directly, and can't "surprise" you or the database.)
Aaron Bertrand