At the risk of sounding foolish, in a scenario where large data fields need to be persisted (such as with blog posts), is database storage always the best solution?

I'm guessing bloating the database is probably not too high a risk, as that's kind of what databases are meant to be good at, right? Also, databases can be good for text indexing and fast access. Is that assumption correct?

It occurs to me that that kind of data could be stored outside of the database in some kind of XML flat file, but I'm not sure that's a good idea...

A: 

A database is much better than an XML flat file for storing text. It also has the advantage of handling concurrency and transactions.

waqasahmed
+2  A: 

Storing text inside a database, including things like blog posts, is commonly done. Databases are built to handle this.

It's also common to store large content (e.g. images, large text files, etc.) outside the database (i.e. in the filesystem) and reference it from the database. Doing this may limit your database size but presents other problems, such as handling concurrency issues (like two people editing the file at the same time).
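As a rough illustration of the "store in the filesystem, reference from the database" approach, here is a minimal sketch using Python's built-in `sqlite3` module. The `posts` table, the `body_path` column and the `save_post`/`load_post` helpers are names made up for this example, not anything from a particular blog engine.

```python
import sqlite3
import tempfile
from pathlib import Path


def save_post(conn, storage_dir, title, body):
    """Write the body to a file and record its path in the database."""
    # Insert first so we get an id to name the file after (hypothetical naming scheme).
    cur = conn.execute(
        "INSERT INTO posts (title, body_path) VALUES (?, '')", (title,)
    )
    post_id = cur.lastrowid
    path = Path(storage_dir) / f"post_{post_id}.txt"
    path.write_text(body, encoding="utf-8")
    conn.execute(
        "UPDATE posts SET body_path = ? WHERE id = ?", (str(path), post_id)
    )
    conn.commit()
    return post_id


def load_post(conn, post_id):
    """Look up the path in the database, then read the body from disk."""
    row = conn.execute(
        "SELECT title, body_path FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    if row is None:
        raise KeyError(post_id)
    title, body_path = row
    return title, Path(body_path).read_text(encoding="utf-8")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, body_path TEXT NOT NULL)"
)
storage_dir = tempfile.mkdtemp()
post_id = save_post(conn, storage_dir, "Hello", "A long blog post body...")
print(load_post(conn, post_id))
```

Note that the file write is not covered by the database transaction: if the file is deleted or edited behind the database's back, the row still points at it. That is the kind of problem you take on with this approach.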

Lots of factors come into play to determine which is the most appropriate solution, including how often things are edited, how large the files are, how many files you have and so on.

As for database handling of text indexing, support varies. MySQL (using MyISAM storage) has full-text searching, for example. SQL Server with the right add-on has it too. Same with Oracle. It can be useful but is more limited than a general-purpose search engine (e.g. Lucene). Your requirements and constraints will determine whether database indexing is sufficient or you need a search-engine type of solution.

To give you a real and specific example, the StackOverflow search is implemented using SQL Server full text searching and many have criticized it for being ineffective compared to using Google's "site:stackoverflow.com ...." (which I use by default pretty much).

cletus
A: 

If you're at all concerned with performance and reliability, you should seriously consider using a database that meets your requirements. The developers of those systems have put a lot of time into solving all the problems that you will need to re-solve if you try to use a flat file of some sort.

John Fisher
+2  A: 

Your assumptions are correct. You really don't want to store that text outside of the database, because you'll lose:

  • Transaction safety
  • Searching capabilities (which could be added via a different tool, bringing its own set of problems/requirements)
  • Ease of maintenance
  • Consistency (what if somebody deletes the XML file?)
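
To make the transaction-safety point concrete, here is a small sketch with Python's built-in `sqlite3` module. The `posts` and `tags` tables and the `publish` helper are invented for the example; the idea is only that a post and its related rows are written together or not at all, which is hard to guarantee with a hand-rolled flat file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE tags ("
    "post_id INTEGER NOT NULL REFERENCES posts(id), tag TEXT NOT NULL)"
)
conn.commit()


def publish(conn, title, body, tags):
    """Insert a post and its tags atomically: all rows or none."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO posts (title, body) VALUES (?, ?)", (title, body)
        )
        conn.executemany(
            "INSERT INTO tags (post_id, tag) VALUES (?, ?)",
            [(cur.lastrowid, tag) for tag in tags],
        )
    return cur.lastrowid


publish(conn, "First post", "Some long text...", ["intro", "meta"])

try:
    # The None tag violates NOT NULL, so the whole publish is rolled back,
    # including the post row and the "ok" tag inserted before the failure.
    publish(conn, "Broken post", "More text...", ["ok", None])
except sqlite3.IntegrityError:
    pass

post_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
tag_count = conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
print(post_count, tag_count)
```

Only the first post and its two tags survive; nothing from the failed publish is left half-written.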

Additionally, while a similar topic has been beaten to death with respect to images (should one store images in the DB or in the filesystem?), text does not raise the same level of concern, because 'large' texts are in fact quite small (10KB, or 100KB as a huge upper limit) and most databases have a special datatype to store, well, text. With images there is room for debate because we would be talking about data in the (several) megabyte(s) range.

cletus raises interesting considerations, the most relevant IMO being that database full-text engines usually perform worse than dedicated search engines (like Lucene and friends). This will have to be considered in light of the potential problems and the actual usage you will have with your data. Also, some database searching modules perform better than others, so this would have to be tested in your particular scenario.

Vinko Vrsalovic
+1  A: 

DasBlog uses XML to store the text from the blog entries, but I understand that there are some scaling issues with this.

Robert Harvey
+1  A: 

It depends on the RDBMS to some extent.

In SQL Server (prior to version 2008), the advice (gained from benchmarking) is: if the object is smaller than 256K, put it in the database; if it is larger than 1MB, put it in the file system (with a grey area in between).

Ref: To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?
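
As a sketch, that rule of thumb could be written as a small helper. The function name and return values are made up for illustration, and I'm assuming "K" means 1024 bytes; the thresholds apply to the SQL Server setup that was benchmarked, so treat them as a starting point rather than a universal rule.

```python
KB = 1024  # assumption: "256K" and "1MB" are taken as binary units

DB_THRESHOLD = 256 * KB   # below this, the cited advice favours the database
FS_THRESHOLD = 1024 * KB  # above this, it favours the file system


def suggest_storage(size_bytes):
    """Return 'database', 'filesystem', or 'either' for the grey area."""
    if size_bytes < DB_THRESHOLD:
        return "database"
    if size_bytes > FS_THRESHOLD:
        return "filesystem"
    return "either"


for size in (10 * KB, 512 * KB, 5 * 1024 * KB):
    print(size, suggest_storage(size))
```

A typical blog post of a few tens of kilobytes falls well inside the "database" range.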

Mitch Wheat
But what did you benchmark? And how did you avoid consistency problems?
Vinko Vrsalovic
I didn't, MS Research did: http://research.microsoft.com/pubs/64525/tr-2006-45.pdf
Mitch Wheat