views:

380

answers:

4

I'm building a web-app that handles internal emails and other frequent small-to-medium sized chunks of text between users and clients. What's the best method for storing this data? In a database (MySQL) or as thousands of individual files? What about compressing it (PHP's gzcompress() or MySQL's compression features)?

This will not be a public application, so the user load will be minimal (less than 20 users at a time). However, there will be a lot of communication going back-and-forth every day within the app, so I expect the amount of data to grow quite large as time goes by (which is why I'd like to compress it).

I'd like to keep the data in a database for ease of access and portability, but some of the threads I've seen on here regarding images have suggested using file storage. What do you think?

Thank you, Seth

Edit for clarification: I do not require any sort of searching of the text, which is why I would lean toward compressing it to save on space.

+1  A: 

For images and documents that are already in a specific format (excel, word documents, pdf files, etc) I prefer file storage. But for just raw text I would probably rather use a database. It is easier to replicate across machines for failover, you can do substring searches over the text and although I don't know of a specific algorithm to use to compress it, I would think that a database would be a better way to go. But only if you already have just the text and it is only text. Any other format of document I would prefer using file storage.

And unless I am missing something I would use a CLOB instead of a BLOB, if it is only text.

Ryan Guill
A: 

I would have chosen to use a DB. You describe a scenario where you are going to store a large quantity of messages. You do not provide much information about the system, but i would guess that you probably would like to sort, group and apply several other properties to the messages. It would be much easier and probably faster to keep the message with its attributes in a DB instead of using file storage.

When it comes to compression I do not know which of the methods is most effective. You should probably try both before choosing.

maskefjes
+1  A: 

One of the main reasons for keeping the files in a database is to keep it consistent with the rest of the data that you are storing. It will be easier to make backups, (re)deploy with predefined datasets etc. Furthermore it's easier to guarantee transactional integrity.

One of the benefits of storing text as files could be that it is easier to serve them using a webserver, if this is the only remaining benefit of using files you could look into caching the files on the webserver -- that will give you much of the easy backup and transactions of the database but at the same time allow some speedup for http requests.

Simon Groenewolt
A: 

I wonder how big is this "medium chunk". If the text is just written messages (so less than 10 KB), then compressing makes them even smaller and there wouldn't be big impact on database growth. It makes developing and maintenance also much easier to have everything available with singl query and not having to get the file contents separately.

Riho