views: 85
answers: 3

Hello, I have an application which retrieves many large log files from systems on a LAN.

Currently I put all log files into PostgreSQL; the table has a column of type TEXT, and I don't plan to do any searching on this text column, because a separate external process retrieves all files nightly and scans them for sensitive patterns.

So the column could just as well be a BLOB or a CLOB. My question is this: the database already has its own compression system, but could I improve on that compression manually, for example with common compressor utilities? Above all, what if I manually pre-compress each large file and then store it as binary in the table? Is that pointless, given that the database system already provides its internal compression?

+1  A: 

My guess is that if you do not need any searching or querying ability, you could reduce disk usage by zipping the file and then storing the binary data directly in the database.
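A minimal sketch of that approach, assuming Python with psycopg2 and a hypothetical logs table with a bytea column (the table name, column names, and connection string are made up for illustration):

```python
import gzip
import psycopg2

# Hypothetical table: CREATE TABLE logs (name text, content bytea);
conn = psycopg2.connect("dbname=logs_db")  # connection string is an assumption
cur = conn.cursor()

with open("app.log", "rb") as f:
    raw = f.read()

compressed = gzip.compress(raw)  # compress in the application before storing

cur.execute(
    "INSERT INTO logs (name, content) VALUES (%s, %s)",
    ("app.log", psycopg2.Binary(compressed)),
)
conn.commit()
cur.close()
conn.close()
```

Reading the file back is just the reverse: fetch the bytea value and gzip.decompress() it in the application.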

Mitchel Sellers
+2  A: 

I don't know whether you or the database would compress the data more efficiently; it depends on the algorithm used, etc. What is certain is that if you compress it yourself, asking the database to compress it again is a waste of CPU. Once data is compressed, compressing it again yields less gain each time, until eventually it consumes more space.
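A quick way to see those diminishing returns, sketched with Python's standard library (the sample data is arbitrary):

```python
import gzip

data = ("2023-01-01 12:00:00 INFO request handled in 23ms\n" * 10000).encode()

once = gzip.compress(data)
twice = gzip.compress(once)   # compressing already-compressed data

print(len(data), len(once), len(twice))
# len(twice) is typically not smaller than len(once); it usually grows slightly,
# since gzip adds header/framing overhead and finds little left to compress.
```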

raticulin
It's not merely a waste of CPU; it also makes the application stack more complex (everything needs to know how to get the plain text to and from the specialised compression you've applied to the field) and more fragile (more code means more bugs). With little probability of a significant improvement, that would be a poor option to follow.
bignose
+1  A: 

The internal compression used in PostgreSQL is designed to err on the side of speed, particularly for decompression. Thus, if you don't actually need that speed, you will be able to reach higher compression ratios by compressing the data in your application.

Note also that if the database does the compression, the data will travel between the database and the application server in uncompressed form, which may or may not be a problem depending on your network.

As others have mentioned, if you do this, be sure to turn off the built-in compression, or you're wasting cycles.
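In PostgreSQL the built-in (TOAST) compression can be disabled per column by setting its storage mode to EXTERNAL. A sketch, again using psycopg2 and the hypothetical logs table from above:

```python
import psycopg2

conn = psycopg2.connect("dbname=logs_db")  # connection string is an assumption
cur = conn.cursor()

# EXTERNAL = allow out-of-line (TOAST) storage but do NOT compress the value;
# the default EXTENDED mode would try to compress the already-gzipped bytes.
cur.execute("ALTER TABLE logs ALTER COLUMN content SET STORAGE EXTERNAL")

conn.commit()
cur.close()
conn.close()
```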

The question you need to ask yourself is whether you really need more compression than the database provides, and whether you can spare the CPU cycles for it on your application server. The only way to find out how much more compression you can get on your data is to try it out. Unless there's a substantial gain, don't bother.
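One way to try it out, sketched in Python against a representative log file (the file name is a placeholder):

```python
import gzip

with open("sample.log", "rb") as f:
    raw = f.read()

for level in (1, 6, 9):  # speed vs. ratio trade-off
    out = gzip.compress(raw, compresslevel=level)
    print(f"level {level}: {len(raw)} -> {len(out)} bytes "
          f"({len(out) / len(raw):.1%} of original)")
```

Compare the resulting ratio against what the column takes up with PostgreSQL's own compression; if the difference is small, the extra application-side work isn't worth it.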

Magnus Hagander