I'm storing time-temperature data in a database, which is really just CSV data. The first column is time in seconds, starting at zero, with the following (one or more) column(s) being temperature:

0,197.5,202.4
1,196.0,201.5
2,194.0,206.5
3,192.0,208.1
...etc

Each plot represents about 2000 seconds. Currently I'm compressing the data before storing it in an output_profile longtext field:

CREATE TABLE `outputprofiles` (
  `id` int(11) NOT NULL auto_increment,
  `output_profile` longtext NOT NULL,
  PRIMARY KEY (`id`)
);
This helps quite a bit...I can compress a plot which is 10K of plain text down to about 2.5K. There is no searching or indexing needed on this data since it's just referenced in another table.
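(For reference, a minimal sketch of the kind of compression described above, not the asker's actual code: zlib at its highest level on a made-up ~2000-line CSV plot.)

import zlib

# Made-up sample plot: one line per second, "time,temp1,temp2", about 2000 rows
csv_text = "\n".join("%d,%.1f,%.1f" % (t, 197.5 - 0.01 * t, 202.4 + 0.01 * t)
                     for t in range(2000))

compressed = zlib.compress(csv_text.encode("ascii"), 9)  # level 9 = best compression
print(len(csv_text), "bytes raw ->", len(compressed), "bytes compressed")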

My question: Is there any other way to store this data I'm not thinking about which is more efficient in terms of storage space?

+2  A: 

Is there any reason to think that storage space will be a limiting constraint on your application? I'd make pretty sure that's the case before giving it higher priority than ease of access and use; for those purposes, it sounds like what you have is satisfactory.

le dorfier
As my grand-pappy used to say, disk space is cheap
Tom H.
Probably another case of me trying to optimize prematurely. But, I wanted to see if there was something else that I was just completely missing or not thinking about. Thanks.
brianz
+1  A: 

I don't quite understand what you mean by "compressing the plot". Do you mean that you are compressing all 2000 measurements at once, or that you are compressing each line?

Anyway, space is cheap. I would do it the traditional way, i.e. two columns, one row for each measurement.
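A minimal sketch of that layout, with made-up table and column names (sqlite3 is used here only to keep the example self-contained; the equivalent DDL works in MySQL with minor type changes):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurements (
        profile_id  INTEGER NOT NULL,  -- which plot the reading belongs to
        t_seconds   INTEGER NOT NULL,  -- time column from the CSV
        channel     INTEGER NOT NULL,  -- 1st, 2nd, ... temperature column
        temperature REAL    NOT NULL
    )
""")

# The first CSV line "0,197.5,202.4" becomes two rows of profile 1:
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)",
                 [(1, 0, 1, 197.5), (1, 0, 2, 202.4)])
conn.commit()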

If for some reason the row-per-measurement approach doesn't work and you want to save all 2000 measurements as one record, then you can do it a good deal better:

- Create a CSV file with your measurements.
- Zip it (gzip -9 gives you the maximal compression).
- Save it as a BLOB (or LONGBLOB, depending on the DB you are using), NOT as a longtext.

Then just store that in the DB.

This will give you maximal compression.
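A minimal sketch of those steps (not the answerer's code): build the CSV text, gzip it at level 9, and bind the binary result as a BLOB parameter. The outputprofiles/output_profile names mirror the question, with the column assumed to be changed to LONGBLOB:

import gzip

csv_text = "0,197.5,202.4\n1,196.0,201.5\n2,194.0,206.5\n"  # ...roughly 2000 lines
blob = gzip.compress(csv_text.encode("ascii"), compresslevel=9)  # maximal compression

# With any DB-API driver, pass the bytes as a parameter so nothing mangles them:
# cursor.execute("INSERT INTO outputprofiles (output_profile) VALUES (%s)", (blob,))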

Luixv
A: 

PostgreSQL has a fairly large storage overhead, since every tuple (the representation of a row in a table) takes 28 bytes excluding the data (PostgreSQL 8.3). There are 2-, 4- and 8-byte integers, a timestamp is 8 bytes, and floats are 8 bytes, I think. So storing 1,000,000,000 rows in PostgreSQL will require several GiB more storage than MySQL (depending on which storage engine you use in MySQL). But PostgreSQL is also great at handling huge data compared to MySQL; try running some DDL queries against a huge MySQL table and you'll see what I mean. This simple data should be easy to partition heavily, though, so maybe plain MySQL can handle the job nicely. But, as I always say, if you're not really, really sure you need a specific MySQL feature, you should go for PostgreSQL.
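(A rough back-of-the-envelope illustration, my numbers rather than the answerer's, of what that per-tuple overhead adds up to at the row count mentioned:)

rows = 1000000000          # 1,000,000,000 rows, as in the answer
tuple_header = 28          # bytes of per-tuple overhead in PostgreSQL 8.3
payload = 8 + 8            # one 8-byte timestamp plus one 8-byte float

overhead_gib = rows * tuple_header / 2.0 ** 30
total_gib = rows * (tuple_header + payload) / 2.0 ** 30
print("header overhead alone: %.1f GiB, total: %.1f GiB" % (overhead_gib, total_gib))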

I'm limiting this post to only MySQL and PostgreSQL since this question is tagged with only those two databases.

Edit: Sorry, I didn't see that you actually store the CSV in the DB.

Martin Torhage