views: 507 | answers: 4

I have an array with 1440 values in it, something like:

$array = array(1 => 45.76, 2 => 67.56, 3 => 79.23 ..... 1440 => 20.22);

This array is serialized and inserted into a MySQL table with longtext as the field type.

What I'm facing now is a table with 700MB of data and that will mean some serious problems in the near future.

Is there a better way of storing this array? I've heard of PHP's pack() function, but how will it help me? What about a MySQL BLOB field instead of longtext?

Thank you

+1  A: 

Take a look at gzdeflate for compressing strings. Example from the doc:

<?php
$compressed = gzdeflate('Compress me', 9);
echo $compressed;
?>

so in your case, something like this might do:

<?php
$compressed = gzdeflate(serialize($array), 9); //or var_export?
echo $compressed;
?>
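To get the data back out, the compressed string has to be inflated and unserialized again. A minimal round-trip sketch (the variable names and the small sample array are illustrative):

```php
<?php
// Compress a serialized array before storing it in the database.
$array = array(1 => 45.76, 2 => 67.56, 3 => 79.23);
$compressed = gzdeflate(serialize($array), 9);

// Later, after reading the column back, reverse both steps.
$restored = unserialize(gzinflate($compressed));

var_dump($restored === $array); // should be bool(true)
?>
```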
karim79
+4  A: 

Don't store arrays this way: the serialized-array syntax adds huge overhead.

It's better to use a row for every entry, like so:

CREATE TABLE `array_entries` (
    `Index` INT UNSIGNED NOT NULL,
    `Value` DOUBLE,
    `Name` CHAR(8) COMMENT 'name of the array the entry belongs to',
    PRIMARY KEY (`Name`, `Index`)
);

will allow you to store multiple arrays in one table, and only load what you need (or a whole array if you like).

Of course, you can replace `Name` with an (integer) foreign key pointing to an `arrays` table containing more info about the arrays.
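Assuming that schema, storing an entry and loading a whole array back in order are simple indexed queries (the array name 'day1' is just an example):

```sql
-- Store one entry of the array named 'day1'
INSERT INTO `array_entries` (`Name`, `Index`, `Value`)
VALUES ('day1', 1, 45.76);

-- Load the whole array back, in index order
SELECT `Index`, `Value`
FROM `array_entries`
WHERE `Name` = 'day1'
ORDER BY `Index`;
```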

Martijn
You should also define a column to join to the `arrays` table, and an index on that column, and an index on whichever column is sorted (probably `Index`). What is the `Name` column for?
dcrosta
Maybe use DECIMAL(4,2) as the column type if all the values are in the XX.XX format.
James Hall
I don't think this would be a good solution: it would mean I'd have 8 million records now, and every day I'd get 1,500,000 more.
dfilkovi
A: 

On modern hardware, 700MB of data in a MySQL table doesn't look so bad. Of course, it's always better to minimize the table's size, so you could use one of the compression functions that PHP provides. Read on here for more information.

I often use gzdeflate because it's quite fast and compresses text pretty well, but plenty of others are available.

Nicolas
+1  A: 

More information might be helpful:

  • Where are you getting this data from?
  • Is it actually numeric, or is that just a simplification for your example? If numeric, what range/accuracy is required? Is it discrete data, i.e. from a limited selection of recurring values? Does the data have to be exact, or would a close approximation still be good enough?
  • What does it mean? How are you using it?
  • Are you most interested in speed, or storage space, or code cleanliness, or what?
  • How often does it change?
  • Is that 700MB all one array? Are your PHP instances memory-limited? Have you looked at how much memory your script uses, and how that varies with different array sizes?
  • What is the largest single array you anticipate ever seeing (in terms of number of items / storage space)?
  • How many different data arrays might you have to deal with?
  • Is there any "overlap" between data sets, i.e. recurring data?

Taking your example as representative, I would consider multiplying each number by 100, casting to integer, and storing the result as raw binary. Your array becomes 2880 bytes of data. If your data rarely or never changes, you could skip-read this directly from a binary file very efficiently.
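That idea maps directly onto the pack()/unpack() functions the question mentions. A minimal sketch, assuming the values fit the XX.XX range (format 'n*' gives 2 bytes per value; values up to 999.99, as clarified below, would overflow 16 bits and need the 4-byte 'N*' format instead):

```php
<?php
// Scale each float by 100 and round to an integer.
$array = array(1 => 45.76, 2 => 67.56, 3 => 79.23);
$ints = array_map(function ($v) { return (int) round($v * 100); }, $array);

// Pack as unsigned 16-bit big-endian integers: 2 bytes per value.
$binary = call_user_func_array('pack',
    array_merge(array('n*'), array_values($ints)));

// Unpack (keys come back 1-based) and scale back down to recover the floats.
$restored = array_map(function ($v) { return $v / 100; }, unpack('n*', $binary));
?>
```

For 1440 values this yields exactly 2880 bytes, versus the tens of kilobytes a serialized array of the same data occupies.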

Hugh Bothwell
This data represents the minutes in a day (1440), with a decimal value (999.99 highest possible) for a certain person. So the table holds many persons with their per-minute values for every day. The data changes every 5 minutes and is needed on user request.
dfilkovi
So every 5 minutes for every person you are storing 5 new values? And at any time a person can request their last day's worth of data? What is the meaning of the value - heart-rate? Do you ever store more than a day at a time, or is this like a circular queue, i.e. there will always be exactly 1440 values per person? So far I have to agree with Martijn - normalizing to a database table is probably the most effective solution for dealing with this. If storing multiple days' data, it might be worth having a secondary archive table and copying/removing from the first table every couple of hours.
Hugh Bothwell