So the application we've got calls the APIs of all the major carriers (UPS, FedEx, etc.) for tracking data.

We save the most recent version of the XML feed we get from them in a TEXT field in a table in our database.

We hardly ever (read: never, so far) access that data, but keep it "just in case."

It adds quite a bit of additional weight to the database. Right now a 200,000-row table is coming in at around 500MB, the large majority of which is all that XML data.

So is there a more efficient way to store all that XML data? I had thought about saving them as actual text/xml files, but we update the data every couple of hours, so I wasn't sure whether that would make sense.

+3  A: 

If it's really unused, try:

/dev/null

JT.WK
+1 Makes sense :) I don't think it meets the "just in case" requirement. Not sure why, but I always have trouble getting data back out of /dev/null...
Dan McGrath
+2  A: 

I don't know what kind of data these XML streams contain, but maybe you can parse them and store only the pertinent info in a table or set of tables; that way you can eliminate some of the XML's bloat.
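
For example (purely illustrative: I'm guessing at the table/column names, the shipment_status target table, and what the feeds look like), MySQL's ExtractValue() can pull individual fields out of the stored XML so you only keep what you actually use:

    -- Sketch only: shipments, tracking_xml, shipment_status and the XPath
    -- expressions are made up; the real carrier XML will differ.
    -- ExtractValue() is available from MySQL 5.1 onward.
    INSERT INTO shipment_status (shipment_id, status, last_activity)
    SELECT id,
           ExtractValue(tracking_xml, '//Status'),
           ExtractValue(tracking_xml, '//LastActivityDate')
    FROM   shipments;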

Chris Kloberdanz
+9  A: 

Assuming it's data, there's no particular reason not to keep it in your database (unless it's impeding your backups). But it would be a good idea to keep it in a separate table from the actual data that you do need to read on a regular basis: just the XML, a FK back to the original table, and possibly an autonumbered PK column.
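
Something along these lines, say (table and column names are just for illustration, and the main table is assumed to be shipments with an INT UNSIGNED id):

    -- Illustrative sketch of the split; InnoDB so the FK is enforced.
    CREATE TABLE shipment_xml (
        id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- autonumbered PK
        shipment_id  INT UNSIGNED NOT NULL,                             -- FK back to the original table
        raw_xml      TEXT NOT NULL,
        FOREIGN KEY (shipment_id) REFERENCES shipments (id)
    ) ENGINE=InnoDB;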

Larry Lustig
A: 

As a DATAbase is designed to store DATA, this seems to be the logical place for it. A couple of suggestions:

  1. Rather than storing it in a separate table, consider a separate database if the information isn't critical.

  2. Have a look at the COMPRESS() and UNCOMPRESS() functions, as these could reduce the size of the verbose XML (a rough sketch follows).
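
For example (names are made up; note that COMPRESS() returns binary data, so the compressed column needs to be a BLOB rather than TEXT):

    -- Illustrative: assumes the XML currently lives in shipments.tracking_xml
    -- and moves into a BLOB column tracking_xml_gz.
    UPDATE shipments SET tracking_xml_gz = COMPRESS(tracking_xml);

    -- Reading it back on the rare occasion it's needed:
    SELECT UNCOMPRESS(tracking_xml_gz) FROM shipments WHERE id = 12345;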

I worked on one project where we split data between the database and file system. After this experience I vowed never again. Backups and maintenance of various production/test/dev environments turned into a nightmare.

Dave Barker
A filesystem does not have data in its name, but as far as I know it is possible to store some data there.
Thomas Jung
Admittedly, I don't have a problem with a filesystem-only solution, but it doesn't seem to fit this problem.
Dave Barker
A: 

Learn about OLAP techniques and data warehouses. They are probably what you are looking for.

doc
+3  A: 

It has been my experience that the biggest trouble with TEXT/BLOB columns that are consistently large is that people are not careful to avoid reading them when scanning many rows. On MyISAM this will waste your VFS cache, and on InnoDB it will waste your buffer pool.
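
The usual culprit is something like this (table and column names assumed): a routine scan written as SELECT * drags the XML along even though nothing uses it.

    -- Careless: pulls the big XML column through the cache/buffer pool for every matching row.
    SELECT * FROM shipments WHERE updated_at > NOW() - INTERVAL 1 DAY;

    -- Better: name only the columns you need, so (off-page) XML values aren't touched.
    SELECT id, carrier, status FROM shipments WHERE updated_at > NOW() - INTERVAL 1 DAY;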

A secondary problem is that as tables get bigger, they become harder to maintain: adding a column or index can rebuild the whole table, and a 500MB table rebuilds a lot more slowly than a 5MB table.

I've had good success moving things like this off into offline key/value storage such as MogileFS and/or TokyoTyrant.

If you don't need to be crazy scalable, or you value transactional consistency over performance, then simply moving this column into another table with a 1:1 relationship to the original means it at least takes an explicit join to blow out the buffer pool, and lets you maintain the original table without having to tip-toe around the 500MB gorilla.
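
Roughly like this (names are illustrative): once the XML sits in its own 1:1 table, nothing touches it unless a query joins for it explicitly.

    -- Day-to-day queries never see the XML.
    SELECT id, carrier, status
    FROM   shipments
    WHERE  status = 'IN_TRANSIT';

    -- The "just in case" lookup has to join for it explicitly.
    SELECT s.id, x.raw_xml
    FROM   shipments s
    JOIN   shipment_xml x ON x.shipment_id = s.id
    WHERE  s.id = 12345;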