views: 135
answers: 5

To make a long story short, one part of the application I'm working on needs to store a somewhat large volume of data in a database, for another part of the application to pick up later on. Normally this would be < 2,000 rows, but it can occasionally exceed 300,000 rows. The data only needs to be stored temporarily and can be deleted afterwards.

I've been playing around with various ideas, and one thing came to mind today. The LONGTEXT datatype can store a maximum of 2^32 − 1 bytes, which is just under 4 GB. Now, that's a lot of stuff to cram into one table row. Mind you, the data would probably not exceed 60-80 MB at the very most. But my question is: is it a good idea to actually do that?

The two solutions I'm currently toying with are roughly these (both sketched below):

  • Inserting all the data as individual rows into a "temporary" table that would be truncated once the job finishes.
  • Inserting all the data as one serialized string into a LONGTEXT column in a row that would be deleted once the job finishes.
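Roughly, the two designs would look something like this (the table and column names are just placeholders, and the Python/MySQL-connector code is only meant as an illustration):

    import json
    import mysql.connector  # assumes mysql-connector-python is installed

    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="appdb")
    cur = conn.cursor()

    rows = [("key1", "value1"), ("key2", "value2")]  # stand-in for the real payload

    # Option 1: one row per record in a scratch table, truncated when done.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS transfer_rows (
            id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            k  VARCHAR(255) NOT NULL,
            v  TEXT NOT NULL
        )
    """)
    cur.executemany("INSERT INTO transfer_rows (k, v) VALUES (%s, %s)", rows)

    # Option 2: the whole payload serialized into a single LONGTEXT row.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS transfer_blob (
            id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            payload LONGTEXT NOT NULL
        )
    """)
    cur.execute("INSERT INTO transfer_blob (payload) VALUES (%s)",
                (json.dumps(rows),))

    conn.commit()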

Purely from a performance perspective, would it be better to store the data as potentially >300,000 individual rows, or as a 60 MB LONGTEXT entry?

If it's a wash, I will probably go with the LONGTEXT option, as it would make the part of the application that picks up the data easier to write. It would also tie in better with yet another part, which would increase the overall performance of the application.

I would appreciate any thoughts on this.

A: 

What kind of data is it, and how will it be used? It would probably be much better to store and process it in your application's memory. At the very least, that would be much faster and wouldn't put any load on the DB engine.

I would if I could. Part of the problem is that the module that picks up the data may not be on the same server. Hence, I need to find a centralized way of storing the data where it can be accessed from any of the servers. I can't get away from using a database, I'm afraid.
vonconrad
+2  A: 

Serializing all that data into a LONGTEXT... blasphemy!! :)

Seriously though, it occurs to me that if you do that, you would have no choice but to extract it all in one giant piece. If you spread it across individual rows, on the other hand, you can have your front-end fetch it in smaller batches (see the sketch below).
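Something along these lines, for example (the table and column names are hypothetical, carried over from the row-based layout in the question; this is just a rough sketch of keyset pagination, not a drop-in implementation):

    import mysql.connector  # assumes mysql-connector-python is installed

    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="appdb")
    cur = conn.cursor()

    BATCH = 10000  # tune to whatever the front-end can comfortably chew on
    last_id = 0
    while True:
        # Keyset pagination: walk the primary key instead of using OFFSET,
        # so each batch stays cheap even 300,000 rows in.
        cur.execute(
            "SELECT id, k, v FROM transfer_rows WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, BATCH),
        )
        batch = cur.fetchall()
        if not batch:
            break
        for row_id, k, v in batch:
            pass  # hand each record to the front-end here
        last_id = batch[-1][0]  # resume after the last id we saw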

At least giving yourself that option seems the smart thing to do. (Keep in mind that underestimating the future size requirements of one's data can be a fatal error!)

And if you design your tables right, I doubt very much that fetching 60 MiB of data spread over 300,000 rows would be any less efficient than fetching 60 MiB of text and parsing it on the front-end.

Ultimately the question is: do you think your front-end can parse the text more efficiently than MySQL can fetch it?

Atli
+1  A: 

This should be fine as long as you use a memory storage engine. In MySQL, this means using the MEMORY storage engine instead of InnoDB or MyISAM. Otherwise disk usage will bring your app to its knees.
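A rough sketch of what that could look like for the row-per-record layout (the names are made up; note that MEMORY tables can't hold TEXT/BLOB columns, and max_heap_table_size has to be large enough for the data set):

    import mysql.connector  # assumes mysql-connector-python is installed

    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="appdb")
    cur = conn.cursor()

    # MEMORY tables live entirely in RAM, so reads and writes never touch disk.
    # Column lengths are bounded because the MEMORY engine has no TEXT/BLOB support.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS transfer_rows_mem (
            id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            k  VARCHAR(255)  NOT NULL,
            v  VARCHAR(1024) NOT NULL
        ) ENGINE=MEMORY
    """)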

George Edison
A: 

You could always store it in the database in the 300,000-row format and use memcached to cache the data so you don't have to fetch it again. Please note that memcached stores the data in the machine's memory, so if you're caching a lot of this data you may want to set a low expiry on it. But memcached significantly speeds up data fetches because you don't have to run queries on every page load.
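A bare-bones cache-aside sketch of that idea (the key name, TTL, and table layout are all made up, and it assumes a Python memcached client such as pymemcache; also note that memcached's default per-item limit is 1 MB, so a payload this size would have to be chunked or the limit raised):

    import json
    import mysql.connector                      # assumes mysql-connector-python
    from pymemcache.client.base import Client   # assumes pymemcache is installed

    CACHE_KEY = "transfer:job:42"   # hypothetical key-naming scheme
    CACHE_TTL = 300                 # keep the expiry low for a data set this big

    mc = Client(("localhost", 11211))

    def load_rows():
        """Return the row data, hitting MySQL only on a cache miss."""
        cached = mc.get(CACHE_KEY)
        if cached is not None:
            return json.loads(cached)

        conn = mysql.connector.connect(host="localhost", user="app",
                                       password="secret", database="appdb")
        cur = conn.cursor()
        cur.execute("SELECT k, v FROM transfer_rows")
        rows = cur.fetchall()

        # Cache the result with a short expiry so it doesn't linger in memory.
        mc.set(CACHE_KEY, json.dumps(rows), expire=CACHE_TTL)
        return rows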

Tom Schlick
A: 

If you're just going to be writing a large, temporary BLOB, you might consider writing it to a temporary file on a shared file system instead.
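For example, something like this (the shared mount point and the JSON serialization are just assumptions for the sake of illustration):

    import json
    import os
    import tempfile

    SHARED_DIR = "/mnt/shared/transfer"  # hypothetical mount visible to both servers

    def write_payload(rows):
        """Write the payload to a temp file on the shared mount; return its path."""
        os.makedirs(SHARED_DIR, exist_ok=True)
        fd, path = tempfile.mkstemp(suffix=".json", dir=SHARED_DIR)
        with os.fdopen(fd, "w") as f:
            json.dump(rows, f)
        return path

    def read_payload(path):
        """Read the payload back on the other server, then clean up."""
        with open(path) as f:
            rows = json.load(f)
        os.remove(path)
        return rows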

jewel