I have a large collection of data chunks sized 1kB (in the order of several hundred million), and need a way to store and query these data chunks. The data chunks are added, but never deleted or updated. Our service is deployed on the S3, EC2 platform.
I know Amazon SimpleDB exists, but I want a solution that is platform agnostic (in case we need to move out of AWS for example).
So my question is, what are the pro's and con's of these two options for storing and retrieving data chunks. How would the performance compare?
- Store the data chunks as files on S3 and GET them when needed
- Store the data chunks on a MySQL Server cluster
Would there be that much of a performance difference?