I am trying to store 30-second user MP3 recordings as Blobs in my App Engine datastore. However, to enable this feature (App Engine has a 1MB limit per upload) and to keep costs down, I would like to compress each file before upload and decompress it every time it is requested. How would you suggest I accomplish this? (It can happen in the background via a task queue, by the way, but an efficient solution is always good.)

Based on my own tests and research, I see two possible approaches to accomplish this:

  • Zlib

For this I need to compress a certain number of blocks at a time using a while loop. However, App Engine doesn't allow you to write to the file system. I thought about using a temporary file instead, but I haven't had any luck decompressing the content from a temporary file.

  • Gzip

From reading around the web, it appears that the App Engine URL Fetch function already requests content gzipped and then decompresses it. Is there a way to stop the function from decompressing the content, so that I can just put it in the datastore in gzipped format and then decompress it when I need to play it back to a user on demand?
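For reference, the block-wise zlib approach described above does not need the file system or a temporary file. Below is a minimal sketch that keeps everything in memory with `io.BytesIO`. The function names `compress_stream` and `decompress_stream` and the 64KB block size are illustrative assumptions, and the payload is repetitive placeholder bytes, not real MP3 data.

```python
import io
import zlib


def compress_stream(src, chunk_size=64 * 1024):
    """Compress a file-like object block by block, entirely in memory."""
    compressor = zlib.compressobj()
    out = io.BytesIO()
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        out.write(compressor.compress(block))
    # flush() emits whatever the compressor is still buffering.
    out.write(compressor.flush())
    return out.getvalue()


def decompress_stream(src, chunk_size=64 * 1024):
    """Reverse of compress_stream, also without touching the file system."""
    decompressor = zlib.decompressobj()
    out = io.BytesIO()
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        out.write(decompressor.decompress(block))
    out.write(decompressor.flush())
    return out.getvalue()


# Placeholder payload (hypothetical); real MP3 bytes would not shrink like this.
payload = b"example audio bytes " * 10000
packed = compress_stream(io.BytesIO(payload))
restored = decompress_stream(io.BytesIO(packed))
```

The round trip works on any bytes, but how much space it saves depends entirely on how compressible the input is.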

Let me know how you would suggest using zlib, gzip, or some other solution to accomplish this. Thanks.

A: 

As Aneto mentions in a comment, you will not be able to meaningfully compress MP3 data with a standard compression library like gzip or zlib. However, you could re-encode the MP3 at a MUCH lower bitrate, possibly with LAME.
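A quick way to see why is to run zlib over high-entropy bytes. The sketch below uses `os.urandom` as a stand-in for encoded MP3 data; that substitution is an assumption (real MP3 files have some structure, such as frame headers and tags), but already-compressed audio behaves in much the same way.

```python
import os
import zlib

# Stand-in for an encoded recording: 64KB of random, high-entropy bytes.
data = os.urandom(64 * 1024)

compressed = zlib.compress(data, 9)  # 9 = maximum compression level

# Ratio of compressed size to original size; close to 1.0 means no gain.
ratio = len(compressed) / float(len(data))
print("original: %d bytes, compressed: %d bytes, ratio: %.3f"
      % (len(data), len(compressed), ratio))
```

On input like this the compressed output is about the same size as the original, so the CPU time spent compressing buys nothing.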

John Paulett
+2  A: 

"Compressing before upload" implies doing it in the user's browser -- but no text in your question addresses that! It seems to be about compression in your GAE app, where of course the data will only arrive after the upload. You could do it with a Firefox extension (or other browsers' equivalents), if you can develop those and convince your users to install them, but that has nothing much to do with GAE!-) Not to mention that, as @RageZ's comment mentions, MP3 is essentially already compressed, so there's little or nothing to gain (though maybe you could, again with a browser extension for the user, reduce the MP3's bit rate and thus the file's size; that could impact the audio quality, depending on your intended use for those audio files).

So, overall, I have to second @jldupont's suggestion (also in a comment) -- use a different server for storage of large files (S3, Amazon's offering, is surely a possibility though not the only one).

Alex Martelli
+1  A: 

While the technical limitations (mentioned in other answers) of compressing MP3 files via standard compression or reencoding at a lower bitrate are correct, your aim is to store 30 seconds of MP3 encoded data. Assuming that you can enforce that on your users, you should be alright without applying additional compression techniques if the MP3 bitrate is 256kbit constant bitrate (CBR) or lower. At 256kbit CBR, 30 seconds of audio would require:

(((256 * 1000) / 8) * 30) / 1048576 ≈ 0.92MB

The maximum standard bitrate is 320kbit which equates to 1.14MB, so you'd have to use 256 or less. The most commonly used bitrate in the wild is 128kbits.

There are additional overheads that will increase the final file size such as ID3 tags and framing, but you should be OK. If not, drop down to 224kbits as your maximum (30 secs = 0.80MB). There are other complexities such as variable bit rate encoding for which the file size is not so predictable and I am ignoring these.
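The arithmetic above can be sketched as a small helper. The names `mp3_size_bytes` and `fits_upload_limit` are hypothetical, the 1MB limit is taken as 1048576 bytes, and the estimate covers the CBR audio payload only, ignoring ID3 tags and other overhead.

```python
LIMIT_BYTES = 1024 * 1024  # the 1MB upload limit mentioned in the question


def mp3_size_bytes(bitrate_kbit, seconds):
    """Approximate size of CBR audio: kbit/s -> bytes/s -> total bytes."""
    return bitrate_kbit * 1000 // 8 * seconds


def fits_upload_limit(bitrate_kbit, seconds=30, limit=LIMIT_BYTES):
    """True if the estimated payload fits under the limit (overhead ignored)."""
    return mp3_size_bytes(bitrate_kbit, seconds) <= limit


for rate in (128, 224, 256, 320):
    size = mp3_size_bytes(rate, 30)
    print("%d kbit/s: %d bytes (%.2f MB), fits: %s"
          % (rate, size, size / 1048576.0, fits_upload_limit(rate)))
```

A check like this could also run on the server against the actual upload size, which sidesteps having to trust the declared bitrate.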

So your problem is no longer how to compress MP3 files, but how to ensure that your users are aware that they cannot upload more than 30 seconds encoded at 256kbits CBR, and how to enforce that policy.

mhawke
A: 

You can store up to 10MB with a list of Blobs; search for "google file service". In my opinion it's much more versatile than the Blobstore, although I only started using the Blobstore API yesterday and I'm still figuring out whether it is possible to access the data bytewise, as in converting doc to pdf or jpeg to gif.

So you can store Blobs of 1MB * 10 = 10MB (the max entity size, I think), or you can use the Blobstore API and get the same 10MB, or 50MB if you enable billing (you can enable it, and as long as you don't exceed the free quota you don't pay).
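The "list of Blobs" idea comes down to cutting the recording into pieces that each stay under the per-value limit and joining them again on read. Here is a minimal sketch of just that splitting step, in plain Python; the helper names and the 1,000,000-byte chunk size are assumptions, and how the chunks are actually written to the datastore is left out.

```python
def split_into_chunks(data, chunk_size=1000 * 1000):
    """Cut a byte string into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks):
    """Reassemble the original byte string from its ordered chunks."""
    return b"".join(chunks)


# Placeholder recording (hypothetical): 2,560,000 bytes of repeating data.
recording = bytes(bytearray(range(256))) * 10000
chunks = split_into_chunks(recording)
rebuilt = join_chunks(chunks)
```

The chunks have to be stored and read back in order, so keeping them in a single ordered list (or numbering them) matters.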

bimbojones