I run a small record label and we have a bunch of audio files stored on Amazon's S3. We want them converted to MP3s with a standard bitrate. I read about the NYTimes converting all their PDFs using EC2, and since I'm a nerdy web programmer, I'm intrigued. Instead of downloading all the files and converting them by hand, I'm wondering what it takes to set up an EC2 instance to convert the files. I want to be able to control it from my web server with PHP, so is the approach to create a virtual LAMP stack and install the LAME encoder? I really have no idea. Thanks!
If you want to convert your audio files to MP3 (I'm assuming they're .wav, since that's a pretty common format before conversion), LAME is a solid encoder.
A full-blown LAMP stack is unnecessary for running LAME; a simple shell script will suffice.
This will convert all *.wav files in the current directory to .mp3 files, skipping any that already have a converted copy in place (LAME itself will happily overwrite an existing output file, hence the check).
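#!/bin/bash
for file in *.wav; do
    # Swap the trailing "wav" for "mp3": song.wav -> song.mp3
    dest="${file%wav}mp3"
    # The -e "$file" test guards against the unexpanded "*.wav" pattern
    # when no files match; the second test skips already-converted files.
    if [[ -e "$file" ]] && [[ ! -e "$dest" ]]; then
        lame "$file" "$dest"
    fi
done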
You will want to look through man lame for the conversion options specific to your VBR/CBR/ABR (variable, constant and average bitrate) needs.
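As a rough illustration of those three modes, here is a small wrapper. The function name encode_mp3 is made up for this example, and 192 kbps and quality level 2 are placeholder values rather than recommendations; check man lame for what your installed version supports.

```shell
#!/bin/bash
# encode_mp3 MODE SRC DEST
# Hypothetical helper: picks the LAME bitrate mode by name.
# The numbers (192 kbps, quality 2) are example values only.
encode_mp3() {
    local mode="$1" src="$2" dest="$3"
    case "$mode" in
        cbr) lame --cbr -b 192 "$src" "$dest" ;;  # constant bitrate at 192 kbps
        abr) lame --abr 192 "$src" "$dest" ;;     # average bitrate around 192 kbps
        vbr) lame -V 2 "$src" "$dest" ;;          # variable bitrate; 0 is highest quality, 9 is smallest files
        *)   echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

Since the question asks for one standard bitrate across the catalogue, something like encode_mp3 cbr song.wav song.mp3 is probably the closest match.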
While the above answer would work if you already had the files on the EC2 instance's local disk, you'll have to fetch each song from S3 onto the instance first, either into a pipe for conversion or into a temporary file. Then either pipe the result back up to S3, or store it in a temp file and send that back to S3.
I haven't actually used EC2, so I'm not sure what kind of storage you're working with, but you should have plenty of space to store the one temporary mp3.
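A minimal sketch of the temp-file variant of that round trip, for one song. It assumes the AWS command-line tool (aws) and lame are installed on the instance and that credentials are already configured; neither is mentioned above, and the function name, bucket and key are invented for the example.

```shell
#!/bin/bash
# convert_s3_object BUCKET KEY
# Hypothetical helper: downloads s3://BUCKET/KEY (a .wav) to a temp directory,
# encodes it with LAME, and uploads the .mp3 under the same key with the
# extension swapped. Assumes `aws` and `lame` are on the PATH.
convert_s3_object() {
    local bucket="$1" key="$2"
    local mp3_key="${key%.wav}.mp3"
    local tmpdir
    tmpdir="$(mktemp -d)" || return 1

    # Each step runs only if the previous one succeeded.
    aws s3 cp "s3://$bucket/$key" "$tmpdir/in.wav" &&
        lame "$tmpdir/in.wav" "$tmpdir/out.mp3" &&
        aws s3 cp "$tmpdir/out.mp3" "s3://$bucket/$mp3_key"
    local status=$?

    # Clean up so only one song's worth of temp data exists at a time.
    rm -rf "$tmpdir"
    return "$status"
}
```

You would call it as, say, convert_s3_object my-label-bucket masters/song.wav, once per file. The original .wav is left in the bucket untouched.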
You would probably also want to create some way of tracking status, probably by doing a listing on your bucket before you start.
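One way to get that status from a listing, sketched under the assumption that you can produce the bucket's keys one per line: print every .wav key that has no .mp3 sibling yet. The function name pending_wavs is made up for this example.

```shell
#!/bin/bash
# pending_wavs
# Hypothetical helper: reads object keys on stdin, one per line, and prints
# each .wav key whose matching .mp3 key does not appear in the same listing.
pending_wavs() {
    local listing key
    listing="$(cat)"
    while IFS= read -r key; do
        [[ "$key" == *.wav ]] || continue
        # -F fixed string, -x whole-line match, -q quiet: is the .mp3 listed?
        grep -Fxq -- "${key%.wav}.mp3" <<< "$listing" || printf '%s\n' "$key"
    done <<< "$listing"
}
```

If you use the AWS command-line tool, I believe something along the lines of aws s3api list-objects-v2 --bucket my-label-bucket --query 'Contents[].Key' --output text | tr '\t' '\n' | pending_wavs would feed it, but treat that pipeline as an untested assumption. Re-running it after a partial run shows what is still left to convert.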
A Perl script using the S3 module would probably be better suited, but I'm too lazy to type that all in here :).
You could use Elastic MapReduce for this, although you'd have to play around a bit to get it to spit out separate files as output.