I'm working on a project where a user can submit a link to a sound file hosted on another site through a form. I'd like to download that file to my server and make it available for streaming. I might have to upload it to Amazon S3. I'm doing this in Django but I'm new to Python. Can anyone point me in the right direction for how to do this?
Answers:
Here's how I would do it:

Create a `SoundUpload` model like:

```python
class SoundUpload(models.Model):
    STATUS_CHOICES = (
        (0, 'Unprocessed'),
        (1, 'Ready'),
        (2, 'Bad File'),
    )

    uploaded_by = models.ForeignKey(User)
    original_url = models.URLField(verify_exists=False)
    download_url = models.URLField(null=True, blank=True)
    status = models.IntegerField(choices=STATUS_CHOICES, default=0)
```
Next, create the view with a `ModelForm` and save the info to the database. Then hook up a post-save signal on the `SoundUpload` model that kicks off a django-celery task. This ensures the UI stays responsive while you're processing the data:

```python
from django.db.models.signals import post_save

def process_new_sound_upload(sender, **kwargs):
    # Import here to prevent circular dependency issues.
    from your_project.tasks import ProcessSoundUploadTask

    if kwargs.get('created', False):
        instance = kwargs.get('instance')
        ProcessSoundUploadTask.delay(instance.id)

post_save.connect(process_new_sound_upload, sender=SoundUpload)
```
In the `ProcessSoundUploadTask` task you'll want to:

1. Look up the model object based on the passed-in id.
2. Using `pycurl`, download the file to a temporary folder (with very limited permissions).
3. Use `ffmpeg` (or similar) to verify it's a real sound file, and do any other virus-style checks here (how thorough depends on how much you trust your users). If it turns out to be a bad file, set the `SoundUpload.status` field to `2` (Bad File), save it, and return to stop processing the task. Perhaps send out an email here.
4. Upload the file to S3, set `SoundUpload.download_url` to the S3 URL, set the status to `1` (Ready), and save the object.
5. Do any other post-processing (sending notification emails, etc.)
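The download-and-validate steps can be sketched with the standard library alone; the pycurl download and the S3 upload are left out, and the extension whitelist, function names, and ffmpeg flags here are my own assumptions:

```python
import os
import subprocess
import tempfile
from urllib.parse import urlparse

# Assumed whitelist -- adjust to whatever formats you actually accept.
ALLOWED_EXTENSIONS = {".mp3", ".ogg", ".wav", ".flac"}

def extension_allowed(url):
    """Cheap first-pass check on the remote URL's file extension."""
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    return ext in ALLOWED_EXTENSIONS

def make_restricted_tmpdir():
    """Create a scratch directory only this process's user can access."""
    path = tempfile.mkdtemp(prefix="sound_upload_")
    os.chmod(path, 0o700)  # very limited permissions, per step 2
    return path

def looks_like_audio(path):
    """Ask ffmpeg to decode the file; a nonzero exit means a bad file."""
    result = subprocess.run(
        ["ffmpeg", "-v", "error", "-i", path, "-f", "null", "-"],
        capture_output=True,
    )
    return result.returncode == 0
```

The task body would then call these in order, flipping `status` to `2` and bailing out as soon as any check fails.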
The key to this approach is using django-celery. Once the task is kicked off through the post_save signal, the view can return immediately, creating a very "snappy" experience. The task gets put onto an AMQP message queue that can be processed by multiple workers (dedicated EC2 instances, etc.), so you'll be able to scale without too much trouble. This may seem like overkill, but it's really not as much work as it seems.
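Wiring django-celery up to an AMQP broker is just a few settings; this is a rough `settings.py` excerpt, with the broker URL and credentials as placeholders:

```python
# settings.py excerpt -- assumes django-celery and a local RabbitMQ broker.
INSTALLED_APPS += ("djcelery",)

# Placeholder credentials; point this at your real broker.
BROKER_URL = "amqp://guest:guest@localhost:5672//"

import djcelery
djcelery.setup_loader()
```

With that in place, a worker started via `python manage.py celeryd` will pick tasks off the queue, and you can add more workers as load grows.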