ansaurus

Question

How would I go about downloading a file from a submitted link then reuploading to my server for streaming?

Answer 1

A:

Here's how I would do it:

Create a model such as SoundUpload:

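from django.contrib.auth.models import User
from django.db import models


class SoundUpload(models.Model):
    STATUS_CHOICES = (
        (0, 'Unprocessed'),
        (1, 'Ready'),
        (2, 'Bad File'),
    )
    uploaded_by = models.ForeignKey(User)
    # verify_exists=False skips the HTTP check that the URL resolves.
    original_url = models.URLField(verify_exists=False)
    # Filled in by the background task once the file is on S3.
    download_url = models.URLField(null=True, blank=True)
    status = models.IntegerField(choices=STATUS_CHOICES, default=0)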

Next, create the view with a ModelForm and save the info to the database.

Hook up a post_save signal on the SoundUpload model that kicks off a django-celery Task. This ensures that the UI stays responsive while you're processing all the data.

from django.db.models.signals import post_save


def process_new_sound_upload(sender, **kwargs):
    # Bury the import here to prevent circular dependency issues.
    from your_project.tasks import ProcessSoundUploadTask
    # Only queue the task when the row is first created, not on every save.
    if kwargs.get('created', False):
        instance = kwargs.get('instance')
        ProcessSoundUploadTask.delay(instance.id)


post_save.connect(process_new_sound_upload, sender=SoundUpload)

In the ProcessSoundUploadTask task you'll want to:
- Look up the model object based on the passed-in id.
- Using pycurl, download the file to a temporary folder (w/very limited permissions).
- Use ffmpeg (or similar) to ensure it's a real sound file. Do any other virus-style checks here (depends on how much you trust your users). If it turns out to be a bad file, set the SoundUpload.status field to 2 (Bad File), save it, and return to stop processing the task. Perhaps send out an email here.
- Use boto to upload the file to s3. See this example.
- Update the SoundUpload.download_url to be the s3 url, set the status to 1 (Ready), and save the object.
- Do any other post-processing (sending notification emails, etc.)
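
The steps above can be sketched roughly as follows. This is only an outline of the control flow, not the actual task: it assumes the SoundUpload object has already been looked up, and the three callables (`download`, `is_valid_audio`, `store`) are hypothetical hooks standing in for whatever you write around pycurl, ffmpeg, and boto. Only the standard library is used, so the shape of the task can be tried without any of those installed.

```python
import os
import shutil
import tempfile

# Status codes mirroring STATUS_CHOICES on the SoundUpload model.
STATUS_UNPROCESSED, STATUS_READY, STATUS_BAD_FILE = 0, 1, 2


def process_sound_upload(upload, download, is_valid_audio, store):
    """Run the download -> validate -> upload steps for one SoundUpload.

    The three callables are hypothetical hooks, not real library APIs:
      download(url, dest_path)  fetches the file (e.g. with pycurl)
      is_valid_audio(path)      returns True for a real sound file (e.g. via ffmpeg)
      store(path)               uploads the file (e.g. with boto) and returns its URL
    """
    # mkdtemp creates a directory that only the current user can access.
    workdir = tempfile.mkdtemp(prefix="sound_upload_")
    try:
        local_path = os.path.join(workdir, "original")
        download(upload.original_url, local_path)

        if not is_valid_audio(local_path):
            upload.status = STATUS_BAD_FILE
            upload.save()
            return upload  # stop here; nothing gets uploaded

        upload.download_url = store(local_path)
        upload.status = STATUS_READY
        upload.save()
        return upload
    finally:
        # Always remove the temporary copy, whether or not processing succeeded.
        shutil.rmtree(workdir, ignore_errors=True)
```

Inside the real Celery task you would fetch the object by id first, call something like this, and then do the notification emails afterwards.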

The key to this approach is using django-celery. Once the task is kicked off through the post_save signal, the UI can return right away, which makes for a very "snappy" experience. The task gets put onto an AMQP message queue that can be processed by multiple workers (dedicated EC2 instances, etc.), so you'll be able to scale without too much trouble. This may seem like a bit of overkill, but it's really not as much work as it seems.
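
To make the queue-and-workers idea concrete, here is a toy stand-in that uses only the standard library. It is not Celery and not AMQP: `queue.Queue` plays the role of the message queue, threads play the role of worker processes, and `start_workers`, `stop_workers`, and `handler` are names made up for this illustration. The point it shows is that putting an id on the queue returns immediately, while the workers pick the ids up and process them in the background.

```python
import queue
import threading


def start_workers(task_queue, handler, results, count=2):
    """Start `count` threads that pull upload ids off `task_queue`."""
    def worker():
        while True:
            upload_id = task_queue.get()
            try:
                if upload_id is None:  # sentinel: shut this worker down
                    return
                results[upload_id] = handler(upload_id)
            finally:
                # Mark the item as handled so task_queue.join() can return.
                task_queue.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


def stop_workers(task_queue, threads):
    """Send one shutdown sentinel per worker and wait for them to exit."""
    for _ in threads:
        task_queue.put(None)
    for thread in threads:
        thread.join()
```

With real Celery the broker and worker processes replace all of this; the view only ever calls `.delay(instance.id)` and returns.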

sdolan 2010-09-10 20:42:41

Thank you for a very detailed answer. I will try this approach.

knuckfubuck 2010-09-11 01:50:26

@knuckfubuck: You're welcome. I'm happy to answer any issues you may run into as you develop this, just make sure you mark your comments w/my name, so I'll get notified.

sdolan 2010-09-11 20:40:18

@sdolan: I tried adding a ChoiceField for status like in your example but I'm getting an error that there is no ChoiceField in models. I did some searching but can't figure out how to fix this. Can you help?

knuckfubuck 2010-10-04 06:38:54

@knuckfubuck: Sorry, it's IntegerField (ChoiceField is a forms Field, not models Field). I've updated my answer.

sdolan 2010-10-04 07:38:24

@sdolan: OK, I thought it might need to be a CharField or IntegerField. Thanks. Now when I'm hooking up the post_save, which I'm putting under the SoundUpload model, it is telling me SoundUpload is not defined. I tried using 'self' as well but get the same error.

knuckfubuck 2010-10-04 07:57:02

@sdolan: Never mind that last one, I had the code nested incorrectly.

knuckfubuck 2010-10-04 08:05:13

@sdolan: Got Celery w/ RabbitMQ working and I'm trying to write a task now but I'm having problems importing the models from my app into the tasks.py file. Is there something special I need to do for that file?

knuckfubuck 2010-10-06 06:31:51

@knuckfubuck: Perhaps it's a circular dependency problem? Try burying your `from your_project.tasks import ProcessSoundUploadTask` inside the `process_new_sound_upload` function so it gets resolved at runtime. Later you'll want to place that code in its own `signals.py` file.

sdolan 2010-10-06 06:48:33

@sdolan: That did it. Thanks for your quick replies.

knuckfubuck 2010-10-06 07:15:59

@sdolan: Before I start a new question for this, maybe you can help. Instead of pycurl I want to use http://pyload.org/ to download files to my server but it is over my head on how to implement this. Any ideas?

knuckfubuck 2010-10-07 21:19:10

@knuckfubuck: Why do you want to use pyload over pycurl? It doesn't look like the right tool for this sort of thing.

sdolan 2010-10-07 22:51:56

@sdolan: The links that will be submitted to me will be mostly One-Click Sharing sites and I will need a way to get past the captchas, which pyload does.

knuckfubuck 2010-10-08 00:47:36

@knuckfubuck: I'd be curious to see how reliable it is in breaking captchas. I'd definitely start a new question for this. Though I would highly recommend getting everything working end-to-end with simple downloads before you add in the extra complexity of dealing with captchas and third party sites.

sdolan 2010-10-08 01:13:13
