Hi,

I have a tricky Django problem which didn't occur to me while I was developing the application. My Django application allows a user to sign up and store his login credentials for another site. The application basically lets the user search this other site (by scraping content off it) and returns the results to the user. For each query, it makes a couple of requests to the other site. This seemed to work fine, but sometimes the other site slaps me with a CAPTCHA. I've written the code to fetch the CAPTCHA image and I need to return it to the user so he can type it in, but I don't know how.

My search request (the query, the username and the password) gets passed to a view in my Django application, which in turn calls the backend that does the scraping/search. When a CAPTCHA is detected, I'd like to raise a client-side event or something along those lines, display the CAPTCHA to the user and wait for the user's input so that I can resume my search. I would somehow need to persist my backend object between calls. I've tried pickling it, but that doesn't work because I get the "Can't pickle 'lock' object" error. I don't know how to implement this, though. Any help/ideas?

Thanks a ton.

+2  A: 

Something else to remember: you need to maintain a browser session with the remote site so that the site knows which CAPTCHA you're trying to solve. Lots of web clients allow you to store your cookies, and I'd suggest you dump them in the Django session of the user you're doing the screen scraping for. Then load them back up when you submit the CAPTCHA.
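
A minimal sketch of dumping cookies into a session and loading them back, assuming Python 3's `http.cookiejar` (the renamed successor of `cookielib`) and assuming the session behaves like a dict, as Django's `request.session` does. The key name `remote_cookies` and both function names are made up for illustration. Cookies are flattened to plain dicts so they can pass through a JSON-based session serializer.

```python
from http.cookiejar import Cookie, CookieJar

SESSION_KEY = "remote_cookies"  # hypothetical key name


def save_cookies(session, jar):
    """Store the remote site's cookies in a dict-like session."""
    session[SESSION_KEY] = [
        {
            "name": c.name,
            "value": c.value,
            "domain": c.domain,
            "path": c.path,
            "secure": c.secure,
            "expires": c.expires,
        }
        for c in jar
    ]


def load_cookies(session):
    """Rebuild a CookieJar from whatever save_cookies() stored."""
    jar = CookieJar()
    for d in session.get(SESSION_KEY, []):
        jar.set_cookie(Cookie(
            version=0, name=d["name"], value=d["value"],
            port=None, port_specified=False,
            domain=d["domain"], domain_specified=bool(d["domain"]),
            domain_initial_dot=d["domain"].startswith("."),
            path=d["path"], path_specified=True,
            secure=d["secure"], expires=d["expires"],
            discard=d["expires"] is None,
            comment=None, comment_url=None, rest={},
        ))
    return jar
```

In a view you would call `save_cookies(request.session, jar)` once the CAPTCHA page comes back, and `load_cookies(request.session)` before posting the user's answer.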

Here's how I see the full turn of events:

  1. User places search request
  2. Query remote site
  3. If there is no CAPTCHA, GOTO #10
  4. Save remote cookies in local session
  5. Download image captcha (perhaps to session too?)
  6. Present CAPTCHA to your user and a form
  7. User Submits CAPTCHA
  8. You load up cookies from #4 and submit the form as a POST
  9. GOTO #3
  10. Process the data off the page, present to user, high-five yourself.
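
The steps above could be sketched roughly like this. Rather than pickling the whole backend object, only the pending query and the remote cookies are kept in the session between requests. Everything here is an assumption for illustration: `search_step`, the session keys, and the `fetch` callable (which stands in for your scraping backend and is assumed to return a dict with `cookies`, `captcha_image` and `results`).

```python
def search_step(session, fetch, query=None, captcha_answer=None):
    """Run one pass of the loop; returns ("captcha", image) or ("results", data)."""
    # No new query means the user is answering a CAPTCHA (step 7),
    # so resume the search that was parked in the session.
    if query is None:
        query = session["pending_query"]

    cookies = session.get("remote_cookies", {})
    page = fetch(query, cookies, captcha_answer)    # steps 2 and 8
    session["remote_cookies"] = page["cookies"]     # step 4

    if page["captcha_image"] is not None:           # step 3
        session["pending_query"] = query
        return ("captcha", page["captcha_image"])   # steps 5 and 6

    session.pop("pending_query", None)
    return ("results", page["results"])             # step 10
```

The view calls this once with the query, and again with `captcha_answer` for as long as it keeps returning `"captcha"` (step 9).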
Oli
Hi Oli, do you think you could point me to a link explaining how to store and retrieve data in the session?
Mridang Agarwalla
http://www.voidspace.org.uk/python/articles/cookielib.shtml shows how to use cookielib with urllib2, although they store the cookies in a file. That's fine if only one user at a time is going through this, but I suggest you just chuck them in the session so you can have as many users going at it at once. Sessions are simple: http://www.djangobook.com/en/2.0/chapter14/
Oli