Let's say we have an image hosted on Google App Engine and other sites are hotlinking it. How can I find the domain names of those sites?

My first thought was:

request.client

and then do a reverse lookup, but that is not possible in GAE and would take a lot of time. I am pretty sure there is a property that gives me the URL of the site requesting the file (somewhere in request?). GAE has a Request class, but I couldn't make it work inside web2py.

Any ideas?

+2  A: 

You can easily get the referrer from the request headers. It can be spoofed, but most people do not spoof it, and it already contains the domain name, so no lookup is needed.

There is no automatic way to turn the client's address into a domain name; you would have to do the DNS lookup yourself. Like you said, a DNS lookup takes extra time, and it makes no sense for web2py or any other framework to do it.
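
As a rough illustration of reading the header instead of doing a lookup, here is a minimal sketch that pulls the host out of a Referer value. The helper name `referer_domain` is made up for this example, and the code is written for Python 3; on Python 2 the import would be `from urlparse import urlparse`. In a web2py controller the value would presumably come from `request.env.http_referer`, which can be `None` when the browser sends no referer.

```python
from urllib.parse import urlparse


def referer_domain(referer):
    """Return the lowercased host from a Referer header value, or None.

    `referer_domain` is a hypothetical helper, not part of web2py or GAE.
    """
    if not referer:
        # Header missing or empty: the client did not send a referer.
        return None
    try:
        return urlparse(referer).hostname
    except ValueError:
        # Malformed value (e.g. unbalanced IPv6 brackets).
        return None


# Assumed usage inside a web2py controller:
# domain = referer_domain(request.env.http_referer)
```

Since the header is optional and client-controlled, the result is best treated as a hint rather than proof of where the request came from.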

Unknown
I've been searching for request headers and referer but found no info for web2py.
Jon Romero
"most people do not spoof it" - and in particular it can be spoofed by the client, but I think not by the server which is direct-linking the image? Unless JavaScript can specify the http-referer of a request. Chances are the client doesn't know or care what's going on.
Steve Jessop
request.env.http_referrer
Unknown
@onebyone: you cannot detect if someone has hotlinked your image until a client requests it from the hotlinker's website.
Unknown
Is it referrer or referer in web2py? The HTTP standard misspells it "referer", and most (but not all) web frameworks follow suit.
Steve Jessop
@onebyone: probably referer then, or whatever most browsers use.
Unknown
Yes, but what I mean is that you don't have to worry about spoofing unless either (a) the client is in on the crime, or (b) the enemy server can deliver some JavaScript or something to cause the client to specify a false Referer header when it goes to you for the image. I'm just saying that (a) is so incredibly rare as not to matter, and hopefully (b) is impossible.
Steve Jessop
request.env.http_referer returns None on Firefox/Linux. But I think that it's the closest I am going to get.
Jon Romero
@onebyone: many people disable their referrer header.
Unknown
Sure, but I was commenting on spoofing.
Steve Jessop
+1  A: 

If you're just looking to find out the domain names (not to block the requests by running a script when the image URL is requested), then they'll be in the request logs. In the admin thingy go to "Logs", select "Requests only" from the drop-down. If you expand "Options" you can filter on the relevant filename.

Then expand each request log entry, and the referer is either a hyphen, or the string in quotes immediately following the 200 (or whatever) status code and the size transferred. Chances are very high that not all of the clients have blocked or spoofed the header, so you'll see the URLs linked from.

You can also download the logs using the SDK, and search/process them locally:

appcfg.py --email=whatever request_logs your_app_directory some_filename
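
Once the logs are downloaded, the processing could look something like the sketch below. It assumes each line follows the layout described above (quoted request, status code, size, then either a hyphen or the quoted referer); the regular expression, the function names, and the sample path are all illustrative guesses, so check them against a real line from your own log file.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed layout: ... "GET /path HTTP/1.1" 200 1234 "http://referer/" "agent"
# A missing referer is expected to show up as a hyphen.
LOG_PATTERN = re.compile(
    r'"\s+(?P<status>\d{3})\s+(?P<size>\d+|-)\s+(?:-|"(?P<referer>[^"]*)")'
)


def referer_from_log_line(line):
    """Return the referer URL found in one log line, or None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    referer = match.group("referer")
    if referer in (None, "", "-"):
        return None
    return referer


def count_referer_domains(lines, path_fragment):
    """Count referring hosts for lines that mention path_fragment.

    The filter is a plain substring test on the whole line, so it is
    only a rough match on the requested file.
    """
    counts = Counter()
    for line in lines:
        if path_fragment not in line:
            continue
        referer = referer_from_log_line(line)
        if referer is None:
            continue
        host = urlparse(referer).hostname
        if host:
            counts[host] += 1
    return counts


# Assumed usage with the downloaded file:
# with open("some_filename") as log_file:
#     print(count_referer_domains(log_file, "/static/logo.png").most_common())
```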
Steve Jessop