What is the best way to determine if a page on a website is REALLY displaying a specific img tag like this: <img src=http://domain.com/img.jpg>? A simple string comparison is easy to fool using HTML comments <!-- -->. Even if the HTML tag exists, it could be deleted with JavaScript. It could also be obscured by placing another image over it using CSS. Do you know of a solid method of detecting the img tag despite the obscuring attacks listed? Do you know of another method of obscuring the image? Python code to detect the image would be ideal, but a good tactic or method will also earn you a +1 from me.

+1  A: 

The only surefire way I can think of is to render the page and check. It is simple to strip comments etc. But if scripts are involved, it is not possible to have a general solution that will not amount to executing them (I believe this is the first time I ever invoked Church's theorem...).
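
As a rough illustration of the "strip comments" part only, here is a minimal sketch using Python's standard `html.parser`. The names `ImgFinder` and `has_img` are made up for this example. The parser routes comments to a separate handler, so an img tag inside `<!-- -->` is never reported as a real tag. It does not execute scripts or apply CSS, so it says nothing about what is actually rendered.

```python
from html.parser import HTMLParser


class ImgFinder(HTMLParser):
    """Collect the src attribute of every real <img> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; commented-out markup
        # goes to handle_comment() instead and never reaches this method.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def has_img(html, wanted_src):
    """Return True if the static markup contains <img src=wanted_src>."""
    finder = ImgFinder()
    finder.feed(html)
    finder.close()
    return wanted_src in finder.sources
```

For example, `has_img(page_source, "http://domain.com/img.jpg")` is true for a plain tag and false when the tag only appears inside a comment. A true result still only means the tag is in the delivered markup, not that a visitor can see the image.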

Ofir
I think you are right, I'm looking for something a bit more specific.
Rook
The question is really how much effort you are willing to go to (and resource usage) and how much the attacker is - maybe some more information about your use case?
Ofir
Aah yes, don't you enjoy a game of cat and mouse?
Rook
In any case, although you can't decide the general case - if you decide that anything that doesn't look kosher is forbidden - you can probably define some simple heuristics that would save you the rendering of the page.
Ofir
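
One way the "anything that doesn't look kosher is forbidden" idea from the comment above could be sketched: treat a page as trustworthy only if it contains the tag and nothing that could alter or hide it, and send everything else to a real renderer. The class `PageAudit`, the function `classify`, and the particular list of suspicious features are assumptions chosen for illustration, not a complete set of obscuring techniques.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Look for the wanted <img> and for anything able to hide or remove it."""

    def __init__(self, wanted_src):
        super().__init__()
        self.wanted_src = wanted_src
        self.found = False
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Scripts and stylesheets can delete, move, or cover the image.
        if tag in ("script", "style"):
            self.suspicious.append(tag)
        if tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            self.suspicious.append("stylesheet link")
        # Inline styles and on* event handlers are equally capable of it.
        if "style" in attrs or any(name.startswith("on") for name in attrs):
            self.suspicious.append("inline style or event handler")
        if tag == "img" and attrs.get("src") == self.wanted_src:
            self.found = True


def classify(html, wanted_src):
    """Return 'absent', 'present', or 'needs rendering' for static markup."""
    audit = PageAudit(wanted_src)
    audit.feed(html)
    audit.close()
    if not audit.found:
        return "absent"
    return "needs rendering" if audit.suspicious else "present"
```

Only pages classified as "needs rendering" would then have to be loaded in a real browser engine. The price of this conservative rule is that most modern pages will land in that bucket.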
A: 

You could place a script anywhere that processes the request, counts the view and delivers the image like this:

http://yourhost.com/imageprocess?image=media/foo/bar.jpg

Then you can be sure that the image was loaded. Whether it was actually viewed, of course, you can't be sure.

schneck
Nope. I'm looking at a remote page, not my own.
Rook
+1  A: 

I don't think you can ever be sure. First, you're not even sure the program will stop.
Aside from that, consider the following scenarios. Your <img> can be added, removed, or obscured using JavaScript, CSS, and/or server-side code:

Google is facing a similar problem - people are hiding search keywords in hidden text and links to get a better rank. Their solution is to penalize sites with hidden text. They get away with it because they're Google; people depend on them for traffic.
As for you, you can't do much better than to ask nicely...

Kobi
Who says you have to be the best? In a game of cat and mouse you're just trying to raise the bar. The halting problem is very interesting. An agent can thwart dynamic content by conducting periodic observations. A computer is far more observant than a human, and my goal is to make sure that humans can see the image. Cat and mouse.
Rook