A product I'm helping to develop will basically work like this:

  • A Web publisher creates a new page on their site that includes a <script> from our server.
  • When a visitor reaches that new page, that <script> gathers the text content of the page and sends it to our server via a POST request (cross-domain, using a <form> inside of an <iframe>).
  • Our server processes the text content and returns a response (via JSONP) that includes an HTML fragment listing links to related content around the Web. This response is cached and served to subsequent visitors until we receive another POST request with text content from the same URL, at which point we regenerate a "fresh" response. These POSTs only happen when our cache TTL expires: the server signals the expiry and prompts the <script> on the page to gather and POST the text content again.

The problem is that this system seems inherently insecure. In theory, anyone could spoof the HTTP POST request (including the referer header, so we couldn't just check for that) that sends a page's content to our server. This could include any text content, which we would then use to generate the related content links for that page.

The primary difficulty in making this secure is that our JavaScript is publicly visible. We can't use any kind of private key or other cryptic identifier or pattern because that won't be secret.

Ideally, we need a method that somehow verifies that a POST request corresponding to a particular Web page is authentic. We can't just scrape the Web page and compare the content with what's been POSTed, since the purpose of having JavaScript submit the content is that it may be behind a login system.

Any ideas? I hope I've explained the problem well enough. Thanks in advance for any suggestions.

A: 

If you can add server-side code to the site pushing data to your site, you could use a MAC to at least prevent non-logged-in users from sending anything.

If just anyone is allowed to use the page, then I can't think of a watertight way of confirming the data without scraping the webpage. You can make sending arbitrary content somewhat more difficult with referer checks and whatnot, but you can't make it impossible.

Matti Virkkunen
Thanks very much for your input, Matti. Unfortunately I can't add any server-side code to the site that POSTs the text content. My sense, which seems to match yours, is that securing this is technically impossible, but I'm hoping to find some way that at least makes an attack extraordinarily difficult and not worth the trouble.
Bungle
In this case a MAC is trivial to spoof. How are you supposed to keep the secret from an attacker?
Rook
Using a shared secret or public/private key between the server and the service. Obviously this doesn't prevent logged-in users from abusing or leaking it, only unauthenticated people.
Matti Virkkunen
A: 

How about:

Site A creates a nonce (basically a random string) and sends it to your site B, which stores it in the session. Then, when site A makes the POST request, it sends the nonce along with the request, and the request is only accepted if the nonce matches the one in site B's session.
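
A minimal sketch of the nonce check described above, assuming an in-memory store on site B keyed by a session ID. The `NonceStore` class and its method names are hypothetical, not part of any framework. Note that this only proves a request followed the same handshake; it does not prove that the POSTed content is genuine.

```python
import hmac
import secrets


class NonceStore:
    """Hypothetical in-memory store of one-time nonces, keyed by session ID."""

    def __init__(self):
        self._pending = {}

    def issue(self, session_id: str) -> str:
        # Generate an unguessable random string and remember it for this session.
        nonce = secrets.token_urlsafe(32)
        self._pending[session_id] = nonce
        return nonce

    def consume(self, session_id: str, presented: str) -> bool:
        # Remove the nonce on first use so it cannot be replayed.
        expected = self._pending.pop(session_id, None)
        if expected is None:
            return False
        # Constant-time comparison of the stored and presented values.
        return hmac.compare_digest(expected.encode("utf-8"), presented.encode("utf-8"))
```

Site B would call `issue()` when the nonce arrives and `consume()` when the POST arrives; a second POST with the same nonce is rejected.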

Kai Sellgren
Thanks, Kai, but I don't believe that solves the problem - it only makes the system more of a pain to hack (which still has some value). A malicious party could still forge the requests required here. The fundamental problem is that client-side code is not hidden, so no secrets can be kept. Once you figure out how the browser is communicating with the server, you can emulate the same, and the server is none the wiser.
Bungle
A: 

Give people keys on a per-domain basis.

Have people include in each request the hash of [key string + request parameters]. (The hash value should be computed on the server, not in the JavaScript.)

When they send you the request, you, knowing the parameters and the key, can verify the validity.
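
Here is a rough sketch of that sign-and-verify flow. It swaps in HMAC-SHA256 for a plain hash of key + parameters, since hashing a naive concatenation can be open to length-extension attacks with some hash functions. The `DOMAIN_KEYS` table, the example key, and the function names are all assumptions for illustration; the signing side has to run on the publisher's server for the key to stay secret.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical per-domain secrets, shared out of band with each publisher.
DOMAIN_KEYS = {"example.com": b"per-domain-secret"}


def canonical(params: dict) -> bytes:
    # Sort the parameters so both sides serialise them identically.
    return urlencode(sorted(params.items())).encode("utf-8")


def sign(domain: str, params: dict) -> str:
    # Computed on the publisher's server, never in client-side JavaScript.
    return hmac.new(DOMAIN_KEYS[domain], canonical(params), hashlib.sha256).hexdigest()


def verify(domain: str, params: dict, signature: str) -> bool:
    # Recompute the MAC from the received parameters and compare in constant time.
    if domain not in DOMAIN_KEYS:
        return False
    expected = sign(domain, params)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Any change to the parameters after signing makes `verify()` return `False`, but a logged-in user who can see a valid signed request can still replay it unchanged.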

glebm
The key string is in JavaScript, though, so anyone can see it and use it to formulate a bogus POST request.
Bungle
You are right. Now I feel stupid. I fixed it I think :)
glebm
A: 

You could have hashed keys specific to each client's IP address and compare that value on the server for each POST, using the IP in the POST header. The upside to this is that if someone spoofs their IP, the response will still be sent to the spoofed IP and not the attacker's. You might already know this, but I'd also suggest adding salt to your hashes.

With a spoofed IP a proper TCP handshake can't take place, so the attacker's spoofed POST isn't completed.

There could be other security concerns I'm not aware of, but I think it might be an option.

used2could
A: 

Can the web publisher also put a proxy page on their server?

Then load the script through the proxy. Then you have a number of possibilities where you can control the connection between the two servers, add encryption and things like that.

What is the login system? What about using a SSO solution and keeping your scripts separate?

Ruz
+1  A: 

The primary weakness with the system as you described it is that you are "given" the page content. Why not go and get the page content for yourself?

  1. A Web publisher creates a new page on their site that includes a script from your server.
  2. When a visitor reaches that new page, that script sends a GET request to your server.
  3. Your server goes and gets the content of the page (possibly by using the referrer header to determine the source of the request).
  4. Your server processes the text content and returns a response (via JSONP) that includes an HTML fragment listing links to related content around the Web. This response is cached and served to subsequent visitors from a server-side cache / proxy.
  5. When the TTL for the cached version expires, the proxy will forward the request on to your app and the whole cycle starts again from step 3.

This stops malicious content from being "fed" to your server and allows you to provide some form of API key that ties requests and domains or pages together (i.e. API key 123 only works for referrers on mydomain.com; anything else is obviously spoofed). Thanks to the caching / proxy, your app is also protected to some degree from DoS-type attacks, because the page content is only processed once each time the cache TTL expires (and you can handle increasing load by extending the TTL until you can bring additional processing capability online). Now your client-side script is insanely small and simple: no more scraping content and posting it. Just send an Ajax request and maybe populate a couple of parameters (API key / page).
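
A small sketch of the two server-side pieces this relies on: tying an API key to a referrer domain, and a TTL cache so a page is only processed once per expiry. The `API_KEY_DOMAINS` table, `referrer_allowed`, and `TTLCache` are hypothetical names, and the referrer check only filters out browser requests from the wrong site, since a non-browser client can set that header to anything.

```python
import time
from urllib.parse import urlparse

# Hypothetical registry: API key -> the domain it was issued for.
API_KEY_DOMAINS = {"123": "mydomain.com"}


def referrer_allowed(api_key: str, referrer: str) -> bool:
    # Accept only referrers on the key's domain or one of its subdomains.
    domain = API_KEY_DOMAINS.get(api_key)
    host = urlparse(referrer).hostname
    if domain is None or host is None:
        return False
    return host == domain or host.endswith("." + domain)


class TTLCache:
    """Caches a computed response per key and recomputes it after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: serve the cached response
        value = compute()  # expired or missing: fetch and process the page again
        self._entries[key] = (now, value)
        return value
```

Here `compute` would be whatever fetches the page and builds the related-links fragment; raising `ttl_seconds` under load reduces how often it runs.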

Neal
A: 

You could scrape the site, and if you get a code 200 response that includes your script, just use that scrape. If not, you can fall back on the information from your "client proxy". That way the problem is reduced to the sites that you can't scrape.

To raise the security in these cases, you could have multiple users send the page and filter out any information that is not present in a minimum number of the responses. That has the added benefit of filtering out any user-specific content. Also make sure to record which users you ask to do the proxy work, and verify that you only receive pages from users that you have asked to do the job. You could also try to make sure that very active users don't get a higher chance of doing the job, which will make it harder to "fish" for the job.
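
The filtering idea could look roughly like this, assuming each submission is plain text and that comparing whole lines is a good enough unit. The function name `consensus_lines` and the line-level granularity are assumptions; a real system might compare paragraphs or DOM nodes instead.

```python
from collections import Counter


def consensus_lines(submissions, min_count):
    """Keep only lines that appear in at least min_count separate submissions."""
    counts = Counter()
    for text in submissions:
        # Use a set so a line repeated inside one submission counts once.
        counts.update({line.strip() for line in text.splitlines() if line.strip()})

    kept, seen = [], set()
    for text in submissions:
        for line in text.splitlines():
            line = line.strip()
            # Emit each qualifying line once, in order of first appearance.
            if line and line not in seen and counts[line] >= min_count:
                seen.add(line)
                kept.append(line)
    return kept
```

With three submissions that share a title and body but differ in a greeting or an injected line, a threshold of 2 keeps only the shared lines, so a single malicious submitter cannot add content on their own.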

eBusiness