A product I'm helping to develop will basically work like this:
- A Web publisher creates a new page on their site that includes a <script> from our server.
- When a visitor reaches that new page, that <script> gathers the text content of the page and sends it to our server via a POST request (cross-domain, using a <form> inside of an <iframe>).
- Our server processes the text content and returns a response (via JSONP) that includes an HTML fragment listing links to related content around the Web. This response is cached and served to subsequent visitors until we receive another POST request with text content from the same URL, at which point we regenerate a "fresh" response. These POSTs only happen when our cached TTL expires, at which point the server signifies that and prompts the <script> on the page to gather and POST the text content again.
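The caching flow above can be sketched roughly as follows. This is an illustration in Python only: `RelatedLinksCache`, `lookup`, `store`, `build_related_links`, and the `needs_content` flag are hypothetical names, and the real server may be structured quite differently.

```python
import html
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CacheEntry:
    html_fragment: str  # related-links HTML served to visitors
    expires_at: float   # time after which fresh page text is requested


def build_related_links(page_text: str) -> str:
    # Placeholder for the real related-content lookup (hypothetical).
    return "<ul><li>Related to: %s</li></ul>" % html.escape(page_text[:40])


class RelatedLinksCache:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, CacheEntry] = {}

    def lookup(self, page_url: str) -> dict:
        """Handle the JSONP request the <script> makes on page load."""
        entry = self.entries.get(page_url)
        if entry is None or self.clock() >= entry.expires_at:
            # Nothing cached, or the TTL has expired: ask the script to
            # gather the page text and POST it.
            stale = entry.html_fragment if entry else None
            return {"html": stale, "needs_content": True}
        return {"html": entry.html_fragment, "needs_content": False}

    def store(self, page_url: str, page_text: str) -> None:
        """Handle the cross-domain POST carrying the page's text content.

        Note that this trusts whatever text arrives for page_url.
        """
        fragment = build_related_links(page_text)
        self.entries[page_url] = CacheEntry(fragment, self.clock() + self.ttl_seconds)
```

In this sketch, `store` accepts any text submitted for a given URL, which is exactly where an authenticity check would have to go.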
The problem is that this system seems inherently insecure. In theory, anyone could spoof the HTTP POST request that sends a page's content to our server (including the Referer header, so we couldn't just check for that). A spoofed request could carry any text content, which we would then use to generate the related content links for that page.
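To make the concern concrete, here is a small sketch of how such a request could be constructed outside a browser. The endpoint URL and the `url` and `text` field names are assumptions made up for illustration; the request is only built, never sent.

```python
import urllib.request
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
ENDPOINT = "https://related.example.com/submit"


def build_forged_post(page_url: str, fake_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST that claims to come from page_url."""
    body = urlencode({"url": page_url, "text": fake_text}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,                      # supplying data makes this a POST
        headers={"Referer": page_url},  # the client chooses this value freely
    )


req = build_forged_post(
    "http://publisher.example.com/new-page",
    "text chosen by the attacker",
)
print(req.get_method(), req.get_header("Referer"))
```

Every field the server sees here, including the Referer, is set by whoever builds the request, so none of them can prove the content came from the real page.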
The primary difficulty in making this secure is that our JavaScript is publicly visible. We can't use any kind of private key or other cryptic identifier or pattern because that won't be secret.
Ideally, we need a method that somehow verifies that a POST request corresponding to a particular Web page is authentic. We can't just scrape the Web page and compare the content with what's been POSTed, since the purpose of having JavaScript submit the content is that it may be behind a login system.
Any ideas? I hope I've explained the problem well enough. Thanks in advance for any suggestions.