We have members-only paid content that is frequently copied and republished without our permission.

We are trying to ‘watermark’ our content by including each customer’s user ID in a fake CSS class, for example <p class='userid_1234'> (except not so obvious, of course :), and placing that class somewhere in the article body. That would help us track down the source of the copying.
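One way to make the class "not so obvious" is to derive it from the user ID with a keyed hash, so a copier cannot read the ID out of it or forge another reader's token. The sketch below is in Python for brevity and is only an illustration under assumptions: `SECRET_KEY`, `watermark_token` and `find_user` are hypothetical names, not part of ExpressionEngine, and HMAC-SHA256 is one reasonable choice among several.

```python
import hashlib
import hmac
from typing import Iterable, Optional

# Hypothetical secret; keep it server-side so outsiders cannot recompute tokens.
SECRET_KEY = b"replace-with-a-long-random-secret"


def watermark_token(user_id: int) -> str:
    """Return an opaque class name derived from the user ID via HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, str(user_id).encode("ascii"), hashlib.sha256).hexdigest()
    # "c" + 12 hex chars: looks like an ordinary generated class name.
    return "c" + digest[:12]


def find_user(token: str, user_ids: Iterable[int]) -> Optional[int]:
    """Given a token found in a copied page, return the subscriber it was issued to."""
    for uid in user_ids:
        if hmac.compare_digest(watermark_token(uid), token):
            return uid
    return None
```

The token would go into the markup as `<p class='...'>`; when a copy turns up, `find_user(found_class, all_user_ids)` recovers the reader. Scanning every user is fine for a modest subscriber list; for a large one, storing a token-to-user table at issue time would avoid recomputation.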

The problem is that including user-specific information in an article makes the article content ineligible for caching, because it is now unique to each user.

This bumps the page load time from ~0.8 ms to ~2.5 s for each article page view.

Does anyone know of any watermarking strategies that can still be used with caching?

Alternatively, what can be done to speed up database access? (Ha, ha, that's just a tiny topic, I'm sure...)

We're using the ExpressionEngine CMS, but I'd like to hear about any strategies; they don't have to be EE-specific.

A: 

You could always cache a version that uses a special string, like #!username!#, and then later fill it in with PHP based on which user is viewing it.

Another way, I believe, is to switch from caching on the server to letting the browser cache the page locally for a while. That way it is only cached per user, and it reduces the calls to your database. Because an article is pretty static, you could let the local computer cache it and pull in comments via JavaScript.
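Browser-side caching is controlled with HTTP response headers. `Cache-Control: private` lets the reader's browser keep a copy but tells shared caches (proxies, CDNs) not to store it, which matters because the body carries one reader's watermark; `Vary: Cookie` signals that the response depends on who is logged in. Below is a minimal WSGI sketch; the 600-second `max-age` and the hard-coded body are assumptions for illustration.

```python
def article_app(environ, start_response):
    # Hypothetical per-user article body (watermark already filled in).
    body = b"<article><p class='userid_1234'>Article text</p></article>"
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        # Browser may cache for 10 minutes; shared caches must not store it.
        ("Cache-Control", "private, max-age=600"),
        # The response differs per logged-in user (identified by cookie).
        ("Vary", "Cookie"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For a quick local check it can be served with `wsgiref.simple_server.make_server("", 8000, article_app).serve_forever()`; repeat views within the `max-age` window should then be answered from the browser cache without reaching the server.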

This last one is probably not what you are really looking for, but I'm gonna come out and say it anyway: instead of treating your users like thieves, treat the thieves as thieves. Contact whoever hosts the servers your content is on and email them that copyrighted premium content is being hosted on their servers without your permission. You can even automate that process.

How do you find out which sites are posting your content? Put a link to your site in the body content, then do a Google Search/Blog Search for articles linking to that site. To automate it, use Google Blog Search, because it offers RSS feeds. Any page that links back to your site could go into a database along with its URL; someone could then look at it and, if it is the entire article, do a Whois lookup and email the owner.
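The filtering step of that automation could look roughly like the sketch below, assuming the search or alert feed is plain RSS 2.0 (`<item>` elements with a `<link>` child). `OWN_DOMAIN` and `suspect_links` are hypothetical; fetching the feed (for example with `urllib.request.urlopen`) and storing results in a database are left out.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

OWN_DOMAIN = "example.com"  # hypothetical: replace with your site's domain


def suspect_links(rss_xml: str, own_domain: str = OWN_DOMAIN) -> List[str]:
    """Return item links from an RSS 2.0 feed that point at sites other than ours."""
    root = ET.fromstring(rss_xml)
    suspects = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        host = urlparse(link).hostname or ""
        # Skip items without a link and pages on our own site or its subdomains.
        if host and host != own_domain and not host.endswith("." + own_domain):
            suspects.append(link)
    return suspects
```

Each returned URL is a candidate for manual review before any Whois lookup or takedown email.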

Chacha102
Google Alerts is handy for tracking content online.
emddudley
Trust me, we don't treat our customers as thieves. We're very liberal in our policies, from refunds to reprints. It's just not fair to either us, the publishers, or our paying clients when the stuff they paid for is splashed around the internet for free.
Ian
And we certainly do all the things you describe in terms of getting content pulled down. The problem is that when people in other countries do it, it's not always so straightforward. We also get people who swear up and down that they themselves are the original writers of our content. It's truly amazing.
Ian
Well, I gave you two options for the beforehand part, and one option for automating the after part. I'm not trying to say anything bad about the company, just that I think it would be easier to go after the 'actual' thieves, because they have already committed the crime. If you are already going after them, then I hope my first two suggestions help.
Chacha102
Yep, thanks. I'll look into the first suggestion. That might work, but it may also require retroactively adding the placeholder to many, many previously published articles. Thank you for your suggestion.
Ian
A: 

What makes you think adding CSS to something is going to stop people from copying it without that CSS? It's more likely that they are just copying the source of the content you are showing them and ignoring all the styling around it. For example, I use Tamper Data to look at all HTTP requests made by Firefox: if I can see it on the page, I can see it in the logs. Even with all the "protection" some sites try to put in place, it generally never works. I can grab what I want without using any screen capture or recording.

If you were serving FLVs, for example, I could easily grab the source even if you overlaid it with some CSS. I think the best approach would be to find the sites publishing your premium content and ask them to remove it. It's either that or watermark the actual content on the fly as it is sent to the browser.

Sam152
It's a tool to help track down the lazy copiers who just copy the source code as-is. This is not preventative, nor is it a deterrent.
Ian
A: 

If you're talking about images, then you could use PHP to add a watermark to the images.

http://stackoverflow.com/questions/1217820/how-can-i-add-an-image-onto-an-image-in-php-like-a-watermark

It's a tool to help track down the lazy copiers who just copy the source code as-is. This is not preventative, nor is it a deterrent. – Ian

Going by your comment above, you are happy with users copying your content, just not without the formatting, etc. So you could provide users with embed-style source code for that particular content, just like YouTube does with videos. In that embed code you could add your own links back to your site, use your own CSS, etc.

That way you can still allow the members to use the content but it will always come out the way you intended it with links back to your site.
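A rough sketch of generating such an embed snippet, assuming the site can serve a stripped-down article view from an `/embed/<id>` URL; the base URL, both paths, the iframe dimensions and `embed_snippet` itself are hypothetical. Because the iframe loads from your server, the styling and the link back stay under your control.

```python
from html import escape


def embed_snippet(article_id: int, base_url: str = "https://www.example.com") -> str:
    """Build copy-and-paste embed HTML for one article (hypothetical URL scheme)."""
    src = f"{base_url}/embed/{article_id}"
    link = f"{base_url}/articles/{article_id}"
    # escape() keeps the attribute values safe if the URLs ever contain & or quotes.
    return (
        f'<iframe src="{escape(src)}" width="600" height="400"></iframe>\n'
        f'<p>Read the full article at <a href="{escape(link)}">{escape(link)}</a></p>'
    )
```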

Thanks

mlevit
Wow, this solves both problems. +1
Sam152
No, we aren't happy with people copying our content either with or without formatting. This is simply a tool to help us track down those who have already copied it.
Ian