I understand that no matter what I do, someone will be able to copy it. However, I can still make them work hard for it. What are some good ways of making data hard to copy using PHP-compatible code?

--- Added ----

The data is a listing of results for certain local sports events. We send people out to collect the information, post the information, make corrections and such. However a competing website takes our results (I know they are directly copying them) and never updates them which causes people to call our office and complain.

---- Answer for my Use ----

I picked one of them, but I am going to use several of your answers. I am going to add my link using the copy-pasta trick. I am going to put fake hidden text into it. I am also going to do the fake hidden text trick with different fake versions of the div tag (making it even harder to scrape, or to copy into TextPad and strip out with a quick find-and-replace), and I am going to talk to a lawyer about legal recourse and what I can do to make it illegal for them to copy the data (such as creative bios or something cool like that). Thanks for your help.

+1  A: 

Programs used to copy out data look for the data using pattern-matching. You could 'decorate' your data with randomly-chosen tags (like one row would have a span tag surrounding it, the next row a div, etc...). Just a thought.

Clarification: With screen-scraper at least, the user of the program specifies what HTML comes before the data they want, and what HTML comes after it. You can make it more difficult for them to automatically retrieve the data.
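
A minimal sketch of the idea, written in JavaScript to match the other snippets in this thread (the same logic ports to PHP). The names `WRAPPER_TAGS`, `decorate`, and `renderRows`, the `r123`-style class names, and the `name`/`time` fields are all made up for illustration. Each value gets a randomly chosen wrapper tag and class, so a scraper that keys on fixed "before" and "after" HTML has nothing stable to match.

```javascript
// Hypothetical tag pool; any element that renders the same inside a table cell works.
const WRAPPER_TAGS = ["span", "div"];

// Escape text so result values cannot inject markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap one value in a randomly chosen tag with a random class name.
// `rng` must return a number in [0, 1); it is a parameter so tests can fix it.
function decorate(text, rng = Math.random) {
  const tag = WRAPPER_TAGS[Math.floor(rng() * WRAPPER_TAGS.length)];
  const cls = "r" + Math.floor(rng() * 1000);
  return `<${tag} class="${cls}">${escapeHtml(text)}</${tag}>`;
}

// Render each result as a table row; every cell gets its own wrapper.
function renderRows(results, rng = Math.random) {
  return results
    .map((r) => `<tr><td>${decorate(r.name, rng)}</td><td>${decorate(r.time, rng)}</td></tr>`)
    .join("\n");
}
```

This only slows down scrapers that match on literal markup. It does nothing against a real HTML parser or against copying by hand.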

Jon
No offense, but do you consider this an answer?
Henrik P. Hessel
Yes, it would break the pattern matching if they are using an automated program like the one from screen-scraper.com. They could still copy out the information by hand of course.
Jon
To thwart poorly-written screenscrapers, this is not a completely bad idea. However, if the browser displays the text properly, any good HTML parser like HTML Agility Pack will have no trouble with it.
Charlie Salts
A: 

Disabling the context menu is a start.

// Block the browser's right-click menu (.on() needs jQuery 1.7+; older versions used .bind()).
$(document).on('contextmenu', function(e)
{
    return false;
});

Or

<body oncontextmenu="return false;">
ChaosPandion
And then they disable javascript?
Jonathan Sampson
Well this will stop the non dedicated poachers.
ChaosPandion
No, I don't think it will.
Jonathan Sampson
I will add that. But of course that doesn't stop someone using firefox where you can do it through the browser itself.
Joe
All media is downloaded into cache anyway. You don't need to right-click to get pictures, etc.
Jonathan Sampson
I know... but this cuts out a massive class of people from taking your content.
ChaosPandion
NEVER DO THIS! It is not effective and does NOT help ANYONE. Any thief would know to use the menu if right-click fails. Disabling right-click will only aggravate your users.
sirlancelot
Imagine placing a 100 dollar bill on a table in a corner of a crowded hall. Many people would have trouble ignoring it. Now try placing an old newspaper on top of it. How many people will pick up the newspaper to find the 100 dollar bill?
ChaosPandion
True, I think they are copying and pasting anyway and they could just use control-c.
Joe
+6  A: 

Joe, you can't really make them work really hard to get your data. It's essentially just a single request to any of your pages. Your best option is to explicitly state that you own the rights to all of your content, and that any infringement on that ownership will lead to legal ramifications*.

* Not a lawyer

Jonathan Sampson
So true, lawyers are scary
Zoidberg
The problem is I don't think I would have legal rights to the data. The site already has a statement to that effect on it to scare off as many people as possible.
Joe
Joe, if you don't have legal rights, you shouldn't worry about others stealing it.
Jonathan Sampson
Let me rephrase that: I cannot copyright the data that I create from what I read. So I could threaten legal action, but with a little research they would know I was just posturing.
Joe
Even if you make the listing an image (I mean text in an image file), they can copy it by hand, and the cost will be too high for your server and bandwidth. If people can see the text, it can be copied with almost no effort.
rodrigoq
Jon, please see above comment as well as update to question with additional information. The issue isn't just a simple he has my information on his website. It actually increases the work load of the company quite a bit.
Joe
Does the copy-website actually list your name/number with the data?
Jonathan Sampson
rodrigoq, I am going to explore that. What I was thinking about was changing certain letters into small images. So when they copy and paste it, the data would mess up. I am not working against real intelligent people. They aren't going to be able to do anything automated. Unless they pay someone.
Joe
Jonathan, actually they do. It is a list of people in the event and the top line is the name of the event. The next line is my company's name. My phone number isn't listed, but it is easily found.
Joe
So, can you get them into legal trouble for using your company's name? It sounds like people are confusing that site for something you folks are doing, and that sounds like a trademark issue to me. If it is, you might want to jump on it sooner rather than later; trademarks (unlike, say, patents or copyrights) do have to be legally defended to stay valid.
David Thornley
A: 

Forbidding people to get data is almost impossible. You can mess up your tags and make the code really dirty and hard to parse... but it's not really enough. You could also generate a big image with the data in it, this would be painful to parse! ... but you don't want to do that.

Because you said...

However a competing website takes our results (I know they are directly copying them) and never updates them which causes people to call our office and complain.

... my call would be to take this the other way and create an API allowing people to get your content in a way that YOU designed.

Also if they are just shamelessly stealing your data and they don't have the right to do it, consider a legal option.

marcgg
+1  A: 

Place some <div style="display: inline; position: absolute; overflow: hidden; width: 0px">useless words</div> in the text. It won't display for reading, but if someone copies and pastes it... "WOW, where did that come from? WTF!! *CRY*"
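
A rough sketch of generating that markup automatically, assuming the hiding rule lives in an external stylesheet under a class (here the made-up name `x7`) rather than in an inline style. The function name `withDecoys`, the `rate` parameter, and the decoy strings are illustrative only. The CSS in the comment is pieced together from the styles tried in this thread, so test it in the browsers you care about.

```javascript
// Hypothetical class name. The matching rule would sit in a separate stylesheet, e.g.
//   .x7 { position: absolute; left: -1000px; width: 0; overflow: hidden; }
const DECOY_CLASS = "x7";

// Escape text so values and decoys cannot inject markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// After each character of `text`, insert a hidden decoy with probability `rate`.
// `rng` must return a number in [0, 1); it is a parameter so tests can fix it.
function withDecoys(text, decoys, rng = Math.random, rate = 0.3) {
  let html = "";
  for (const ch of text) {
    html += escapeHtml(ch);
    if (rng() < rate) {
      const decoy = decoys[Math.floor(rng() * decoys.length)];
      html += `<span class="${DECOY_CLASS}">${escapeHtml(decoy)}</span>`;
    }
  }
  return html;
}
```

For example, `withDecoys("Smith", ["q", "77", "zx"])` returns the name with a few off-screen spans mixed in. A pasted copy that loses the stylesheet shows the junk, while your own page renders the name cleanly.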

Havenard
That is a pretty clever idea. Will it show up on anyone's screen? I could add in random crazy characters and just mess the crap up. They could of course get it from the source code and remove this. But they are really strictly copying and pasting it.
Joe
I tried what you said, it just shows like regular. Is it missing something?
Joe
<div style="display: inline; visibility: hidden; overflow:hidden; width: 0px; position : absolute; left : -1000px;">asdfasd</div>here
That is much better. That would make it a pain in the butt for sure. I will just put random stuff into it. If he figures that out, I will just obscure it more and more.
Joe
I did it once and it worked. I didn't test this one; I'll do it right now to see what detail I've forgotten.
Havenard
Is the person who is copying the data then displaying it as HTML? If so, I'm afraid this wouldn't work. It's a great idea though.
Jon
Search engines don't really like hidden text
code_burgar
Fixed.
Havenard
@Jon: You can put the thing in a CSS class, so it won't be that easy to copy.
Havenard
You have to use a CSS style, otherwise it won't even show up on the thief's page because you're copying the inline style with it.
sirlancelot
class="joes_copyrighted_content"
Havenard
Anything you try that is "tricky" should probably be tried in a bunch of different browsers to make sure it doesn't backfire on you.
gbarry
I did, it works perfectly in the 6 most popular browsers I've installed.
Havenard
I am not going to put anything naughty in it. Just make the results look obviously incorrect. I am going to put in a whole bunch of fake text into the results. Random letters in the middle of names, random numbers, extra spaces, whatever I can think of. Text is html, but if I hide the css in a separate page he would have more issues for sure.
Joe
A: 

This isn't an answer to your question as much as it is a comment on your problem.

Why not charge users for access to the data, and provide an API through which they can access the data programmatically?

Charlie Salts
I cannot charge the end user anything. I doubt that the person that is copying the results would ever pay.
Joe
So you really can't control who uses this data, or how. Unless you have legal recourse, there's not much you can do beyond obfuscating your markup.
Charlie Salts
Exactly, I have to provide the information as part of my real service. I would even be cool with it if they linked to my pages. I am not cool with what they are doing now. In the future, I may also want to make it completely impossible to copy and paste it on their site at a..
Joe
+4  A: 

Your data will be copied to every computer that requests the page and it will stay there until the person clears their cache. To answer your question, you can't.

What you can do is create a CSS style such as:

.copy-pasta { display: none; }

And then throughout your content, add something like this:

<p class="copy-pasta">Content provided via <a href="[your url]">[your website here]</a></p>

This will increase your page rank when copy-pasters blatantly steal your content, meaning you will show up first in search results.
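
If you generate the page from a list of blocks, a small helper can sprinkle the attribution in for you. This is only a sketch: `attribution`, `interleave`, and the `every` parameter are invented names, and the URL in the usage note is a placeholder.

```javascript
// Escape text so the URL and site name cannot inject markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the hidden attribution paragraph shown above.
function attribution(url, siteName) {
  return `<p class="copy-pasta">Content provided via <a href="${escapeHtml(url)}">${escapeHtml(siteName)}</a></p>`;
}

// Return a new list with an attribution paragraph after every `every` blocks.
function interleave(blocks, url, siteName, every = 3) {
  const out = [];
  blocks.forEach((block, i) => {
    out.push(block);
    if ((i + 1) % every === 0) out.push(attribution(url, siteName));
  });
  return out;
}
```

Usage would look like `interleave(paragraphs, "https://example.com/", "Example Results").join("\n")`.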

sirlancelot
Wow, that is a pretty smart idea. I would have never thought of that. I am going to do that for sure.
Joe
+1  A: 

How about putting links to your site in with the displayed data? No big fanfare, but just suggest that for the most up-to-date figures, they can go to the real website that publishes them.

Most of what you try will only work for a time. Until you exceed their laziness factor. (What they're doing suggests a high laziness factor.)

Laws don't protect publicly available data, but you may be able to protect the packaging and presentation.

gbarry
Thanks, I think that is what I am going to do.
Joe
A: 

Another option is to use PHP code to generate images from the site's HTML. You would use the images to display the content, instead of HTML which can be easily copied out. Example code is here, and I bet you could find more code to do this by Googling:

http://www.acasystems.com/en/web-thumb-activex/faq-php-convert-html-to-image.htm

Jon
Thanks, I will check out that class. I actually wrote some code using the GD library to do the same thing for this. I just want to make it a real pain in the ass for them to do it. I don't think I will be able to stop them completely, but I can make it a real difficult thing to do.
Joe
A: 

Try Copyscape. It won't prevent your content from being copied, but it will make finding the copies very easy.

code_burgar
+1  A: 

Why are people calling your office to complain if the data is on a competing website? If they have a domain name that is similar enough to yours that people are confusing the two of you or if they've put something on their site that makes it look like you've endorsed them, then you've got them for trademark infringement.

Jeff Hornby
Because they are results for events that we put on. So competitors want to display these results. Often things are changed, like people's ages. We change them, but they don't. So people call and complain to us that they haven't changed them.
Joe
@Joe- Then why don't you provide an API that they can use? Chances are that with an API they might update the results when they change. It sounds like the issue is that their version of events is out of date, not the fact that they are copying it.
RichardOD
A: 

You may encrypt the data on the page, and have an obfuscated JavaScript decoding routine that decodes it for your viewers. You may switch keys and encryption algorithms from time to time. The same JavaScript should disable the ability to select text and/or copy it, to prevent manual copy-pasting.

They won't be able to copy manually and their scraper would have to be able to run javascript to get the data.
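
Here is a minimal sketch of the encode/decode round trip, assuming a simple repeating-key XOR followed by Base64. That is obfuscation rather than real encryption, since the key has to ship with the page. The function names and the key are made up; `encodeForPage` assumes Node on the server (it uses `Buffer`), while `decodeInPage` sticks to `atob`, `TextEncoder`, and `TextDecoder`, which browsers provide.

```javascript
// XOR every byte with a repeating key. Applying it twice restores the input.
// The key must be non-empty.
function xorBytes(bytes, keyBytes) {
  const out = new Uint8Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    out[i] = bytes[i] ^ keyBytes[i % keyBytes.length];
  }
  return out;
}

// Server side (Node): obfuscate text before writing it into the page source.
function encodeForPage(text, key) {
  const enc = new TextEncoder();
  const mixed = xorBytes(enc.encode(text), enc.encode(key));
  return Buffer.from(mixed).toString("base64");
}

// Client side: undo Base64, undo the XOR, and decode the bytes as UTF-8.
function decodeInPage(payload, key) {
  const raw = Uint8Array.from(atob(payload), (c) => c.charCodeAt(0));
  const plain = xorBytes(raw, new TextEncoder().encode(key));
  return new TextDecoder().decode(plain);
}
```

In the page you would emit only the encoded payload, then call something like `element.textContent = decodeInPage(payload, key)` once the script loads. Anyone who reads the script can do the same, so treat this as a speed bump.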

The caveat is that the data won't be visible to Google, but if the data is mostly numeric that might not be much of a loss.

If they scrape automatically and very often, you may also try to pinpoint their IP by observing the most active IPs on your site and serve them fake data.
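
A sketch of the counting step, assuming you already have the request log parsed into objects with an `ip` field (that shape, and the names `topClients` and `limit`, are assumptions for illustration). It tallies hits per address and returns the busiest ones so you can decide who gets the fake data.

```javascript
// Count requests per IP and return the `limit` most active addresses,
// busiest first. `requests` is an array of objects with an `ip` property.
function topClients(requests, limit = 5) {
  const counts = new Map();
  for (const { ip } of requests) {
    counts.set(ip, (counts.get(ip) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([ip, hits]) => ({ ip, hits }));
}
```

Busy addresses are not always scrapers (search engine crawlers and office proxies also rank high), so check the list by hand before acting on it.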

Please don't use lawyers, that's hitting below the belt.

Kamil Szot
They won't be scraping it. Just using copy paste I am pretty sure. Lawyers wouldn't help much at all. I am thinking of alternative ways of making it almost impossible to copy. These guys aren't programmers. Just want the content.
Joe
You can still benefit from encoding and using js to decode data. They won't be able to copy-paste from source, and they will not be able to switch off javascript to disable your js copy protection methods (such as disabling selection and deselecting text every 10ms) because if they do they will only see encrypted data.
Kamil Szot
A: 

Use SWF to display your data, just like other online books.

solie