I'm trying to help create a neighborhood directory and I want to discourage someone from harvesting contact info (especially email addresses) from that.

Is there any easy way to prevent someone from copying and pasting that text from the PDF?

Update: The goal here is to make the PDF no easier to harvest email addresses from than the current paper directory, and to make the PDF directory as useful as the paper directory. The online PDF directory will have advantages such as always being up to date and saving some printing costs (or passing those costs on to folks who want to print the document).

+3  A: 

Using an image instead of text makes it a lot more difficult to automatically grab data from a PDF.

Part of one of my previous jobs included reformatting data in PDFs to a (specific) more structured document format, and when we got PDFs whose text was images -- let alone blurry or hard to read images -- the OCR would be riddled with wrong letters, and we'd have to go in by hand and fix most everything.
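To illustrate why misread characters matter to a harvester, here is a minimal sketch. The sample strings are made up for illustration (they are not real OCR output), and `harvest` is a hypothetical helper name. A simple pattern match finds the address in clean text but finds nothing once the `@` and a few letters have been misread.

```python
import re

# Deliberately simple email pattern, good enough for a demonstration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(text):
    """Return every substring of `text` that looks like an email address."""
    return EMAIL_RE.findall(text)

# Hypothetical directory entry as selectable text in a PDF.
clean_text = "Jane Doe, 12 Elm St, jane.doe@example.com"

# The same entry as a made-up example of imperfect OCR:
# '@' read as '(a)', 'm' read as 'rn'.
ocr_text = "Jane Doe, 12 Elm St, jane.doe(a)exarnple.corn"

clean_hits = harvest(clean_text)
ocr_hits = harvest(ocr_text)
```

A determined person can still correct the OCR output by hand, so this only raises the cost of harvesting; it does not prevent it.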

Mark Rushakoff
It also makes the files much bigger - which is a nuisance.
Jonathan Leffler
@Jonathan Leffler; *everything* comes down to a trade-off between current problems, and potential new problems.
David Thomas
Don't just use images - use vector images specifically (so that they scale, and aren't too large).
Pavel Minaev
A: 

PDF allows you to lock the document: the content is encrypted but still readable, and the document's permission settings tell the reader not to allow printing or copying.
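As a rough sketch of what those permission settings look like under the hood: in the PDF standard security handler, the permissions are stored as a single signed 32-bit integer (the `/P` entry of the encryption dictionary), with one bit per permission. The bit positions below are as I understand them from the PDF specification, so double-check them before relying on this; the function names are hypothetical and not part of any library.

```python
# Permission bits of the PDF standard security handler.
# The spec numbers bits from 1, so "bit 3" is 1 << 2.
PRINT = 1 << 2      # bit 3: print the document
MODIFY = 1 << 3     # bit 4: modify contents
COPY = 1 << 4       # bit 5: copy or extract text and graphics
ANNOTATE = 1 << 5   # bit 6: add or modify annotations

# All permissions granted; bits 1-2 are reserved and must stay 0.
ALL_ALLOWED = 0xFFFFFFFC

def permission_value(*denied):
    """Return the signed 32-bit /P value with the given permissions denied."""
    flags = ALL_ALLOWED
    for bit in denied:
        flags &= ~bit
    flags &= 0xFFFFFFFF
    # /P is written as a signed integer, so convert from unsigned.
    return flags - (1 << 32) if flags & 0x80000000 else flags

def allows(p_value, bit):
    """Return True if the /P value grants the given permission."""
    return bool((p_value & 0xFFFFFFFF) & bit)

# A document that forbids copying and printing but allows everything else.
p = permission_value(COPY, PRINT)
can_copy = allows(p, COPY)
can_modify = allows(p, MODIFY)
```

Note that `allows` is just a check the viewer chooses to make. The flag records what the author asked for; it is up to the reading software to honor it.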

Anyway, I would discourage this, as such PDFs are a pain in the ass to use. Personally, I would recommend looking for other methods rather than actively making your document's readers angry.

PS: Harvesting emails from PDFs is virtually unheard of.

dusoft
+4  A: 

If the data is to be readable, which I'd assume is your goal, there is no way you can stop a dedicated person from taking it and using it. Converting to an image will make it difficult, but anyone with good OCR or a team of cheap foreign labor can get anything they want out of it. If the data is super sensitive and you are worried about it, you should really reconsider the value of publishing it.

CaptnCraig
A: 

The other answers are a good start. However, I found out exactly how to lock the PDF to prevent copying.

You can use PrimoPDF's free PDF driver and change the Security settings per: http://www.primopdf.com/help/tip%5Fsecure%5Fpdf.aspx

To add password security to your PDF, read on to learn how you can do it free with PrimoPDF.

  1. Download and install the free PDF driver: http://www.primopdf.com/download.aspx
  2. Open the file to convert to PDF
  3. Open the Print dialog (or press Ctrl+P)
  4. In the printer list, choose PrimoPDF
  5. Click Print
  6. On the PrimoPDF dialog, click the Change button next to the Security label to open the security dialog.
  7. Enter your Open password twice.
  8. Optionally, enter a Permissions password and choose the functionality you want to restrict.
  9. Click OK.
  10. Click Create PDF.

Final Tip. If you want to apply security to all the PDF files you create, you can do it easily by correctly configuring PrimoPDF. At the bottom of the dialog (see above), just make sure the Always use these settings option is turned on.

Clay Nichols
This is a good trick, but locked PDFs annoy me (and others) to no end. Additionally, they can be cracked, OCRed and mined by hand.
CaptnCraig
See my update: the goal is just to make it no easier to harvest email addresses from the PDF than it would be from the paper directory, and to make the PDF directory no less useful than the paper directory. A locked PDF meets both those criteria.
Clay Nichols
Why the down-vote? Because 'locked pdfs are annoying'? If the solution answered the question appropriately to the requirements, and didn't actively hamper or reduce the formation of a solution, I'm not sure I can agree with that. Ah well, +1 to restore some unfairly-lost karma (imho, ymmv, etc...). =]
David Thomas
A locked PDF doesn't meet those criteria, because it is trivially easy to "break" a locked PDF. All it takes is a PDF reader that ignores the DRM bits in the PDF.
Pavel Minaev