We want to build a desktop application that searches a locally packaged text database a few GB in size. We are thinking of using Lucene.

So basically the user will search for a few words and the local Lucene database will return results. However, we want to prevent the user from taking a full-text dump of the Lucene index, because the text database is valuable and proprietary. A web application is not the solution here, as the customer wants this desktop application to work in areas where the internet is not available.

How do we encrypt Lucene's database so that only the client application can access Lucene's index and a prying user can't take a full-text dump of the index?

One way we thought of doing this is to store the Lucene index on an encrypted file system inside a container file (something like TrueCrypt). The desktop application would then "mount" the file containing the Lucene indexes.

And this needs to be cross-platform (Linux, Windows). We would be using Qt or Java to write the desktop application.

Is there an easier/better way to do this?

[This is for a client. Yes, yes, conceptually this is a bad thing :-) but this is how they want it. Basically, the point is that only the desktop application should be able to access the Lucene index and no one else. Someone pointed out that this is essentially DRM. Yeah, it resembles DRM.]

+4  A: 

How do we encrypt lucene's database so that only the client application can access lucene's index and a prying user can't take a full text dump of the index?

You don't. The user would have both the key and the encrypted data, so they could access everything. You can bury the key in an obfuscated file, but that only adds a slight delay. It certainly will not keep out prying users. You need to rethink.
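A minimal Java sketch of why this holds, using AES-GCM from the standard `javax.crypto` API (the class and method names are hypothetical, chosen only for illustration): once the raw key bytes have been pulled out of the running application, an outside program can decrypt the data exactly as the application does.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class KeyInClientDemo {
    private static final int IV_LEN = 12;    // 96-bit nonce, the recommended size for GCM
    private static final int TAG_BITS = 128; // authentication tag length in bits

    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Output layout: 12-byte IV followed by ciphertext and tag.
    static byte[] encrypt(SecretKey key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(plain);
            byte[] out = new byte[IV_LEN + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(ct, 0, out, IV_LEN, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV from the front of the blob, then decrypts and verifies the rest.
    static byte[] decrypt(SecretKey key, byte[] blob) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_LEN));
            return c.doFinal(blob, IV_LEN, blob.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // What an attacker does after dumping the raw key bytes (debugger, memory dump,
    // sniffed dongle traffic): rebuild an equivalent key outside the application.
    static SecretKey rebuildFromDump(byte[] dumpedKeyBytes) {
        return new SecretKeySpec(dumpedKeyBytes, "AES");
    }
}
```

The choice of cipher does not matter here: stronger encryption only protects data from people who lack the key, and the application has to hand the key to the user's machine to work at all.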

Matthew Flaschen
Actually, the key would come from one of those standard hardware-based software-protection USB dongles.
Sid NoParrots
@Sidharth: regardless of *where* the key is stored, the user has access to both the encrypted data and the key, so you've already lost. It is for this very reason that copy-protection and DRM don't work. (For a reason why a USB dongle would be useless, see [this](http://usbsnoop.sourceforge.net/))
Dean Harding
@Sidharth: a HW dongle might help (when it works; it mostly has a serial-to-USB kludge inside, oozing all kinds of happiness and joy), but then the whole application must be cryptographically secure. Otherwise a cracker will just find the place where the HW code is checked and either patch it with "allow always" or dump the decryption key, and you're back to square one.
Piskvor
We want to make it tough but not impossible. So yes, you could do a memory dump and get the decryption key out (whether you are using a software key or a hardware dongle). Is there a way to make this tough for hackers? We don't need to make it impossible. The point is that the client is distributing a database of thousands of valuable text files and wants customers to be able to search them offline. Searching is fine, but indiscriminate ripping of the data is not.
Sid NoParrots
+2  A: 

Technically, there is little you can do. Lucene is written in Java, and Java code can always be decompiled or run in a debugger to recover the key, which you need to store somewhere (probably in the license key you sell to the user).
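As a sketch of the "key in the license key" variant, assuming PBKDF2 from the standard JCE (the salt, iteration count and class name are made up for illustration): the same license key always derives the same AES key, so anyone who decompiles the app and holds one valid license can reproduce the index key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

final class LicenseKeyDerivation {
    // Hypothetical salt compiled into the application; a decompiler reveals it.
    private static final byte[] APP_SALT = "example-app-salt".getBytes(StandardCharsets.UTF_8);
    private static final int ITERATIONS = 100_000; // illustrative work factor
    private static final int KEY_BITS = 128;

    // Derives the AES key for the encrypted index from the license key the customer enters.
    static SecretKey deriveIndexKey(String licenseKey) {
        PBEKeySpec spec = new PBEKeySpec(licenseKey.toCharArray(), APP_SALT, ITERATIONS, KEY_BITS);
        try {
            SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return new SecretKeySpec(f.generateSecret(spec).getEncoded(), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword(); // wipe the copied password chars from the spec
        }
    }
}
```

The derivation is deterministic by design (the app must get the same key every run), which is exactly why it offers no protection against a licensed user with a decompiler.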

Your only option is the law (or the contract with the user). The text data is copyrighted, so you can sue the user if they use it in any way that is outside the scope of the license agreement.

Or you can write your own text indexing system.

Or buy a commercial one which meets your needs.

[EDIT] If you want to use an encrypted index, just implement your own FSDirectory. Check the source of SimpleFSDirectory for an example.
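The part an encrypted Directory has to get right is random access: Lucene seeks around inside index files, so the cipher must decrypt an arbitrary byte range without reading from the start of the file. One common choice is AES in CTR mode, where the counter for any 16-byte block can be computed directly from the IV. The sketch below shows only that arithmetic with plain `javax.crypto`; it is not Lucene API, and wiring it into a custom FSDirectory and its IndexInput/IndexOutput depends on the Lucene version you use. CTR also provides no integrity protection, and the key still lives in the client.

```java
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class CtrRandomAccess {
    private static final int BLOCK = 16; // AES block size in bytes

    // Counter for a given block: IV read as a 128-bit big-endian number plus blockIndex, mod 2^128.
    static byte[] counterFor(byte[] iv, long blockIndex) {
        byte[] raw = new BigInteger(1, iv).add(BigInteger.valueOf(blockIndex)).toByteArray();
        byte[] out = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK); // keep only the low 128 bits
        System.arraycopy(raw, raw.length - n, out, BLOCK - n, n);
        return out;
    }

    // CTR is symmetric: the same keystream XOR both encrypts and decrypts.
    private static Cipher ctr(byte[] key, byte[] counter) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(counter));
        return c;
    }

    // Encrypts a whole file image starting at block 0, e.g. when packaging the index.
    static byte[] encryptAll(byte[] key, byte[] iv, byte[] data) {
        try {
            return ctr(key, iv).doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypts up to len bytes at absolute offset pos, touching only the blocks that cover
    // that range: what a seek() followed by a read on an encrypted index file would need.
    static byte[] readAt(byte[] key, byte[] iv, byte[] cipherText, long pos, int len) {
        if (pos < 0 || pos >= cipherText.length || len < 0) {
            throw new IllegalArgumentException("offset out of range");
        }
        long firstBlock = pos / BLOCK;
        int skip = (int) (pos % BLOCK);
        int start = (int) (firstBlock * BLOCK);
        int end = (int) Math.min(cipherText.length, pos + len);
        try {
            byte[] plain = ctr(key, counterFor(iv, firstBlock)).doFinal(cipherText, start, end - start);
            return Arrays.copyOfRange(plain, skip, plain.length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `readAt(key, iv, ct, 37, 20)` decrypts only blocks 2 and 3 and returns plaintext bytes 37 to 56.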

Aaron Digulla
Yes, people could do that... but we want to make it tough for people, not necessarily impossible.
Sid NoParrots
I suggest making it simpler for people to use your application so they have no incentive to get the data out. Spend a lot of money on making your legitimate users happy instead of making them mad with copy protection (which will only drive them to the next cracker for a version of your app without copy protection and without any hassle).
Aaron Digulla
It's not our application... it's for a client. They want it like this. We can't get into a moral debate with them. Too often questions like this get mixed up with philosophy :-). We want to know how to do it, not whether it's good for world hunger :-)
Sid NoParrots
In that case, see my edit for a solution.
Aaron Digulla
+2  A: 

The problem here is that you're trying to both provide the user with data and deny it to them at the same time. This is basically the DRM problem under a different name: the attacker (user) is in full control of the application's environment (hardware and OS). No security is possible in such a situation, only obfuscation and the illusion of security.

While you can make it harder for the user to get to the unencrypted data, you can never prevent it - because that would mean breaking your app. Probably the closest thing is to provide a sealed hardware box, but IMHO that would make it unusable.

Note that making a half-assed illusion of security might be sufficient from a legal standpoint (e.g. DMCA's anti-circumvention clauses) - but that's outside SO's scope.

Piskvor
( for the DRM version, see e.g. this question: http://stackoverflow.com/questions/1790190/is-it-possible-to-protect-from-downloading-a-video-from-a-site/ )
Piskvor
+1  A: 

Why not build an index that contains only the data the user is allowed to access, and ship that index with the desktop app?
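To illustrate the idea with a toy in-memory inverted index standing in for a real Lucene index (the class name and document IDs are hypothetical): documents outside the customer's license are never indexed, so there is nothing extra to dump. With Lucene itself, the equivalent would be handing only the licensed documents to `IndexWriter.addDocument` when building the shipped index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SubsetIndex {
    // term -> IDs of licensed documents containing that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    SubsetIndex(Map<String, String> allDocs, Set<String> licensedIds) {
        for (Map.Entry<String, String> doc : allDocs.entrySet()) {
            if (!licensedIds.contains(doc.getKey())) {
                continue; // unlicensed text never enters the shipped index
            }
            // naive tokenizer: lower-case, split on non-word characters
            for (String term : doc.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
    }

    // Returns the IDs of licensed documents containing the (case-insensitive) term.
    Set<String> search(String term) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet()));
    }
}
```

This does not stop a user from dumping what they have paid for; it only limits the damage to that subset.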

Pascal Dimassimo
A: 

TrueCrypt sounds like a solid plan to me. You can mount volumes, encrypt them in all sorts of crazy overkill ways, and access them just like any other file.

No, it isn't entirely secure, but it should work well enough.

Adam Shiemke