views:

173

answers:

9

Alice & Bob are both secret quadruple agents who could be working for the US, Russia or China. They want to come up with a scheme that would:

a) if they are both working for the same side, prove this to each other so they can talk freely.

b) if they are working for different sides, not expose any additional information about which side they are on.

Oh, and because of the sensitive nature of what they do, there is no trusted third party who can do the comparison for both of them.

What protocol would be able to satisfy both of these needs?

Ideally, any protocol would also be able to generalize to multiple participants and multiples states but that's not essential.

I've puzzled over it for a while and I can't find a satisfactory solution, mainly owing to condition b.

edit: Here's the original problem that motivated me to look for a solution. "Charlie" had some personal photos that he shared with me and I later discovered that he had also shared them with "Bob". We both wanted to know if we had the same set of photos but, at the same time, if Charlie hadn't shared a certain photo with either of us, he probably had a good reason not to and we didn't want to leak information.

My first thought would be for each of us to concatenate all the photos and provide the MD5 sum. If they matched, then we had the same photos but if they didn't, neither party would know which photos the other had. However, I realized soon after that this scheme would still leak information because Bob could generate an MD5 for each subset of photos he had and if any of them matched my sum, he would know which photos I didn't have. I've yet to find a satisfactory solution to this particular problem but I thought I would generalize it to avoid people focusing on the particulars of my situation.

A: 

Wouldn't RSA work here? Each nation knows its private key, you publish your public key, and only nations that are the same can decrypt the info. I guess the second person would know that the first isn't on the same side as they are, however.

Hmm.

Stefan Kendall
Each spy is a quadruple agent so it knows the private keys to every nation.
Shalmanese
Perhaps the key here is that we don't need to establish anything but a handshake. There has to be merit in the fact that this does not require extensive transmission of data.
Stefan Kendall
If the spies know the private keys they shouldn't know, they aren't private anymore and therefore the problem is impossible.
Kristoffon
We can' rely on private keys, but perhaps the interaction of public information can somehow identify both parties.
Stefan Kendall
A: 

How about Public Key Cryptography?

Ilya Biryukov
+1  A: 

So they are guaranteed to be quadruple agents? That is they are guaranteed to be secretly working for one faction while pretending to work for a second while pretending to work for a third while pretending to work for a fourth? They are limited to just the US, Russia or China? If so then that means that there will always be at least one faction they are both pretending to work for and are simultaneously actually working for. That seems to negate their ability to be quadruple agents, because surely one of them can't be working for the Americans while secretly working for the Americans, while secretly working for the Americans, while secretly working for the Americans.

You say that the ideal solution would generalize to arbitrary numbers of states and spy-stacks. Can the degree of secret agent-ness be either higher, equal or lower than the number of states? This might be important. Also, is Alice always guaranteed to have the same degree of agent-ness as Bob? i.e. They will ALWAYS both be triple agents, or ALWAYS both by quintuple agents? The modulo operator springs to mind...

More details please.

As a potential answer, you can enumerate the states into a bitfield. US=1 Russia=2, China=4, Madagascar=8, Tuva=16 etc. Construct a device that is essentially an AND gate. Alice builds and brings one half and Bob builds and brings the other. Separated by a cloth, they each press the button of the state they're really working for. If the output of the AND gate is high, then they're on the same side. If not, then they quietly take down the cloth, and depart with the respective halves of their machine so that the button can't be determined by fingerprint.

This is not theoretical or rigorous, but practical.

Omniwombat
Assume Alice is loyal to the Americans. America hires her to be a mole in the Russian spy agency which hires her to be a mole to the Chinese who hire her to spy on the Americans. That way, she can stay at home while feeding America information about both China and Russia.
Shalmanese
What's to stop Bob from building a malicious AND gate?
Shalmanese
He doesn't supply the AND gate. He merely brings his enumerator/wires-that-feed-into the AND gate. If he builds it wrong, then he's guaranteed to get the wrong answer and has only himself to blame. Admittedly you did say that there was no trusted third party, and an AND gate as described would count as one. They're so simple that it's trivial to verify that it's working properly.
Omniwombat
A: 

Interesting.

I think, no matter what the scheme, it'll need to involve a component of random failure. This is because of the conflicting requirements. You would need a scheme that, occasionally, even when they are on the same side, doesn't work. Because if it always worked, they would immediately be able to determine they aren't on the same side.

Your point 'B' is also vague. You say you don't want to expose what side they are on. Does that mean that the info can't point to specifically one of the sides? Is it okay if Alice thinks Bob is from either one of the others?

Also, have you tried emailing this to the cryptography mailing list? May get a better response there. It's an interesting one to think about :)

Noon Silk
I mean that if Bob is American, then after the exchange, P(Alice is American) = 0, P(Alice is Chinese) = 0.5, P(Alice is Russian) = 0.5
Shalmanese
I think your link for the cryptography mailing list is stale.
e5
e5: It's got no content, but I think the mailing address is still good? I haven't tried subscribing myself though, as I'm already on it :)
Noon Silk
A: 

Here's the closest I've come to a solution:

Assume there is a function doubleHash such that

doubleHash(a+doubleHash(b)) == doubleHash(b+doubleHash(a))

Alice generates a 62 bit secret and appends the 2 bit country code to the end of it, hashes it and gives Bob doubleHash(a).

Bob does the same thing and gives Alice doubleHash(b).

Alice appends the original secret to the hash that Bob gave her, hashes it and publishes it as doubleHash(a+doubleHash(b)).

Bob does the same thing and publishes doubleHash(b+doubleHash(a)).

If both the hashes match, then they are from the same country. On the other hand, if they don't match, then Bob can't decipher the hash because he doesn't know Alice's secret and vice versa.

However, such a scheme relies on the existence of a doubleHash function and I'm not sure if such a thing is possible.

Shalmanese
the problem here of course is, they all know all the country codes secret codes, and can just emulate which ever of the 4 they want. So they have a 25% chance of making a calculated guess at what the opposition is ( assuming the opposition sends a truthful hash ) and then faking the key, and when the opposition matches you know what they are without disclosing yourself truthfully.
Kent Fredric
2) We assume the spies are more motivated by establishing a successful communication that sussing out the affiliation of the other party. Providing a random guess gives a 1/3rd chance of getting the affiliation right but also a 1/3 chance of missing out on a communication opportunity
Shalmanese
@Shalmanese : If they can be trusted not to attack the system let them announce what side they work for and then forget the information if they aren't working for the same side. Security only matters if they can not be trusted, or Eve is secretly watching/manipulating the communication.
e5
@Shalmanese see problems with doubleHash and an implementation that I added to my question.
e5
+1  A: 

For your photos problem, create hashes for all subsets of your photos; randomly select a subset of these, and shuffle in an agreed quantity of randomly generated hash values. Bob does the same, and you exchange these sets. If the proportion of hashes in what Bob has sent you that matches ones you can generate by hashing subsets of your photos significantly differs from what you expect, it is likely you have a significantly different corpus of photos from him. If the proportion of random hashes you agree on is high, you risk being unable to detect small differences in your collections of photos; if the proportion is low, you risk exposing information about missing photos; you will have to select a suitable point for the tradeoff.

moonshadow
I think what you're showing in this answer is a zero-knowledge proof. That's the theory that you need to implement to figure out the photos problem. http://en.wikipedia.org/wiki/Zero-knowledge_proof
Jeff Tucker
I don't think your solution is correct, the problem states that neither parties learn about what photos the other party has. Your solution leaks photo's which both parties share.Furthermore if charlie is lieing, he can break the entire protocol since he supplies you with all fake md5 check sums. Or some subset of fake md5 check sums.
e5
A: 

The most simple thing I can think of with the photos that would possibly work is as thus:

  1. Hash all the photos with a 4096 bit hash.
  2. Sort the photos by hash value. ( Hashes are afterall, just a string representation of a large number )
  3. using that sort order, use a streaming system to pipe, and hash, those photos, as if they were a singular file.
  4. Share your hashes.
  5. If the hashes match, you have the same files. ( low low risk of incorrect positive match, but at 4K hashing, its a bit unlikely )

There are of course, a few weaknesses here:

  1. Don't share how many photos you have. Doing so could permit the party with the greater number of photos do intelligent permutation of the data and remove photos from the hash set they suspect likely you don't have, using the number as a guide, and find ( at great computational expense mind ) a set of images that matches your hash.
  2. They can do 1 without the number, but its harder, and they're out of luck if they actually have less photos.
  3. They could create a fake hash, simply with a random number generator, and send it to you, giving you the impression you had different datasets when you really had the same.

The above weaknesses are also prevalent in your country code identification system, except of course, you have far less entropy to get in the way, and its far easier to fraud the system. ( and thus, far far far easier to work out who they are by sheer brute force, or have yourself worked out by brute force, regardless of how fancy your hash algorithm is ) If this were not the case, you would have already been found out by the very agencies you work for, because something that reliable and secure would be a sure fire way to do a secure background check.

Kent Fredric
If Bob has a super set of the photographs Charlie has. Bob could just try different combinations of the photographs until he found the subset which Charlie has. For small numbers photographs this is a very small number.
e5
@e5, yes, I quite agree, although I mainly pointed out there is a slight strategic weakness if Bob has some idea what Charlie might be prohibited to see. But without the strategic weakness, there's still a lot of computation required to find the Hashes for each iteration, which will greatly limit the solving power anyone has with a brute-force permutation, unless they have very fast processors, or lots of free time. The country codes however, any modern computer could break in under a second.
Kent Fredric
A: 

The Photo Scenario is Impossible to Achieve:

Your scheme is impossible for the reasons that you name.

Consider a function f, which takes two sets of photos, s1 and s2. f(s1, s2) returns true if s1=s2 and false if s1!=s2. That is, this function implements the scheme you want.

Bob can always supply a subset of photo's he has, and learn which photo's charlie doesn't have. There is no way around this, any function which has the property you want can not have the security you want.

The Spy Scenario is Even More Impossible:

As Kent Fredric pointed out the spy scenario has even greater inherent weaknesses. It has all problems of the photo scenario, plus the additional weakness of having only four secrets.

In the photo scenario it would be highly unlikely that Bob would randomly guess one of Charlies photographs. It is trivial in the spy scenario for Bob to guess Alices choice (1/4). The spys only have four countries they can belong to, as they are both quadruple agents they both know all the secret code words for each country. Thus, Bob could pretend to be working for the Chinese to test Alice.

A Different Type of Solution:

Some posters have noted, the security can be increased if you weaken the accuracy of f. Of course if it is not accurate what is the point. I propose a different type of solution.

  • Do not let them compare the same photographs more than one time.

The party which wishes to initiate the comparison must first show that this is a new comparison and does not use any of the pictures from before.

EDIT: Problems with Double Hash

I am making some assumptions about the doublhash protocol, but...

  1. For the photograph scheme, the doublehash protocol is no better than f, because the 62 bit secret must be constructed from a set of photographs for the comparison to be meaningfull. The subset attack mentioned in the original question still applies here. Try all subsets of photographs to brute force the secrets you can generate, thus Bob can see if he can generate the same secret as Alice.

Using the doublehash property Bob can still brute force the secret.

doubleHash(s1+doubleHash(b)) != doubleHash(aliceSecret+doubleHash(a))

doubleHash(s2+doubleHash(b)) != doubleHash(aliceSecret+doubleHash(a))

doubleHash(s3+doubleHash(b)) == doubleHash(aliceSecret+doubleHash(a))

Bingo, aliceSecret == s3.

DoubleHash is only as strong as it is hard to bruteforce either a or b

Implementating DoubleHash

Instead doubleHash(a + doubleHash(b)), try doubleHash(a, md5(b)). DoubleHash(a + doubleHash(b)) is bad because Bob could generate colliding hashes like so:

doubleHash((12 + doubleHash(34)) + doubleHash(5678))  
= doubleHash((34 + doubleHash(12)) + doubleHash(5678))
= doubleHash(5678 + doubleHash(12 + doubleHash(34)) 
= doubleHash(5678 + doubleHash(34 + doubleHash(12))

Here is an implementation of doubleHash using the new formulation,

Doublehash(a, hashOfB){
   hashOfA = md5(a)
   combinedHash = hashOfA xor hashOfB
   return md5(combinedHash)
}

One could also use the math behind blind signatures to impliment version of doubleHash.

e5
1) The photo scenario is not impossible, although it appears to be on first glance. I've shown a theoretical version that is secure using a doubleHash function that requires Bob to verify each guess Alice submits and vice versa to prevent brute forcing the answer.2) We assume the spies are more motivated by establishing a successful communication that sussing out the affiliation of the other party. Providing a random guess gives a 1/3rd chance of getting the affiliation right but also a 1/3 chance of missing out on a communication opportunity.
Shalmanese
1) Bob can always just hide a subset of the photos from any scheme, if it says he his subset and charlies set are equal, he learns something he shouldn't. If the system works correctly Bob can learn something he shouldn't. There is no way around that.
e5
e5: The 62 bit secret is a random number, not related to anything else and presumably not amenable to brute force (bit length can be increased until brute force is impossible).It's impossible to generate all subsets because for a comparison to be made, Bob must first pass the comparison to Alice to be hashed again. Alice can simply refuse to hash anything that looks like an obvious brute force attack.
Shalmanese
@Shalmanese The 62 bit secret is all well and good for spies, assuming that only spies of the same country have the same secret, but it does not hold up for photographs which since Bob could have the photographs {A,B,C} and Charlie has {A,B}. Bob could always submit {A,B} as his photographs. I agree with you if you limit the number of attempts you greatly reduce the chance of a successful attack.
e5
+1  A: 

For both problems, you could use a Secure two-party computation equality-algorithm. There are many schemes, for example this by Damgard, Fitzi, Kiltz, Nielsen and Toft: Unconditionally Secure Constant Round Multi-Party Computation for Equality, Comparison, Bits and Exponentiation.

Of course an agent could try to pose as an agent from another side to get a 1/3 chance to discover the true side of another agent, but that seems unavoidable.

A much simpler scheme for the photo-problem, which should be almost as good as the secure multiparty computation, is the following:

  1. Alice and Bob sorts their pictures and generate a SHA-512 hash.
  2. Alice sends the first bit of her hash to Bob.
  3. Bob compares the bit to the first bit of his hash. If it is different, they know that they have received different photos. Otherwise they continue.
  4. Bob sends the second bit of his hash to Alice.
  5. Alice checks this bit and decides whether to continue.
  6. Continue until the protocol aborts or all bits have been checked.
Rasmus Faber