I wrote a short C++ program to do XOR encryption on a file, which I may use for some personal files (if it gets cracked it's no big deal - I'm just protecting against casual viewers). Basically, I take an ASCII password and repeatedly XOR the password with the data in the file.

Now I'm curious, though: if someone wanted to crack this, how would they go about it? Would it take a long time? Does it depend on the length of the password (i.e., what's the big-O)?

+49  A: 

The problem with XOR encryption is that long runs of the same character make the password very easy to see. In text files, such runs are most commonly spaces. Say your password is 8 characters long and the text file has 16 spaces on some line (for example, in the middle of an ASCII-art table). If you XOR that with your password, the output will contain a repeating sequence of characters. The attacker would look for any such repetition, try to guess the character in the original file (space being the first candidate to try), and derive the length of the password from the length of the repeating groups.

Binary files can be even worse, as they often contain long runs of 0x00 bytes. XORing with those is a no-op, so your password will be visible in plain text in the output! A very common binary format with long sequences of nulls is .doc.
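To make this concrete, here is a minimal Python sketch of the leak. The password `hunter22` and the helper name `xor_with_key` are made up for illustration, and the runs are aligned with the start of the key for simplicity (in a real file the leaked key would usually appear rotated).

```python
from itertools import cycle

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"hunter22"        # hypothetical 8-byte password
zeros = bytes(16)        # a run of 0x00 bytes, as found in many binary formats
spaces = b" " * 16       # a run of spaces, as found in a text table

# XOR with 0x00 is a no-op, so the key shows up verbatim in the output.
leaked = xor_with_key(zeros, key)

# XOR with spaces gives an 8-byte pattern that repeats; the repeat distance
# gives away the key length.
masked = xor_with_key(spaces, key)

# The attacker guesses "space" and XORs it back out to read the key.
guessed = bytes(b ^ 0x20 for b in masked)

print(leaked)
print(masked)
print(guessed)
```

Both `leaked` and `guessed` come out as the password repeated twice, without the attacker ever having been told it.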

Pavel Minaev
Note that it is nearly trivial to XOR a whole file with a space character, at which point any sensible strings leap out as a likely password. For a binary file XORed with an ASCII string, any strings in the result *are* the password. The strings command at a shell prompt will find them.
RBerteig
FWIW you should be clear that you're talking about an XOR scheme where the "key" is shorter than the plaintext. If the key is the same size as the plaintext, and "truly" random (at least from the POV of the attacker), it's an OTP; aka unbreakable.
Noon Silk
+26  A: 

I concur with Pavel Minaev's explanation of XOR's weaknesses. For those who are interested, here's a basic overview of the standard algorithm used to break the trivial XOR encryption in a few minutes:

  1. Determine how long the key is. This is done by XORing the encrypted data with itself shifted various numbers of places, and examining how many bytes are the same.

  2. If the proportion of equal bytes is greater than a certain percentage (6% according to Bruce Schneier's Applied Cryptography, second edition), then you have shifted the data by a multiple of the keylength. The smallest shift that produces a large number of equal bytes is the keylength.

  3. Shift the cipher text by the keylength, and XOR against itself. This removes the key and leaves you with the plaintext XORed with the plaintext shifted the length of the key. There should be enough plaintext to determine the message content.
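The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the plaintext is synthetic (characters drawn at random with the letter frequencies of a sample sentence, as a stand-in for real English), the key `k3yb0ard` is made up, and the "half of the best rate" cutoff is a heuristic chosen for this demo in place of the fixed 6% figure, which applies to 26-letter text.

```python
import random
from itertools import cycle

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR, as in the question."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def coincidence_rate(ciphertext: bytes, shift: int) -> float:
    """Fraction of positions where the ciphertext equals itself shifted by `shift`."""
    pairs = len(ciphertext) - shift
    same = sum(1 for i in range(pairs) if ciphertext[i] == ciphertext[i + shift])
    return same / pairs

def guess_key_length(ciphertext: bytes, max_len: int = 32) -> int:
    """Smallest shift whose coincidence rate is close to the best one seen."""
    rates = {s: coincidence_rate(ciphertext, s) for s in range(1, max_len + 1)}
    cutoff = 0.5 * max(rates.values())   # assumed heuristic, not a standard constant
    return min(s for s, r in rates.items() if r >= cutoff)

# Synthetic plaintext with English-like character frequencies (an assumption).
sample = ("it was the best of times it was the worst of times "
          "it was the age of wisdom it was the age of foolishness")
rng = random.Random(0)
plaintext = "".join(rng.choices(sample, k=20000)).encode("ascii")

key = b"k3yb0ard"                         # hypothetical password
ciphertext = xor_with_key(plaintext, key)

# Steps 1 and 2: find the key length from the coincidence counts.
n = guess_key_length(ciphertext)

# Step 3: XOR the ciphertext with itself shifted by n; the key cancels out.
stripped = bytes(a ^ b for a, b in zip(ciphertext, ciphertext[n:]))

print(n)
```

After step 3, `stripped` no longer depends on the key at all: each byte is a plaintext byte XORed with the plaintext byte `n` positions later.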

Read more at Encryption Matters, Part 1

GeneQ
+1  A: 

The goal of good encryption is to make it mathematically difficult to decrypt without the key.
This includes protecting the key itself.
The XOR technique is a very simple cipher that is easily broken, as described here.

It is important to note that XOR is also used within strong cryptographic algorithms.
Those algorithms get their strength from the mathematical difficulty they build around it.

nik
+9  A: 

XOR encryption can be reasonably* strong if the following conditions are met:

  • The plain text and the password are about the same length.
  • The password is not reused for encrypting more than one message.
  • The password cannot be guessed, i.e., by dictionary or other mathematical means. In practice this means the bits are randomized.

*Reasonably strong meaning it cannot be broken by trivial, mathematical means, as in GeneQ's post. It is still no stronger than your password.
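As a rough Python sketch of what those three conditions look like in practice: the key is as long as the message, comes from the operating system's random source via the standard `secrets` module, and is generated fresh for each message. The function names `pad_encrypt` and `pad_decrypt` are made up for illustration.

```python
import secrets

def pad_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt with a fresh random key of the same length as the message."""
    pad = secrets.token_bytes(len(plaintext))   # random bits, not a typed password
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, pad))
    return ciphertext, pad

def pad_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    """XOR with the same pad restores the message."""
    return bytes(c ^ k for c, k in zip(ciphertext, pad))

message = b"meet me at noon"
ciphertext, pad = pad_encrypt(message)   # a new pad is generated on every call
recovered = pad_decrypt(ciphertext, pad)
```

The catch is that the pad is as big as the data and has to be stored or exchanged securely, which is why this is rarely practical for files.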

TokenMacGuy
This would be a one-time pad: http://en.wikipedia.org/wiki/One-time_pad
ChrisW
For a one-time pad, the key must be the same length as the plaintext. When this is true, and the key is never reused, the one-time pad is absolutely secure. If you're at all interested in the history of cryptography, I highly recommend The Codebreakers, by David Kahn: http://www.amazon.ca/Codebreakers-Comprehensive-History-Communication-Internet/dp/0684831309
Dale Hagglund
*"reasonably secure"* - One time pads cannot be broken by any means, ever (assuming the pad is completely random). They are the *only* absolutely secure method of encryption.
BlueRaja - Danny Pflughoeft
+1  A: 

Norton's Anti-virus used to use a technique of using the previous unencrypted letter as the key for next letter. That took me an extra half-hour to figure out, if I recall correctly.
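One possible reading of that scheme, sketched in Python: each byte is XORed with the previous plaintext byte, and the first byte with a one-byte starting value. The details here (the `seed` parameter and the function names) are assumptions for illustration, not a description of what the product actually did.

```python
def chain_encrypt(plaintext: bytes, seed: int) -> bytes:
    """XOR each byte with the previous plaintext byte (the first with `seed`)."""
    out = bytearray()
    prev = seed
    for p in plaintext:
        out.append(p ^ prev)
        prev = p              # the key for the next byte is this plaintext byte
    return bytes(out)

def chain_decrypt(ciphertext: bytes, seed: int) -> bytes:
    """Undo chain_encrypt by recovering one plaintext byte at a time."""
    out = bytearray()
    prev = seed
    for c in ciphertext:
        p = c ^ prev
        out.append(p)
        prev = p
    return bytes(out)

secret = b"hidden string"
scrambled = chain_encrypt(secret, seed=0x5A)   # 0x5A is an arbitrary example value
restored = chain_decrypt(scrambled, seed=0x5A)
```

Under this reading the only real secret is the single starting byte, so an attacker can simply try all 256 values.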

If you just want to stop the casual viewer, it's good enough; I've used it to hide strings within executables. It won't stand up for 10 minutes against anyone who actually tries, however.

That said, much better encryption methods are readily available these days, so why not avail yourself of one? If you are just trying to hide data from the "casual" user, even something like gzip would do that job better.

Chris Arguin
+5  A: 

In addition to the points already mentioned, XOR encryption is completely vulnerable to known-plaintext attacks:

cryptotext = plaintext XOR key
key = cryptotext XOR plaintext = plaintext XOR key XOR plaintext

where XORing the plaintext with itself cancels out, leaving just the key.
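A small Python sketch of that attack, assuming the attacker knows only that the file starts with the standard 8-byte PNG signature. The key `s3cret`, the fake file body, and the helper names are made up for illustration, and the sketch relies on the known part being longer than the key.

```python
from itertools import cycle

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR, as in the question."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def shortest_period(stream: bytes) -> int:
    """Smallest n such that the stream is a repetition of its first n bytes."""
    for n in range(1, len(stream) + 1):
        if all(stream[i] == stream[i % n] for i in range(len(stream))):
            return n
    return len(stream)

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"          # the part of the plaintext the attacker knows
plaintext = PNG_SIGNATURE + b"...rest of the image data..."
key = b"s3cret"                               # hypothetical password, unknown to the attacker
ciphertext = xor_with_key(plaintext, key)

# cryptotext XOR plaintext = key (repeated over the length of the known part).
keystream = bytes(c ^ p for c, p in zip(ciphertext, PNG_SIGNATURE))
recovered_key = keystream[:shortest_period(keystream)]

# With the key in hand, the whole file decrypts.
decrypted = xor_with_key(ciphertext, recovered_key)
```

If the known part is shorter than the key, the same XOR still recovers that many key bytes, which is often enough to guess the rest.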

Not being vulnerable to known-plaintext attacks is a required but not sufficient property for any "secure" encryption method where the same key is used for more than one plaintext block (i.e. a one-time pad is still secure).

laalto
+1 it is worth mentioning that if even a small portion of the file is known (such as the headers used by most file-formats), the key can be easily obtained and the entire file decrypted.
BlueRaja - Danny Pflughoeft
+1  A: 

I'm just protecting against casual viewers

As long as this assumption holds, your encryption scheme is ok. People who think that Internet Explorer is "teh internets" are not capable of breaking it.

If not, just use some crypto library. There are already many good algorithms like Blowfish or AES for symmetric crypto.

abababa22
A: 

RC4 is essentially XOR encryption, as are many stream ciphers. The key is the key (no pun intended!): you must NEVER reuse the key. EVER!

Michael Howard-MSFT
A: 

I'm a little late in answering, but since no one has mentioned it yet: this is called a Vigenère cipher.

Wikipedia gives a number of cryptanalysis attacks to break it; even simpler, though, since most file-formats have a fixed header, would be to XOR the plaintext-header with the encrypted-header, giving you the key.

BlueRaja - Danny Pflughoeft
A: 

That ">6%" GeneQ mentions is the index of coincidence for English telegraph text - 26 letters, with punctuation and numerals spelled out. The actual value for long texts is 0.0665.

The <4% is the index of coincidence for random text in a 26-character alphabet, which is 1/26, or 0.0385.

If you're using a different language or a different alphabet, the specific values will be different. If you're using the ASCII character set, Unicode, or binary bytes, the specific values will be very different. But the difference between the IC of plaintext and random text will usually be present. (Compressed binaries may have ICs very close to that of random, and any file encrypted with any modern computer cipher will have an IC that is exactly that of random text.)
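For reference, here is a short Python sketch of how an index of coincidence can be computed for any byte string, using the usual definition (the chance that two positions picked at random hold the same symbol). The function name and the test strings are made up for illustration.

```python
from collections import Counter

def index_of_coincidence(data: bytes) -> float:
    """Probability that two distinct, randomly chosen positions hold the same byte."""
    n = len(data)
    counts = Counter(data)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# Every letter equally frequent: the IC approaches 1/26 for long texts.
flat = b"abcdefghijklmnopqrstuvwxyz" * 1000
ic_flat = index_of_coincidence(flat)

# A single repeated symbol is the other extreme.
ic_constant = index_of_coincidence(b"aaaaaaaa")

print(round(1 / 26, 4))
print(ic_flat, ic_constant)
```

For raw bytes the random baseline is 1/256 instead of 1/26, which is why the specific thresholds change with the alphabet.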

Once you've XORed the text against itself, what you have left is equivalent to an autokey cipher. Wikipedia has a good example of breaking such a cipher:

http://en.wikipedia.org/wiki/Autokey_cipher

Jeff Dege