I would like to XOR a very big file (~50 GB).

More precisely, because I lack the memory to load it all at once, I would like to XOR each 32-byte block of a plaintext file with the key 3847611839 and write the result, block after block, to a new cipher file.

Thank you for any help!

+1  A: 

You need to craft a solution around a streaming architecture: you read the input file as a stream, modify each piece as it arrives, and write the result to the output file.

This way, you don't have to read the whole file at once.
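
As a rough illustration of that streaming idea, here is a minimal sketch in C. The function name xor_stream and the convention of passing the key as four separate bytes are assumptions made for this example, not something the question specifies; in particular, the byte order in which 3847611839 should be applied is left to the caller.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: XOR every byte of a stream with a repeating 4-byte key.
   Returns 0 on success, -1 on a read or write error. */
int xor_stream(FILE *in, FILE *out, const unsigned char key[4])
{
    unsigned char block[32];   /* 32-byte blocks, as in the question */
    size_t n;

    while ((n = fread(block, 1, sizeof block, in)) > 0) {
        /* 32 is a multiple of 4, so the key stays in phase across full blocks;
           only the last block of the file may be shorter. */
        for (size_t i = 0; i < n; ++i)
            block[i] ^= key[i % 4];
        if (fwrite(block, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Only one 32-byte block is held in memory at a time, so the size of the file does not matter. A larger block (any multiple of 4 bytes) would work the same way.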

jldupont
+1  A: 

If your question is how to do it without using extra space on the disk, I would just read the file in chunks that are multiples of 32 bytes (as big as you can afford), work on each chunk in memory, then write it back over the same position in the file. You should be able to use the ftell and fseek functions to do that (assuming your long type is large enough to hold offsets into a 50 GB file, of course).

It may be faster to memory-map the file if you can spare that much out of your address space (and your OS supports it) but I'd try the easiest solution first.
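
The in-place read/modify/write approach with ftell and fseek might look roughly like the sketch below. The name xor_in_place and the 4-byte key array are assumptions for illustration. Note the caveat about long: where long is 32 bits, ftell and fseek cannot address a 50 GB file, and a platform-specific 64-bit variant (for example fseeko/ftello on POSIX systems) would be needed instead.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: XOR a file in place with a repeating 4-byte key.
   fp must be open for update (e.g. "r+b"). Returns 0 on success, -1 on error. */
int xor_in_place(FILE *fp, const unsigned char key[4])
{
    unsigned char chunk[4096];   /* a multiple of 32 (and of 4) bytes */
    size_t n;

    for (;;) {
        long pos = ftell(fp);                 /* remember where this chunk starts */
        if (pos < 0)
            return -1;
        n = fread(chunk, 1, sizeof chunk, fp);
        if (n == 0)
            break;
        for (size_t i = 0; i < n; ++i)
            chunk[i] ^= key[i % 4];
        if (fseek(fp, pos, SEEK_SET) != 0)    /* go back and overwrite the chunk */
            return -1;
        if (fwrite(chunk, 1, n, fp) != n)
            return -1;
        if (fseek(fp, 0, SEEK_CUR) != 0)      /* needed before switching from writing back to reading */
            return -1;
    }
    return ferror(fp) ? -1 : 0;
}
```

Because the file is overwritten as it goes, an interruption halfway through leaves it partly converted, which is one more reason to try the separate-output-file version first.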

Of course, if space isn't a problem, just read the chunks in and write them to a new file, something like the following (pseudo-code):

open infile
open outfile
while not end of infile:
    read chunk from file
    change chunk
    write chunk to outfile
close outfile
close infile

This sort of read/process/write is pretty basic stuff. If you have more complicated requirements, you should update your question with them.

paxdiablo
+2  A: 

This sounded like fun, and doesn't sound like a homework assignment.

I don't have a previously XOR-encrypted file to try with, but if you convert one back and forth, there's no diff.

That much I tried, at least. Enjoy! :) This XORs every 4 bytes with 0xE555E5BF; I presume that's what you wanted.

Here's bloxor.c

// bloxor.c - by Peter Boström 2009, public domain, use as you see fit. :)

#include <stdio.h>

unsigned int xormask = 0xE555E5BF; //3847611839 in hex.

int main(int argc, char *argv[])
{
    printf("%x\n", xormask);
    if(argc < 3)
    {
     printf("usage: bloxor 'file' 'outfile'\n");
     return -1;
    }

    FILE *in = fopen(argv[1], "rb");
    if(in == NULL)
    {
     printf("Cannot open: %s\n", argv[1]);
     return -1;
    }

    FILE *out = fopen(argv[2], "wb");

    if(out == NULL)
    {
     fclose(in);
     printf("unable to open '%s' for writing.\n",argv[2]);
     return -1;
    }
    unsigned int buffer[256]; //1024 bytes (presuming that's a good block size, I dunno...); an unsigned int array keeps the word-wise XOR aligned

    size_t count;

    while((count = fread(buffer, 1, sizeof buffer, in)) > 0)
    {
     size_t i;
     size_t end = (count + 3) / 4; //number of 4-byte words touched, rounded up

     for(i = 0;i < end; ++i)
     {
      buffer[i] ^= xormask;
     }
     if(fwrite(buffer, 1, count, out) != count)
     {
      fclose(in);
      fclose(out);

      printf("cannot write, disk full?\n");

      return -1;
     }
    }

    fclose(in);
    fclose(out);

    return 0;
}
pbos
What happens to the last bytes in the file when its length is not a multiple of 4? :)
pmg
Wow... Your code is perfect. Many thanks! But I don't understand this line: for(i = 0;i < count/4; ++i); I don't see why we need to do this: count/4
Doug
pmg: Shit, that's right, I'll edit to change that.
pbos
Doug: Each block to be xor'd is 4 bytes, which means that the 1024-byte buffer has 256 blocks, and if we read n bytes that translates to ceil(n/4) blocks to xor. (n/4 rounded up; I'll edit to fix that off-by-one error.)
pbos
@pbos, you didn't return when the output file couldn't be opened. I assumed that was an oversight (and fixed it).
paxdiablo
Hehe, oops. Good catch. :)
pbos
+3  A: 

As starblue mentioned in a comment, "Be aware that this is at best obfuscation, not encryption". And it's probably not even obfuscation.

One property of XOR is that (Y xor 0) == Y. What this means for your algorithm is that for anyplace in your very big file where there are runs of zeros (which seems pretty likely given the size of the file), your key will show up in the cipher file. Plain as day.
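
A small sketch makes the zero-run leak concrete. The helper name xor_buffer and the byte-wise key layout are assumptions for illustration only.

```c
#include <stddef.h>

/* Hypothetical helper: XOR a buffer in place with a repeating 4-byte key
   (a fixed-key XOR of the kind the question describes). */
void xor_buffer(unsigned char *buf, size_t len, const unsigned char key[4])
{
    for (size_t i = 0; i < len; ++i)
        buf[i] ^= key[i % 4];   /* a zero byte simply becomes the key byte */
}
```

Applied to eight zero bytes with the key bytes E5 55 E5 BF, the buffer ends up holding E5 55 E5 BF E5 55 E5 BF: the key itself, twice over.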

Another nice feature of XOR-encrypted stuff is that if someone has both the plaintext and the ciphertext, XORing those items together nets you an output that has the key used to perform the cipher repeated over and over. If the person knows that the 2 files are a plaintext/ciphertext pair, they've learned the key, which is bad if the key is used for more than one encryption. If the attacker isn't sure whether the plaintext and ciphertext are related, they have a pretty good idea after this, since the key is a repeated pattern in the output. None of this is a problem with a one-time pad because each bit of the key is used only once, so no one learns anything new from this attack.
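
The known-plaintext attack described above fits in a few lines. This is only a sketch: recover_key is a made-up name, and it assumes the attacker has at least 4 bytes of matching plaintext and ciphertext that start at an offset that is a multiple of 4.

```c
/* Hypothetical helper: recover a repeating 4-byte XOR key from one
   plaintext/ciphertext pair of at least 4 bytes, aligned to the key. */
void recover_key(const unsigned char *plain, const unsigned char *cipher,
                 unsigned char key_out[4])
{
    for (int i = 0; i < 4; ++i)
        key_out[i] = plain[i] ^ cipher[i];   /* (P ^ K) ^ P == K */
}
```

A known file header is enough plaintext for this, and once the four key bytes are known the rest of the file decrypts with the same loop that encrypted it.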

A lot of people make the mistake of assuming that because a one-time pad is provably unbreakable, an XOR encryption might be OK 'if done well', since the fundamental operation performed is the same. The difference is that a one-time pad uses each random bit of the key exactly once. So among other things, if the plaintext has a run of zeros, nothing is learned about the key, unlike with a simple fixed-key XOR cipher.

As Bruce Schneier said: "There are two kinds of cryptography in this world: cryptography that will stop your kid sister from reading your files, and cryptography that will stop major governments from reading your files."

An XOR cipher is barely kid sister proof - if even that.

Michael Burr
Hmm, I agree with you. If a file is XORed once with a secret key of the same size, it's secure and unbreakable. But in my case it is more about stopping my kid sister... Thanks for your help!!
Doug
Also, if an attacker could guess 64 consecutive plaintext bits in your 50 gigabyte file, they'd have your key. :) Knowing the file format could be enough for that, as they often have some kind of header.
pbos