
I am now testing my application against corrupted files, but I'm finding it hard to get hold of test files.

So I'm wondering whether there are any existing tools that can write random/garbage bytes into a file of some format.

Basically, I need a tool that:

  1. Writes random garbage bytes into the file.
  2. Does not need to know the format of the file; just writing random bytes is OK for me.
  3. Preferably writes at random positions in the target file.
  4. Ideally supports batch processing (a bonus).

Thanks.

A:

The /dev/urandom pseudo-device, along with dd, can do this for you:

dd if=/dev/urandom of=newfile bs=1M count=10

This will create a file newfile of size 10M.

The /dev/random device can block if not enough randomness (entropy) has been built up, whereas urandom will not block. If you're using the randomness for crypto-grade purposes, you may want to steer clear of urandom. For anything else, including generating test data, it should be sufficient and will most likely be faster.
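The dd command above creates a brand-new file. Since the goal is to damage existing files, here is a minimal sketch of overwriting part of a file in place instead, assuming a Linux system with dd and the usual coreutils. The `conv=notrunc` operand stops dd from truncating the output file, and with `bs=1` the `seek=` and `count=` values are byte counts. The file name `testfile`, the offset 100 and the length 16 are purely illustrative, and the script first creates a 200-byte stand-in file so it can run on its own.

```shell
#!/bin/sh
# Sketch: corrupt part of an existing file in place instead of creating a new one.
set -eu

file=testfile   # hypothetical name; work on a copy of your real test input

# Stand-in input so the sketch runs on its own: 200 zero bytes.
head -c 200 /dev/zero > "$file"

# Overwrite 16 bytes starting at byte offset 100 with random data.
# bs=1 makes seek= and count= count single bytes; conv=notrunc stops dd from
# truncating the output file, so bytes outside 100..115 stay as they were.
dd if=/dev/urandom of="$file" bs=1 seek=100 count=16 conv=notrunc 2>/dev/null
```

Run it against a copy of your input, since the overwritten bytes cannot be recovered.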

If you want to corrupt just parts of your file (not the whole file), you can simply use the C-style random functions: use rand() to work out an offset and a length n, then call it n more times to get the random bytes to overwrite your file with.
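The same offset-and-length idea can also be sketched in plain shell without compiling anything. This is a rough sketch rather than a tested tool: it assumes a Linux system with od, wc and dd, draws its random numbers from /dev/urandom instead of C's rand(), and the helper names `rand_u32` and `corrupt_file` are made up for illustration. The loop at the end covers the batch-processing wish from the question by corrupting every file given on the command line.

```shell
#!/bin/sh
# Rough sketch: overwrite a random run of bytes inside each given file, in place.
# Assumes od, wc and dd are available (e.g. a typical Linux system).
set -eu

# rand_u32 (hypothetical helper): print a random unsigned 32-bit integer
# read from /dev/urandom, standing in for C's rand().
rand_u32() {
    od -An -N4 -tu4 /dev/urandom | tr -d ' '
}

# corrupt_file FILE MIN MAX (hypothetical helper)
# Overwrites MIN to MAX bytes (assumes 1 <= MIN <= MAX) at a random offset in
# FILE with random bytes; the file size is left unchanged. Plain POSIX sh has
# no "local", so these variables are global.
corrupt_file() {
    file=$1 min=$2 max=$3
    size=$(wc -c < "$file" | tr -d ' ')
    [ "$size" -gt 0 ] || return 0          # nothing to corrupt in an empty file

    # Random length in [MIN, MAX], clamped to the file size.
    count=$(( min + $(rand_u32) % (max - min + 1) ))
    [ "$count" -le "$size" ] || count=$size

    # Random start offset chosen so the whole run stays inside the file.
    pos=$(( $(rand_u32) % (size - count + 1) ))

    echo "'$file', $size bytes, corrupting $pos through $(( pos + count - 1 ))" >&2

    # bs=1 makes seek= and count= byte counts; conv=notrunc keeps the rest intact.
    dd if=/dev/urandom of="$file" bs=1 seek="$pos" count="$count" conv=notrunc 2>/dev/null
}

# Batch mode: corrupt every file named on the command line, 8 to 16 bytes each.
for f in "$@"; do
    corrupt_file "$f" 8 16
done
```

For example, running it as `sh corrupt.sh copy1.bin copy2.bin` (script and file names are hypothetical) would overwrite 8 to 16 bytes in each file. Point it at copies, since the originals are modified in place.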


The following Perl script shows how this can be done (without having to worry about compiling C code):

use strict;
use warnings;

sub corrupt ($$$$) {
    # Get parameters, names should be self-explanatory.

    my $filespec = shift;
    my $mincount = shift;
    my $maxcount = shift;
    my $charset = shift;

    # Work out position and size of corruption.

    my @fstat = stat ($filespec) or die "Can't stat $filespec: $!";
    my $size = $fstat[7];
    my $count = $mincount + int (rand ($maxcount + 1 - $mincount));
    my $pos = 0;
    if ($count >= $size) {
        $count = $size;
    } else {
        # +1 so the corruption can also end on the last byte of the file.
        $pos = int (rand ($size - $count + 1));
    }

    # Output for debugging purposes.

    my $last = $pos + $count - 1;
    print "'$filespec', $size bytes, corrupting $pos through $last\n";

    # Open file for read/write (no truncation), seek to position, corrupt and close.

    open (my $fh, '+<', $filespec) or die "Can't open $filespec: $!";
    binmode ($fh);
    seek ($fh, $pos, 0) or die "Can't seek in $filespec: $!";
    while ($count-- > 0) {
        # The index must stay below length ($charset), otherwise substr
        # returns an empty string and nothing gets written.
        my $newval = substr ($charset, int (rand (length ($charset))), 1);
        print $fh $newval;
    }
    close ($fh) or die "Can't close $filespec: $!";
}

# Test harness.

system ("echo =========="); #DEBUG
system ("cp base-testfile testfile"); #DEBUG
system ("cat testfile"); #DEBUG
system ("echo =========="); #DEBUG

corrupt ("testfile", 8, 16, "ABCDEFGHIJKLMNOPQRSTUVWXYZ   ");

system ("echo =========="); #DEBUG
system ("cat testfile"); #DEBUG
system ("echo =========="); #DEBUG

It consists of the corrupt function, which you call with a file name, the minimum and maximum corruption size, and a character set to draw the corruption from. The code at the bottom is just a simple test harness. Below is some sample output in which you can see that a section of the file has been corrupted:

==========
this is a file with nothing in it except for lowercase
letters (and spaces and punctuation and newlines).
that will make it easy to detect corruptions from the
test program since the character range there is from
uppercase a through z.
i have to make it big enough so that the random stuff
will work nicely, which is why i am waffling on a bit.
==========
'testfile', 344 bytes, corrupting 122 through 135
==========
this is a file with nothing in it except for lowercase
letters (and spaces and punctuation and newlines).
that will make iFHCGZF VJ GZDYct corruptions from the
test program since the character range there is from
uppercase a through z.
i have to make it big enough so that the random stuff
will work nicely, which is why i am waffling on a bit.
==========

It's only been tested at a basic level, so you may find edge cases that need to be handled. Do with it what you will.

paxdiablo
+1 Correct - though you might want to explain the difference between random and urandom...
Konerak
A:

You could read from /dev/random:

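# generate a 50MB file named `random.stuff` filled with random stuff ...
# (iflag=fullblock is a GNU dd flag: /dev/random may return short reads, and
# without it each short read would still count as one of the 50 blocks)
dd if=/dev/random of=random.stuff bs=1000000 count=50 iflag=fullblock

You can also specify the size in a human-readable way:

# generate just 2MB ...
dd if=/dev/random of=random.stuff bs=1M count=2 iflag=fullblock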
The MYYN
I think this is the best answer because the question is tagged "linux". On another Unix though, /dev/urandom may be documented so: "/dev/urandom is a compatibility nod to Linux. On Linux, /dev/urandom will produce lower quality output if the entropy pool drains, while /dev/random will prefer to block and wait for additional entropy to be collected. With Yarrow, this choice and distinction is not necessary, and the two devices behave identically. You may use either."
Pascal Cuoq
A:

Just for completeness, here's another way to do it:

shred -s 10 - > my-file

This writes 10 random bytes to standard output, which is redirected to a file. shred is usually used for destroying (securely overwriting) data, but it can be used to create new random files too. So if you already have a file that you want to fill with random data, use this:

shred my-existing-file
jkramer