views: 6476
answers: 7

How can I quickly create a large file on a Linux (RHEL) system? dd will do the job, but reading from /dev/zero and writing to the drive can take a long time when you need a file several hundred GB in size for testing... and if you need to do that repeatedly, the time really adds up.

I don't care about the contents of the file, I just want it to be created quickly. How can this be done?
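For reference, the kind of invocation I mean is something like this (name and size purely illustrative):

dd if=/dev/zero of=/tmp/bigfile bs=1M count=512000   # ~500 GB, every byte written to disk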

edit: Using a sparse file won't work for this; I need the file to actually be allocated disk space.

+4  A: 

Try mkfile <size> myfile as an alternative to dd

# mkfile 10240m 10Gigfile

With the -n option the size is noted, but disk blocks aren't allocated until data is written to them.
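For example (assuming a Linux mkfile port that accepts the same flags as the Solaris original):

# mkfile -n 10240m 10Gigfile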

CMS
Where is this mkfile of which you speak, stranger? It's not in the default RHEL install.
paxdiablo
It's a Solaris utility. If you search for "gpl mkfile" you will find some source code examples.
Martin Beckett
Hmm. The OP wanted to know about Linux. I also see that OSX has mkfile, but I couldn't find one for Linux (except for xfs). Funny that this answer got accepted as correct.
Martin v. Löwis
http://www.ibiblio.org/pub/linux/utils/scripts/!INDEX.short.html
mctylr
+2  A: 

One approach: if you can guarantee unrelated applications won't use the files in a conflicting manner, just create a pool of files of varying sizes in a specific directory, then create links to them when needed.

For example, have a pool of files called:

  • /home/bigfiles/512M-A
  • /home/bigfiles/512M-B
  • /home/bigfiles/1024M-A
  • /home/bigfiles/1024M-B

Then, if you have an application that needs a 1G file called /home/oracle/logfile, execute a "ln /home/bigfiles/1024M-A /home/oracle/logfile".

If it's on a separate filesystem, you will have to use a symbolic link.

The A/B/etc files can be used to ensure there's no conflicting use between unrelated applications.

The link operation is about as fast as you can get.
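A concrete sketch of the setup (using the example paths and sizes above; the dd is a one-time cost):

mkdir -p /home/bigfiles
dd if=/dev/zero of=/home/bigfiles/1024M-A bs=1M count=1024   # slow, but paid only once
ln /home/bigfiles/1024M-A /home/oracle/logfile               # instant; same filesystem
# or, if /home/oracle is on a different filesystem:
# ln -s /home/bigfiles/1024M-A /home/oracle/logfile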

paxdiablo
Good solution IF you have the disk space ready.
hayalci
You can have a small pool or a large pool, it's your choice. You were going to need at least one file anyway, since that's what the questioner asked for. If your pool consists of one file, you lose nothing. If you have bucketloads of disk (and you should, given its low price), there's no issue.
paxdiablo
+2  A: 

I don't think you're going to get much faster than dd. The bottleneck is the disk; writing hundreds of GB of data to it is going to take a long time no matter how you do it.

But here's a possibility that might work for your application. If you don't care about the contents of the file, how about creating a "virtual" file whose contents are the dynamic output of a program? Instead of open()ing the file, use popen() to open a pipe to an external program, which generates data whenever it's needed. One caveat: a pipe is not seekable, so fseek() and rewind() will fail with ESPIPE; this approach only works if your application reads the data strictly sequentially. And you'll need to use pclose() instead of fclose() when you're done with the pipe.

If your application needs the file to be a certain size, it will be up to the external program to keep track of where in the "file" it is and send an eof when the "end" has been reached.
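A rough shell analogue of the same idea (my illustration, not part of the original answer): bash process substitution hands a program a /dev/fd path whose contents are generated on the fly by another process:

wc -c <(head -c 1000000000 /dev/zero)   # "reads" a 1 GB file that never touches the disk

The path bash passes is a pipe, so the same no-seeking restriction applies.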

Barry Brown
+4  A: 

dd if=/dev/zero of=filename bs=1 count=1 seek=1048575

where seek is the size of the file you want in bytes minus 1 (here 1048575, giving a 1 MiB file).

Zoredache
I like this approach, but the asker doesn't want a sparse file for some reason. :(
ephemient
dd if=/dev/zero of=1GBfile bs=1000 count=1000000
Damien
A: 

Ext4 has much better file allocation performance, since it uses extent-based allocation: a single extent can map a large run of contiguous blocks (up to 128 MiB) at once.

martinus
A: 

Re: speed

Actually you SHOULD be able to get faster than dd; I just don't know whether Linux has a utility that does it. dd has to actually write something to the file, zeroes or whatever. If you don't care what's in the file, you don't have to write anything at all: just allocate the blocks for that file. That's why something equivalent to mkfile would be much faster (see the sketch below).
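On newer systems such a utility does exist: fallocate(1) from util-linux asks the filesystem to allocate blocks without writing them (an addition of mine; it assumes a kernel and filesystem, such as ext4, that support the fallocate system call):

fallocate -l 10G bigfile   # allocates 10 GiB of real blocks near-instantly, no data written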

Angelo
+1  A: 

Hi,

After looking through the Linux docs for about an hour, I just found THE magic command:

truncate -s 10M output.file

It creates a 10 MB file... instantaneously! (The M suffix stands for 1024*1024 bytes; MB stands for 1000*1000.) Note, though, that truncate creates a sparse file, so it doesn't meet the asker's requirement that the disk space actually be allocated.
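You can see why it is instantaneous, and also why it is sparse (an illustration I've added, assuming GNU coreutils):

truncate -s 10M output.file
du -h --apparent-size output.file   # 10M -- the size the file claims
du -h output.file                   # 0   -- no disk blocks actually allocated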

Victor