Why is the size of files capped at 4 GB when outputting to a file using print? I would expect that with streaming output it should be possible to generate files of arbitrary size.

Update: ijw and Chas. Owens were correct. I thought the F: drive was NTFS formatted, but in fact it used the FAT32 filesystem. I tried it on another drive and I could generate a 20 GB text file. There are no limits in this case. Apologies to all.


Details: while researching an answer to a question here on Stack Overflow, I needed to measure the performance of reading a very large text file using Perl. To test the reading I needed a large text file, so I wrote a small Perl script to generate one and ran into an unexpected problem. The output file grows until it reaches 4 GB. According to Windows Explorer, the size in one run of the script was 4294967269 bytes (and 4294967296 bytes on disk). The script continues, but the file no longer grows.

Essentially, the script is just a series of statements like:

print NUMBERS_OUTFILE $line;

where $line is a long string with a "\n" at the end. The length of the line can be configured and is not critical for this problem; e.g. 250 characters or 34000 characters. NUMBERS_OUTFILE is a file handle created with:

open(NUMBERS_OUTFILE, '>F:\temp2\out1.txt')

Drive F: is NTFS formatted and is on a separate physical hard disk from the disk with the operating system.
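
A minimal sketch of such a generator, not the original script: it checks the return values of print and close, so a write that the filesystem rejects is more likely to stop the script with an error instead of letting it continue while the file stays the same size. The helper name write_lines, the fallback file name, and the 1 MB target are assumptions made for illustration; OUTFILE is the environment variable mentioned in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: stream $line to $path until at least $target_bytes
# have been written, stopping at the first failed write.
sub write_lines {
    my ($path, $target_bytes, $line) = @_;
    open(my $out, '>', $path) or die "Cannot open $path: $!";
    binmode($out);    # no "\n" -> "\r\n" translation, so byte counts match
    my $written = 0;
    while ($written < $target_bytes) {
        # print returns false when the write fails
        print {$out} $line or die "print failed after $written bytes: $!";
        $written += length $line;
    }
    # Buffered data is flushed here, so a late write error shows up at close
    close($out) or die "close failed after $written bytes: $!";
    return $written;
}

# Assumed defaults: OUTFILE from the environment, else a local file; 1 MB target
my $path    = $ENV{OUTFILE} || 'out1.txt';
my $written = write_lines($path, 1024 * 1024, ('0123456789' x 25) . "\n");
print "wrote $written bytes, size on disk: ", (-s $path), "\n";
```

Raising the target above 4 GB on the drive in question would show whether the writes themselves start failing at the limit.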

What is the reason and is there a work-around?


Full Perl script and BAT driver script (HTML formatted with the pre tag). If the two environment variables MBSIZE and OUTFILE are set up, the Perl script should be able to run unchanged on platforms other than Windows.

Platform: Perl 5.10.0 from ActiveState; 32 bit; build 1004. Windows XP x64 SP2, 8 GB RAM, 500 GB Green Caviar hard disks.

perl -V says:

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=MSWin32, osvers=5.00, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_ST
RICT -DHAVE_DES_FCRYPT -DUSE_SITECUSTOMIZE -DPRIVLIB_LAST_IN_INC -DPERL_IMPLICIT_CONTE
XT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='12.00.8804', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"D:\Perl\
lib\CORE"  -machine:x86'
    libpth=\lib
    libs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib a
dvapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.l
ib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.l
ib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib m
pr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl510.lib
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf  -libpat
h:"D:\Perl\lib\CORE"  -machine:x86'


Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV
                        PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
                        PERL_MALLOC_WRAP PL_OP_SLAB_ALLOC USE_ITHREADS
                        USE_LARGE_FILES USE_PERLIO USE_SITECUSTOMIZE
  Locally applied patches:
        ActivePerl Build 1004 [287188]
        33741 avoids segfaults invoking S_raise_signal() (on Linux)
        33763 Win32 process ids can have more than 16 bits
        32809 Load 'loadable object' with non-default file extension
        32728 64-bit fix for Time::Local
  Built under MSWin32
  Compiled at Sep  3 2008 13:16:37
  @INC:
    D:/Perl/site/lib
    D:/Perl/lib
.
+2  A: 

I guess the "32 bit" part is the problem... The largest value you can represent in an unsigned 32-bit integer is 4,294,967,295, i.e. just under 4 GB (http://en.wikipedia.org/wiki/Integer_%28computer_science%29)

--Edit--

I wasn't actually referring to the filesystem limit, but to the Perl limit, as it's compiled as 32-bit and can only access 4 GB of RAM. NTFS, as far as I know, does have a limit around 8 GB and uses some kind of windowing method to read those files. But that's another story.

Quamis
Bound to be that. The limitation is probably not in Perl itself, but whether it's the way ActiveState have ported it to Windows, or whether your filesystem has a 4 GB limit, I don't know.
ijw
Yes, it is clear where the 4 GB limit comes from. But why should it affect streaming output? It is not a problem to create files larger than 4 GB on Windows and NTFS (although the system calls are a little bit more complicated). Maybe because Perl depends on the C standard library?
Peter Mortensen
+5  A: 

I think that the problem is that you cannot write to file positions beyond 4 GB due to the limit of 4 bytes for the file position pointer. This applies even with streaming output, because Perl still has to keep track of the file position.

I would try to use Win32API::File instead - it allows seeking to positions larger than 4 GB by sending the high-order 4 bytes of the file position pointer in a separate field, and should work well using WriteFile() to write to the output file.

Guss
I think this may be the reason. See discussion here: http://www.perlmonks.org/?node_id=129236
ire_and_curses
Yes, it should be possible with Win32API::File. But isn't there a platform independent and simpler way?
Peter Mortensen
@ire_and_curses: but this is for random access to a file. Why would streaming output be affected?
Peter Mortensen
So are Perl file handles dependent on the bitness? Would the 64-bit version of Perl be able to handle files larger than 4 GB? Or are Perl file handles inherently limited to 32 bit / 4 GB?
Peter Mortensen
+5  A: 

Here's one thing I found (link):

Configure-time Options

The INSTALL document describes several Configure-time options. Some of these will work with Cygwin, others are not yet possible. Also, some of these are experimental. You can either select an option when Configure prompts you or you can define (undefine) symbols on the command line.

...

  • -Duselargefiles

    Although Win32 supports large files, Cygwin currently uses 32-bit integers for internal size and position calculations.

DVK
Added the link
DVK
+7  A: 

Hmm, that is odd. At least on OS X and Linux, the limit is imposed by the filesystem. Perhaps ActiveState Perl on Win32 is not compiled with large file support? Could you post the result of running perl -V?

The portion of the output we care about is

Platform:
osname=MSWin32, osvers=5.00, archname=MSWin32-x86-multi-thread
uname=''
config_args='undef'
hint=recommended, useposix=true, d_sigaction=undef
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef

Specifically, uselargefiles=define. The fact that this feature is defined (i.e. turned on) means that Perl will use a 64-bit integer for file offsets. For an unsigned 64-bit offset this theoretically allows files of up to 16 exbibytes (2^64 bytes, or 17,179,869,184 GiB), and half that for a signed one; in practice, filesystem limits come into play long before you reach that limit (FAT32, for example, caps a file at 4 GiB minus one byte).
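
As a rough sketch, the same settings can also be read from inside a script through the core Config module rather than by scanning the perl -V output. The choice of keys to print is only a suggestion; lseeksize is the width of a file offset in bytes.

```perl
use strict;
use warnings;
use Config;

# Report the build settings that govern file offsets
for my $key (qw(uselargefiles lseeksize lseektype)) {
    my $value = defined $Config{$key} ? $Config{$key} : 'undef';
    print "$key=$value\n";
}

# lseeksize is the offset width in bytes; 8 means 64-bit offsets
my $offset_bits = 8 * ($Config{lseeksize} || 0);
print "file offsets are $offset_bits bits wide\n";
```

If this reports 64-bit offsets and large files still fail, the filesystem of the target drive is the next thing to check.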

Chas. Owens
@Peter Mortensen: Actually he asked for the output of perl -V (where the V is uppercase)
R. Bemrose
That needs to be an uppercase V; a lowercase v just prints out the version information.
Chas. Owens
You were correct. It was my mistake; the filesystem was actually FAT32 formatted. Running on another drive enabled the creation of a 20 GB text file.
Peter Mortensen
Is this an answer to the question, really?
unwind
Yes, Unwind, this is the answer to the question. Chas's answer indicated that it's usually the file system, not Perl, that limits the size of the file. Now, this doesn't mean that there can't be other reasons that you'd be unable to create larger files with Perl, but in Peter's case, the restrictions of the file system were the culprit.
Rob Kennedy