I'm trying to read a 24 GB XML file in C, but it won't work. I print the current position with ftell() as I read, but once it reaches a big enough number it wraps around to a small number and starts over, never even getting 20% of the way through the file. I assume the problem is the range of the type used to store the position (long), which can go up to about 4,000,000,000 according to http://msdn.microsoft.com/en-us/library/s3f49ktz%28VS.80%29.aspx, while my file is 25,000,000,000 bytes in size. A long long should work, but how would I change what my compiler (Cygwin/mingw32) uses, or get it to provide fopen64?

+2  A: 

The ftell() function returns a long, which is only 32 bits wide on 32-bit systems (and on Windows in general), so it cannot represent offsets beyond 2^31 - 1 bytes (2 GB). The file offset within a 24 GB file simply won't fit into a 32-bit long.

You may have the ftell64() function available, or the standard fgetpos() function may return a larger offset to you.

Loadmaster
I don't have ftell64(), and fgetpos() returns the same thing as ftell().
zacaj
+2  A: 

You might try using the OS-provided file functions CreateFile and ReadFile. According to the File Pointers topic, the position is stored as a 64-bit value.

Dolphin
oh god, I remember using these when I was learning assembly.
Malfist
Don't scare people, those are C functions and part of the Windows API :)
Dolphin
CreateFile isn't that bad, guys...
Ed Swangren
A: 

Unless you can use a 64-bit method as suggested by Loadmaster, I think you will have to break the file up.

This resource seems to suggest it is possible using _telli64(). I can't test this though, as I don't use mingw.

Adrian Mouat
But there's no compiler option or anything to enable them? I can see them in the include files, but they're under an #ifdef.
zacaj
hmm, is _telli64 available?
Adrian Mouat
No... it's not available.
zacaj
A: 

I don't know of any way to do this in one file. It's a bit of a hack, but if splitting the file up properly isn't a real option, you could write a few functions that split the file temporarily: one that uses ftell() to move through the current piece and switches to the next piece when it reaches the split point, and another that stitches the pieces back together before exiting. An absolutely botched-up approach, but if no better solution comes to light, it could be a way to get the job done.
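
For what it's worth, the bookkeeping for the split-file idea could look roughly like this. It is only a sketch: PART_SIZE, global_offset and locate are names invented for the example, and the 1 GiB piece size is an arbitrary choice that stays below what a 32-bit long can hold.

```c
/* Hypothetical piece size: 1 GiB, safely below the 2 GiB limit of a 32-bit long. */
#define PART_SIZE (1LL << 30)

/* Absolute position in the original file, given the index of the piece
 * that is currently open and the offset ftell() reports inside it. */
long long global_offset(int part_index, long local_offset)
{
    return (long long)part_index * PART_SIZE + local_offset;
}

/* Inverse mapping: which piece holds a given absolute offset,
 * and where inside that piece it falls. */
void locate(long long offset, int *part_index, long *local_offset)
{
    *part_index = (int)(offset / PART_SIZE);
    *local_offset = (long)(offset % PART_SIZE);
}
```

All the 64-bit arithmetic happens in long long, so ftell() only ever has to report offsets inside a single piece.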

Toby
A: 

Even if the ftell() in the Microsoft C library returns a 32-bit value and thus obviously will return bogus values once you reach 2 GB, just reading the file should still work fine. Or do you need to seek around in the file, too? For that you need _ftelli64() and _fseeki64().
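
To illustrate the first point, here is a minimal sketch of reading sequentially while keeping your own 64-bit byte counter instead of asking ftell(). The function name count_bytes and the 64 KiB buffer size are made up for the example; the comment marks where each chunk would go to your parser.

```c
#include <stdio.h>

/* Reads the whole file in fixed-size chunks and returns the number of
 * bytes read, or -1 on error. The running total lives in a long long,
 * so it does not depend on what ftell() can represent. */
long long count_bytes(const char *path)
{
    char buf[65536];          /* 64 KiB read buffer (arbitrary size) */
    long long total = 0;      /* at least 64 bits wide in C99 */
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        /* hand buf[0 .. n-1] to the XML parser here */
        total += (long long)n;
    }

    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return total;
}
```

If you print the counter for progress, note that older MinGW runtimes may want "%I64d" rather than "%lld" in the printf format.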

Note that unlike some Unix systems, you don't need any special flag when opening the file to indicate that it is in some "64-bit mode". The underlying Win32 API handles large files just fine.
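
A rough sketch of a thin wrapper around the 64-bit tell/seek calls, assuming your runtime actually exposes _ftelli64() and _fseeki64() (older MinGW headers may hide them behind an #ifdef). The names file_tell64 and file_seek64 are invented here, and the non-Windows branch using ftello()/fseeko() is only there so the same code builds elsewhere.

```c
/* On 32-bit POSIX systems this must come before any #include
 * so that off_t is 64 bits wide. It has no effect on Windows. */
#ifndef _WIN32
#define _FILE_OFFSET_BITS 64
#endif

#include <stdio.h>

#ifdef _WIN32
/* Microsoft C runtime: 64-bit variants of ftell()/fseek(). */
long long file_tell64(FILE *fp)
{
    return _ftelli64(fp);
}

int file_seek64(FILE *fp, long long offset, int whence)
{
    return _fseeki64(fp, offset, whence);
}
#else
#include <sys/types.h>

/* POSIX fallback: ftello()/fseeko() work with off_t. */
long long file_tell64(FILE *fp)
{
    return (long long)ftello(fp);
}

int file_seek64(FILE *fp, long long offset, int whence)
{
    return fseeko(fp, (off_t)offset, whence);
}
#endif
```

The file itself is still opened with a plain fopen(path, "rb"); only the tell and seek calls change.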

tml