I am running into integer overflow using the standard ftell and fseek functions with G++. I assumed 64-bit variants existed, but it seems that ftell64 and fseek64 are not available. I have been searching, and many websites reference using lseek with the off64_t datatype, but I have not found any examples of an equivalent for fseek. Right now the files that I am reading are 16GB+ CSV files, and I expect them to at least double in size.

Without any external libraries, what is the most straightforward way to get something similar to the fseek/ftell pair? My application currently works using the standard GCC/G++ libraries for 4.x.

+3  A: 

fseek64() isn't standard; the compiler docs should tell you where to find it.

Have you tried fgetpos and fsetpos? They're designed for large files and the implementation typically uses a 64-bit type as the base for fpos_t.

Luca Matteis
+5  A: 

fseek64 is a C function. To make it available you'll have to define _FILE_OFFSET_BITS=64 before including the system headers. That will more or less define fseek to be actually fseek64. Or do it in the compiler arguments, e.g. gcc -D_FILE_OFFSET_BITS=64 ....
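
As a rough sketch of the "define it before the system headers" approach, the snippet below sets the macro at the very top of the file and then checks how wide off_t ended up. The helper names off_t_bits and require_large_offsets are made up for illustration, and the sketch assumes a glibc-style Linux toolchain.

```c
/* Must appear before the first system header, otherwise it has no effect.
 * The guard avoids a redefinition warning if -D_FILE_OFFSET_BITS=64 is
 * also passed on the command line. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: width of off_t in this build, in bits. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Hypothetical helper: fail fast if the build did not pick up
 * 64-bit file offsets. */
void require_large_offsets(void)
{
    assert(off_t_bits() >= 64);
}
```

Calling require_large_offsets() once at startup is one way to catch a translation unit or shared library that was built without the flag.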

http://www.suse.de/~aj/linux_lfs.html has a great overview of large file support on Linux:

  • Compile your programs with "gcc -D_FILE_OFFSET_BITS=64". This forces all file access calls to use the 64 bit variants. Several types change also, e.g. off_t becomes off64_t. It's therefore important to always use the correct types and to not use e.g. int instead of off_t. For portability with other platforms you should use getconf LFS_CFLAGS which will return -D_FILE_OFFSET_BITS=64 on Linux platforms but might return something else on e.g. Solaris. For linking, you should use the link flags that are reported via getconf LFS_LDFLAGS. On Linux systems, you do not need special link flags.
  • Define _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE. With these defines you can use the LFS functions like open64 directly.
  • Use the O_LARGEFILE flag with open to operate on large files.
nos
So, I followed your instructions and everything is compiling fine. But I still seem to be getting an overflow. How would you use the O_LARGEFILE parameter with fopen64?
John Bellone
If you compile with -D_FILE_OFFSET_BITS=64, O_LARGEFILE is supplied automatically. This is not a standard flag; it is used on Linux to keep track of whether the file was opened with large file interfaces.
mark4o
You've asked the question as C++. Are you using/mixing C file operations with C++ streams, or are you using only the C API? Also, do you have some test code to reproduce the behavior? It's paramount that you use the correct types when dealing with lengths/offsets.
nos
I am only using the C API, and have been using off64_t as my types.
John Bellone
The key to this answer is -D_FILE_OFFSET_BITS=64 and is actually what fixed my problem. When using multiple shared libraries my suggestion would be to apply this to all of the build Makefiles.
John Bellone
+1  A: 

Use fsetpos(3) and fgetpos(3). They use the fpos_t datatype, which I believe is guaranteed to be able to hold at least 64 bits.

Adam Rosenfield
+1  A: 

Have you tried fseeko() with the _FILE_OFFSET_BITS preprocessor symbol set to 64?

This will give you an fseek()-like interface but with an offset parameter of type off_t instead of long. Setting _FILE_OFFSET_BITS=64 will make off_t a 64-bit type.

The same goes for ftello().

Void
+2  A: 

If you want to stick to ISO C standard interfaces, use fgetpos() and fsetpos(). However, these functions are only useful for saving a file position and going back to the same position later. They represent the position using the type fpos_t, which is not required to be an integer data type. For example, on a record-based system it could be a struct containing a record number and offset within the record. This may be too limiting.
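
To illustrate the save-and-return pattern, here is a small sketch that uses only ISO C calls. peek_char is a hypothetical helper; note that the fpos_t value is only stored and handed back, never inspected or modified.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one character at the current position, then
 * return the stream to where it was. Returns EOF on error or end of file. */
int peek_char(FILE *fp)
{
    fpos_t saved;
    int c;

    assert(fp != NULL);
    if (fgetpos(fp, &saved) != 0)   /* remember the current position */
        return EOF;
    c = fgetc(fp);
    if (fsetpos(fp, &saved) != 0)   /* go back to the saved position */
        return EOF;
    return c;
}
```

There is no portable way to say "saved plus 100 bytes" with fpos_t, which is the limitation described above.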

POSIX defines the functions ftello() and fseeko(), which represent the position using the off_t type. This is required to be an integer type, and the value is a byte offset from the beginning of the file. You can perform arithmetic on it, and can use fseeko() to perform relative seeks. This will work on Linux and other POSIX systems.
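
Because off_t is an integer byte offset, ordinary arithmetic works on it. The sketch below (bytes_remaining is a made-up helper name, and a POSIX system is assumed) computes how many bytes lie between the current position and the end of the file, then restores the position.

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: bytes between the current position and end of
 * file. Leaves the stream position unchanged. Returns (off_t)-1 on error. */
off_t bytes_remaining(FILE *fp)
{
    off_t here, end;

    assert(fp != NULL);
    here = ftello(fp);
    if (here == (off_t)-1)
        return (off_t)-1;
    if (fseeko(fp, 0, SEEK_END) != 0)
        return (off_t)-1;
    end = ftello(fp);
    if (fseeko(fp, here, SEEK_SET) != 0)   /* restore the position */
        return (off_t)-1;
    if (end == (off_t)-1)
        return (off_t)-1;
    return end - here;                     /* plain integer arithmetic */
}
```

A relative seek works the same way, for example fseeko(fp, -16, SEEK_CUR) to step back 16 bytes.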

In addition, compile with -D_FILE_OFFSET_BITS=64 (Linux/Solaris). This will define off_t to be a 64-bit type (i.e. off64_t) instead of long, and will redefine the functions that use file offsets to be the versions that take 64-bit offsets. This is the default when you are compiling for 64-bit, so is not needed in that case.

mark4o