tags:

views: 68

answers: 2

Hi,

I have to process very large log files (hundreds of gigabytes), and to speed things up I want to split the processing across all the cores I have available. Using seekg and tellg I can estimate block sizes in relatively small files and position each thread at the beginning of its block, but once the files grow large the offsets overflow.

How can I seek and index into very large files when using C++ ifstreams on Linux?

Best regards.

+2  A: 

The easiest way would be to do the processing on a 64-bit OS, and write the code using a 64-bit compiler. This will (at least normally) give you a 64-bit type for file offsets, so the overflow no longer happens, and life is good.

Jerry Coffin
A: 

You have two options:

  1. Use a 64-bit OS (and build a 64-bit binary).
  2. Use OS-specific functions (e.g. `lseek64`/`pread` on Linux).
Kirill V. Lyadvinsky
Actually, there's a third: use a Standard Library implementation that uses the correct OS-specific functions. There's no reason for a Standard Library implementation to choose the small-file functions offered by the OS.
MSalters
@MSalters, MSVC++ uses `long` as the underlying type for `streampos`, so it doesn't support files larger than 4GB in Win32. So this is not an option on Windows. I don't know about the GNU C++ Standard Library implementation on Linux, but it doesn't seem to support large files on 32-bit OSes.
Kirill V. Lyadvinsky
I understand that MSVC switched to 64-bit stream positions at least as of VS2010.
MSalters