What is the most efficient way to read a big text file backwards, line by line, using Windows API functions? For example, if a file is:

line 1
...
line 108777
line 108778

the output should be:

line 108778
line 108777
...
line 1

I want to write a C program for this. You don't need to write the code (though if you want to, that's great); I am mainly interested in how to approach this, keeping in mind that the files are big and that I want the program to run as fast as it can.

Also, I am interested in which Windows API functions to use.

+2  A: 

If performance is more important than memory utilization, I'd just do a buffered read of the entire text file into memory and then parse it in whatever order you like.

Take a look at memory mapped files, some advantages of which are discussed here.
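
A minimal sketch of the "read the whole file into memory" idea, assuming the file fits in memory and was opened in binary mode. It uses portable stdio so it can be tried anywhere; on Windows the same steps would presumably map to CreateFile, GetFileSizeEx and ReadFile. The name read_all is made up for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read the whole stream into a freshly allocated, NUL-terminated buffer.
   Returns NULL on failure; the caller frees the result.
   Assumption: the file size fits in a long and in available memory. */
char *read_all(FILE *f, size_t *len_out)
{
    long size;
    char *buf;

    if (fseek(f, 0, SEEK_END) != 0) return NULL;
    size = ftell(f);
    if (size < 0) return NULL;
    rewind(f);

    buf = malloc((size_t)size + 1);
    if (!buf) return NULL;
    if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        return NULL;
    }
    buf[size] = '\0';
    *len_out = (size_t)size;
    return buf;
}
```

Once the data is in one buffer, the lines can be walked from the end towards the start without touching the disk again.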

Jim Lamb
+1  A: 

Memory-map the file. It will be automatically buffered for you - just read it as if it was memory, starting from the tail and looking for CRs / LFs / CRLFs.
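
As a rough illustration of "starting from the tail and looking for line breaks", here is a sketch that works on any in-memory byte range. On Windows that range would typically be the view returned by MapViewOfFile (after CreateFile and CreateFileMapping); the mapping itself is left out here so the code stays portable. The function name write_lines_reversed is hypothetical, and the sketch treats LF and CRLF as line breaks but not a lone CR.

```c
#include <stdio.h>
#include <stddef.h>

/* Write the lines stored in data[0..len) to out, last line first.
   Each line is written with a single '\n'; the CR of a CRLF pair is dropped. */
void write_lines_reversed(const char *data, size_t len, FILE *out)
{
    size_t end = len;   /* one past the last byte of the current line */

    if (len == 0) return;
    if (data[end - 1] == '\n') end--;   /* a final newline ends the last line */

    for (;;) {
        size_t start = end;
        size_t n;

        /* Walk backwards to the byte just after the previous '\n'. */
        while (start > 0 && data[start - 1] != '\n') start--;

        n = end - start;
        if (n > 0 && data[start + n - 1] == '\r') n--;   /* strip CR of CRLF */
        fwrite(data + start, 1, n, out);
        fputc('\n', out);

        if (start == 0) break;
        end = start - 1;   /* skip the '\n' that precedes this line */
    }
}
```

With a mapped view, a call would look like write_lines_reversed(view, file_size, stdout), where view and file_size are whatever the mapping code produced.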

Amadan
+1  A: 

Memory mapped files will fail (or at least become very tricky) if the file's bigger than the available address space. Instead, try this:

input = input file
block_prefix = unique temporary file
block_index = 0

while (!eof (input))
{
   line = input.readline ();
   push line onto a stack

   if (stack > 100 entries) // doesn't have to be 100
   {
      output = block_prefix + block_index++

      while (stack has entries)
      {
        pop line off stack
        write to output
      }
   }
}

if (stack has entries)
{
  output = block_prefix + block_index++

  while (stack has entries)
  {
    pop line off stack
    write to output
  }
}

output = output file

while (block_index)
{
   read entire contents of block file (block_prefix + --block_index)
   write contents to output
   delete block file
}
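
The pseudocode above could be turned into C roughly as follows. This is only a sketch under stated assumptions: no line is longer than MAX_LINE (longer lines would be split), the block files are named by appending a counter to a caller-supplied prefix, and portable stdio is used where a Windows program would call CreateFile, ReadFile, WriteFile and DeleteFile. The names reverse_with_blocks and flush_block are invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 4096 /* assumption: no input line is longer than this */

/* Pop every stacked line into a new block file (last line first) and free it. */
static int flush_block(char **stack, size_t depth, const char *prefix, unsigned index)
{
    char name[512];
    FILE *block;

    snprintf(name, sizeof name, "%s%u", prefix, index);
    block = fopen(name, "wb");
    while (depth > 0) {
        --depth;
        if (block) fputs(stack[depth], block);
        free(stack[depth]);
    }
    return block ? fclose(block) : -1;
}

/* Returns 0 on success, -1 on error. lines_per_block must be at least 1. */
int reverse_with_blocks(FILE *in, FILE *out, const char *prefix, size_t lines_per_block)
{
    char line[MAX_LINE + 2];
    char **stack = malloc(lines_per_block * sizeof *stack);
    size_t depth = 0;
    unsigned blocks = 0;
    int status = stack ? 0 : -1;

    while (status == 0 && fgets(line, MAX_LINE + 1, in)) {
        size_t len = strlen(line);
        /* No trailing newline: the final line, or a piece of an over-long line. */
        if (len == 0 || line[len - 1] != '\n') {
            line[len++] = '\n';
            line[len] = '\0';
        }
        stack[depth] = malloc(len + 1);
        if (!stack[depth]) { status = -1; break; }
        memcpy(stack[depth++], line, len + 1);
        if (depth == lines_per_block) {
            status = flush_block(stack, depth, prefix, blocks++);
            depth = 0;
        }
    }
    if (stack && depth > 0) {
        if (flush_block(stack, depth, prefix, blocks++) != 0) status = -1;
    }
    free(stack);

    /* Concatenate the block files, newest first, deleting each one afterwards. */
    while (blocks > 0) {
        char name[512];
        FILE *block;
        size_t n;

        snprintf(name, sizeof name, "%s%u", prefix, --blocks);
        block = fopen(name, "rb");
        if (block) {
            while ((n = fread(line, 1, sizeof line, block)) > 0)
                fwrite(line, 1, n, out);
            fclose(block);
        } else {
            status = -1;
        }
        remove(name);
    }
    return status;
}
```

Note that this writes every byte of the input to disk once more before producing the output, so it trades speed for a small, fixed memory footprint.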
Skizz
+2  A: 

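A more clever solution is to open the file, set the file offset to (end of file - buffersize), and read buffersize bytes. You can then parse the data in the buffer from back to front to find newlines and do whatever you want with each line. Repeat with the preceding block until you reach the start of the file, carrying over the partial line at the front of each buffer so it can be joined with the rest of that line from the next block you read.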
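
One way this could look in C is sketched below. It reads fixed-size chunks from the end of the file towards the start and keeps the unfinished line at the front of each chunk until the bytes before it have been read. The sketch uses fseek and fread on a stream opened in binary mode; with the Windows API the same loop would presumably use SetFilePointerEx and ReadFile, which also avoids the 2 GB limit of a 32-bit long. The names reverse_lines_chunked and emit_line are made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write one line followed by '\n', dropping the CR of a CRLF pair. */
static void emit_line(FILE *out, const char *s, size_t n)
{
    if (n > 0 && s[n - 1] == '\r') n--;
    fwrite(s, 1, n, out);
    fputc('\n', out);
}

/* Copy the lines of in to out, last line first, reading chunk_size bytes
   at a time from the end. Returns 0 on success, -1 on error.
   Assumptions: binary-mode stream, chunk_size >= 1, size fits in a long. */
int reverse_lines_chunked(FILE *in, FILE *out, size_t chunk_size)
{
    char *work = NULL;     /* current chunk followed by the carried-over tail */
    size_t tail_len = 0;   /* bytes of the unfinished line from later chunks */
    long pos;

    if (fseek(in, 0, SEEK_END) != 0) return -1;
    pos = ftell(in);
    if (pos < 0) return -1;
    if (pos == 0) return 0;                       /* empty file, no lines */

    /* A final '\n' only terminates the last line, so leave it out. */
    if (fseek(in, pos - 1, SEEK_SET) != 0) return -1;
    if (fgetc(in) == '\n') pos--;

    while (pos > 0) {
        size_t n = (size_t)pos < chunk_size ? (size_t)pos : chunk_size;
        size_t end = n + tail_len;
        size_t i;
        char *grown = realloc(work, end);

        if (!grown) { free(work); return -1; }
        work = grown;
        memmove(work + n, work, tail_len);        /* make room in front */
        if (fseek(in, pos - (long)n, SEEK_SET) != 0 ||
            fread(work, 1, n, in) != n) { free(work); return -1; }

        /* Scan the new chunk backwards; the tail holds no '\n' by construction. */
        for (i = n; i-- > 0; ) {
            if (work[i] == '\n') {
                emit_line(out, work + i + 1, end - (i + 1));
                end = i;
            }
        }
        tail_len = end;    /* work[0..end) is the start of an unfinished line */
        pos -= (long)n;
    }
    emit_line(out, work, tail_len);               /* first line of the file */
    free(work);
    return 0;
}
```

Memory use stays near chunk_size plus the length of the longest line, regardless of how big the file is, which is the main appeal over loading or mapping the whole file.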

Quonux
+1  A: 

One method is to use a container of file offsets to the beginning of each line. After parsing the file, process the container in reverse order. See fgetc, fgets and fseek.
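
A small sketch of the offset-container idea, assuming a stream opened in binary mode whose size fits in a long. It makes one forward pass to record where each line starts, then seeks to those positions in reverse order. The function name reverse_lines_by_offsets is hypothetical, and fgetc is used only for brevity; a real program would likely read in larger blocks.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copy the lines of in to out, last line first, using a table of
   line-start offsets. Returns 0 on success, -1 on error. */
int reverse_lines_by_offsets(FILE *in, FILE *out)
{
    long *starts = NULL;   /* offset of the first byte of each line */
    size_t count = 0, cap = 0, i;
    long pos = 0, end;
    int c, at_line_start = 1;

    /* Pass 1: record the offset at which every line begins. */
    rewind(in);
    while ((c = fgetc(in)) != EOF) {
        if (at_line_start) {
            if (count == cap) {
                size_t new_cap = cap ? cap * 2 : 1024;
                long *grown = realloc(starts, new_cap * sizeof *grown);
                if (!grown) { free(starts); return -1; }
                starts = grown;
                cap = new_cap;
            }
            starts[count++] = pos;
            at_line_start = 0;
        }
        pos++;
        if (c == '\n') at_line_start = 1;
    }

    /* Pass 2: visit the recorded offsets from last to first. */
    end = pos;   /* one past the last byte of the line being copied */
    for (i = count; i-- > 0; ) {
        long p;
        int last = '\n';

        if (fseek(in, starts[i], SEEK_SET) != 0) { free(starts); return -1; }
        for (p = starts[i]; p < end; p++) {
            last = fgetc(in);
            fputc(last, out);
        }
        if (last != '\n') fputc('\n', out);   /* final line had no newline */
        end = starts[i];
    }
    free(starts);
    return 0;
}
```

The offset table costs one long per line, and the file is read twice, so this is simple rather than fast.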

Thomas Matthews
+1. Scan the file once and push each end-of-line position onto a stack. Then loop, popping file positions off the stack and seeking to each one. Print all the characters between that position and the one popped just before it.
selbie
Sorry, but I must use Windows API for this...
Matthew Murdock