views:

228

answers:

5

For my implementation of tail shell command in Linux, I need to read certain amount of lines/bytes from the end of the file using stream input/output. Does anyone have suggestions how to do that? I suspect I need to open a file and pass some parameter to the ifstream constructor, but I don't know what exactly. Googling didn't find anything.

+2  A: 
#include <iostream>
#include <fstream>
#include <sstream>

using namespace std;

int main()
{
  ifstream is("file.txt", ios::binary);
  if (!is) {
    cout << "Failed to open file" << endl;
    return 1;
  }

  is.seekg(0, ios::end);
  int len = is.tellg();
  char c;
  int n = 0;
  ostringstream line;
  int lines = 0;

  for (int i = len - 1; i >= 0; --i) {
    is.seekg(i, ios::beg);
    is.get(c);
    if (c == '\n' || i == 0) {
      if (i < len - 1) {
        if (i == 0) {
          line << c;
        }
        string s = line.str();
        cout << lines << ": " << string(s.rend() - n, s.rend()) << endl;
        ++lines;
        n = 0;
        line.seekp(0, ios::beg);
      }
    } else {
      line << c;
      ++n;
    }
  }

  is.close();

  return 0;
}
jspcal
seeking backwards past the begininng of file (i.e. if the file is less than 4096 bytes in size) is actually undefined
anon
this doesn't seek past the starting point
jspcal
+2  A: 

I don't think there's an easy way to go about this, you'll probably need to seek to the end of the file, back up one 'chunk' (an arbitrary size, but a couple of kilobytes perhaps), read that 'chunk' of data and start looking through it for new line characters, if you didn't find enough, you back up twice your chunk size (remember, you read forward, so you need to back up the one you read, plus the one you want to read next), and read in another one.

HTH

roe
+4  A: 

Since tail needs to work with pipes, that you can't rewind, you'll have to keep a rotating buffer of the last n lines you've read which you will dump on EOF.

Tobu
This method is fine for short files. But large files require a diffeent technique. You need to seek to the end then start backing up.
Martin York
You can use different techniques for different files, but the one I described is still required for things like stdin.
Tobu
Moreover, homework doesn't need to perform well, at least, he didn't say it does.
Potatoswatter
+3  A: 

This problem is analogous to the problem of getting the last n nodes of a singly-linked list. You have to go all the way to the end with a buffer of n lines, then spit out the lines from buffer.

wilhelmtell
A: 

this shows how you'd do it in c++... read successive chunks from the end of the file, then scan the chunks for new lines. if a newline isn't found, part of the chunk has to be kept around and combined with the next chunk read in...

//
// USAGE: lastln COUNT [FILE]
//
// Print at most COUNT lines from the end of FILE or standard input.
// If COUNT is -1, all lines are printed.
//

#include <errno.h>
#include <libgen.h>
#include <iostream>
#include <fstream>
#include <sstream>

using namespace std;

int main(int argc, char **argv)
{
  int ret = 0, maxLines = -1, len, count = 0, sz = 4096, lines = 0, rd;
  istream *is;
  ifstream ifs;
  stringstream ss;
  char *buf = NULL;
  const char *prog = (argc > 0 && argv[0] ? basename(argv[0]) : "");
  string line;

  if (argc > 1) {
    if ((maxLines = atoi(argv[1])) == 0) {
      goto end;
    }
  }

  if (argc > 2 && !(argv[2] && argv[2][0] == '-' && argv[2][1] == '\0')) {
    ifs.open(argv[2], ios::in | ios::binary);
    if (!ifs) {
      ret = 1;
      cerr << prog << ": " << argv[2] << ": " << strerror(errno) << endl;
      goto end;
    }
    is = &ifs;
  } else {
    ss << cin.rdbuf();
    if (!ss) {
      ret = 1;
      cerr << prog << ": failed to read input" << endl;
      goto end;
    }
    is = &ss;
  }

  is->seekg(0, ios::end);
  len = is->tellg();
  buf = new char[sz + 1];

  while (rd = min(len - count, sz)) {
    is->seekg(0 - count - rd, ios::end);
    is->read(buf, rd);
    count += rd;
    char *p = buf + rd, *q;
    *p = '\0';

    for (;;) {
      q = (char *)memrchr(buf, '\n', p - buf);
      if (q || count == len) {
        if (q) *q = '\0';
        if (lines || p - q - 1 > 0 || !q) {
          ++lines;
          cout << lines << ": " << (q ? q + 1 : buf) << line << endl;
          line.clear();
          if (lines >= maxLines && maxLines != -1) break;
        }
        if (q) p = q; else break;
      } else {
        line = string(buf, p - buf) + line;
        break;
      }
    }
  }

  end:

  if (buf) delete[] buf;
  return ret;
}
jspcal
This program uses pointers and manual memory management for no reason. The program could just as well use std::vector for dynamically allocated data and auto storage std::ifstream instead of a pointer and dynamic allocation. This avoids all cleanup at the end, thus also avoiding the use of goto. It also avoids a memory leak.
Tronic
no, you *need a pointer* with `memrchr`. it *only takes a pointer*. a vector would be a pointless layer of indirection. a pointer *is necessary* to use cin here. you seem to forget that *references can't be reseated* in c++. goto is *necessary* so the OP can print messages at the end of the run. and there's no need to free data right before exit(). any case, you obsess over syntactic sugar.
jspcal
Wyzard
@Wyzard: your solution requires adding an *unnecessary function* and *multiple calls to it* to deal with with a problem that pointers are designed to solve using c++'s *built-in polymorphism*. your solution *doesn't scale*. what if there are more than 2 stream types? you create redundant branches. you are going to *extraordinary lengths* to avoid pointers, which is obsessing over syntactic sugar. and you admit that even with vector, the underlying pointer is exposed *in any case*, so you're *still using pointers*.
jspcal