I am working on a software product with an integrated log file viewer. The problem is that it's slow and unstable for really large files because it reads the whole file into memory when you view a log file. I want to write a new log file viewer that addresses this problem.

What are the best practices for writing viewers for large text files? How do editors like Notepad++ and Vim accomplish this? I was thinking of using a buffered, bi-directional text stream reader together with Java's TableModel. Am I thinking along the right lines, and are such stream implementations available for Java?

Edit: Will it be worthwhile to run through the file once to index the positions of the start of each line of text, so that one knows where to seek to? I will probably need the number of lines anyway, so I will probably have to scan through the file at least once.

Edit 2: I've added my implementation to an answer below. Please comment on it or edit it to help me/us arrive at a best-practice implementation, or otherwise provide your own.

+2  A: 

A typical approach is to use a seekable file reader, make one pass through the log recording an index of line offsets and then present only a window onto a portion of the file as requested.

This reduces the amount of data you need to keep readily available and avoids loading up a widget where 99% of the contents aren't currently visible.
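For illustration, a minimal sketch of that approach (not part of the original answer; the class and method names are made up): index the line-start offsets in one pass with a RandomAccessFile, then read only the requested line's bytes on demand.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class IndexedLogReader {
    private final RandomAccessFile raf;
    private final List<Long> lineOffsets = new ArrayList<Long>();

    IndexedLogReader(String path) throws IOException {
        raf = new RandomAccessFile(path, "r");
        lineOffsets.add(0L); // the first line always starts at offset 0
        byte[] buf = new byte[8192];
        long fileOffset = 0;
        int n;
        while ((n = raf.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n')
                    lineOffsets.add(fileOffset + i + 1); // next line starts after the newline
            }
            fileOffset += n;
        }
    }

    int lineCount() {
        return lineOffsets.size();
    }

    // Reads a single line; only this small window of the file is ever in memory.
    String line(int i) throws IOException {
        long start = lineOffsets.get(i);
        long end = (i + 1 < lineOffsets.size()) ? lineOffsets.get(i + 1) : raf.length();
        byte[] bytes = new byte[(int) (end - start)];
        raf.seek(start);
        raf.readFully(bytes);
        return new String(bytes); // platform default charset, as in the question
    }
}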

msw
+4  A: 

I'm not sure whether Notepad++ actually implements random access, but I think that's the way to go, especially for a log file viewer, which implies that it will be read-only.

Since your log viewer will be read-only, you can use a read-only, random-access, memory-mapped file "stream". In Java, this is the FileChannel.

Then just jump around in the file as needed and render only a scrolling window of the data to the screen.

One of the advantages of the FileChannel is that concurrent threads can have the file open, and reading doesn't affect the current file pointer. So, if you're appending to the log file in another thread, it won't be affected.

Another advantage is that you can call the FileChannel's size method to get the file size at any moment.

The problem with mapping memory directly to a random access file, which some text editors allow (such as HxD and UltraEdit), is that any changes directly affect the file. Changes are therefore immediate (except for write caching), which users typically don't want; they expect changes to be applied only when they click Save. However, since this is just a viewer, you don't have that concern.
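A minimal sketch of a positional read with FileChannel (my own illustration, not from the answer; the file name is made up). The read-by-position call doesn't move the channel's file pointer, and size() always reflects the current length, so a log being appended to by another thread isn't a problem:

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class TailWindow {
    public static void main(String[] args) throws IOException {
        FileChannel chan = new FileInputStream("app.log").getChannel(); // hypothetical log file
        long size = chan.size(); // current length, even while another thread appends
        ByteBuffer window = ByteBuffer.allocate(4096);
        long offset = Math.max(0, size - window.capacity()); // e.g. render the tail of the log
        int n = chan.read(window, offset); // positional read: does not change chan.position()
        if (n > 0)
            System.out.print(new String(window.array(), 0, n));
        chan.close();
    }
}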

Marcus Adams
Thanks, I also came across RandomAccessFile in addition to FileChannel, which may prove useful.
Hannes de Jager
A: 

I'm posting my test implementation (following the advice of Marcus Adams and msw) here for your convenience and for further comments and criticism. It's quite fast.

I haven't bothered with Unicode encoding safety yet. I guess that will be my next question. Any hints on that are very welcome.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.LinkedList;
import java.util.List;
import javax.swing.table.TableModel;

class LogFileTableModel implements TableModel {

    private final File f;
    private final int lineCount;
    private final String errMsg;
    private final Long[] index;
    private final ByteBuffer linebuf = ByteBuffer.allocate(1024);
    private FileChannel chan;

    public LogFileTableModel(String filename) {
        f = new File(filename);
        String m;
        int l = 1;
        Long[] idx = new Long[] {};
        try {
            FileInputStream in = new FileInputStream(f);
            chan = in.getChannel();
            m = null;
            idx = buildLineIndex();
            l = idx.length;
        } catch (IOException e) {
            m = e.getMessage();
        }
        errMsg = m;
        lineCount = l;
        index = idx;
    }

    /** Scans the file once, recording the byte offset of the start of each line. */
    private Long[] buildLineIndex() throws IOException {
        List<Long> idx = new LinkedList<Long>();
        idx.add(0L);

        ByteBuffer buf = ByteBuffer.allocate(8 * 1024);
        long offset = 0;
        while (chan.read(buf) != -1) {
            int len = buf.position();
            buf.rewind();            
            int pos = 0;
            byte[] bufA = buf.array();
            while (pos < len) {
                byte c = bufA[pos++];
                if (c == '\n')
                    idx.add(offset + pos);
            }
            offset = chan.position();
        }
        System.out.println("Done Building index");
        return idx.toArray(new Long[] {});
    }

    @Override
    public int getColumnCount() {
        return 2;
    }

    @Override
    public int getRowCount() {
        return lineCount;
    }

    @Override
    public String getColumnName(int columnIndex) {
        switch (columnIndex) {
        case 0:
            return "#";
        case 1:
            return "Name";
        }
        return "";
    }

    @Override
    public Object getValueAt(int rowIndex, int columnIndex) {
        switch (columnIndex) {
            case 0:                
                return String.format("%3d", rowIndex);
            case 1:
                if (errMsg != null)
                    return errMsg;
                try {
                    long pos = index[rowIndex];
                    chan.position(pos);
                    linebuf.clear(); // reset the shared buffer before re-using it
                    int n = chan.read(linebuf); // bytes actually read for this line's window
                    if (rowIndex == lineCount - 1)
                        // last line: only the bytes read this time belong to it
                        return new String(linebuf.array(), 0, Math.max(n, 0));
                    else
                        return new String(linebuf.array(), 0, (int) (index[rowIndex + 1] - pos));
                } catch (Exception e) {
                    return "Error: " + e.getMessage();
                }
        }            
        return "a";
    }

    @Override
    public Class<?> getColumnClass(int columnIndex) {
        return String.class;
    }

    // ... other methods to make interface complete


}
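For completeness, a hypothetical usage sketch (assuming the remaining TableModel methods are filled in; the file name is made up) that puts the model into a JTable inside a JScrollPane, so only the visible rows are ever rendered:

import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTable;

public class LogViewerDemo {
    public static void main(String[] args) {
        JTable table = new JTable(new LogFileTableModel("server.log"));
        JFrame frame = new JFrame("Log viewer");
        frame.add(new JScrollPane(table));
        frame.setSize(800, 600);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);
    }
}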
Hannes de Jager
Hmmm, ok, it seems my implementation is UTF-8 safe because of UTF-8's self-synchronizing property. Checking for '\n', which is 0x0A (binary 00001010), is unambiguous in UTF-8: every byte that is part of a multi-byte sequence has its high bit set.
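A small snippet to illustrate the point (my own addition, not part of the comment): '\n' is 0x0A, and every byte of a UTF-8 multi-byte sequence has its high bit set, so a 0x0A byte in the stream is always a real newline.

import java.io.UnsupportedEncodingException;

public class Utf8NewlineCheck {
    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] utf8 = "héllo\nwörld".getBytes("UTF-8");
        for (byte b : utf8) {
            boolean highBitSet = (b & 0x80) != 0; // true for every byte of a multi-byte character
            if (b == '\n' && highBitSet)
                throw new AssertionError("cannot happen: 0x0A never appears inside a multi-byte sequence");
        }
        System.out.println("no false newline matches");
    }
}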
Hannes de Jager