Hello, I'd like to make a simple text file viewer and I'd like it to be able to handle large files (possibly larger than the computer's memory).

I know that I need to implement something like a sliding buffer that contains the currently visible portion of the file. The main problem is determining the relation between lines and file offsets. If I only needed to navigate by lines, a linked list of lines would do: on line up/line down I'd just read a new line from the file. But what should I do when I also want to go to, say, 50% of the file? I need to show the lines starting from the middle of the file, so if the file is 10000 bytes long, I'd seek to byte 5000, look for a line break and display the text from there. The problem is that I don't know what line I'm at when seeking like this.

So what I would like to know is what would be a suitable data structure for keeping these few lines in memory (the ones that will be painted on the screen).

Keep in mind that I don't need to edit the files, just view them, so I don't need to care about efficiency of the chosen approach for editing.

A: 

If you're reading in a defined chunk of bytes via a FileStream, you can keep track of which byte you read last so you know where to pick up when reading the next chunk from the file. Note that the offset argument of FileStream.Read() is an offset into the destination buffer, not a position in the file; the other arguments are the buffer and how many bytes to read. To start reading at a particular byte of the file, set Position or call Seek() before calling Read().
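To illustrate picking up at a known byte, here is a minimal sketch of reading one chunk from an arbitrary file position. The class and method names (ChunkReader, ReadChunk) are made up for this example; only FileStream, Seek() and Read() are real framework calls.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of any library.
public static class ChunkReader
{
    // Reads up to `count` bytes starting at file position `offset`.
    // The returned array is shorter than `count` near the end of the file.
    public static byte[] ReadChunk(string path, long offset, int count)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            if (offset >= stream.Length)
                return new byte[0];

            // Move the file position; Read() itself cannot do this.
            stream.Seek(offset, SeekOrigin.Begin);

            var buffer = new byte[count];
            int total = 0;
            // Read() may return fewer bytes than requested, so loop until
            // the buffer is full or the end of the file is reached.
            while (total < count)
            {
                int n = stream.Read(buffer, total, count - total);
                if (n == 0) break;
                total += n;
            }

            Array.Resize(ref buffer, total);
            return buffer;
        }
    }
}
```

The caller can remember offset + result.Length as the place to continue from on the next call.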

After you read in your bytes you can decode them from UTF-8 with a decoder, for instance, and get a char array back. That gives you your initial data. What I would do, since this will be displayed somewhere, is set up event handlers tied to scrolling. When you scroll down you can remove lines from the top of the buffer (counting their bytes before deleting them, so you can read in the next set of bytes with the same size) and append new lines to the bottom. Likewise for scrolling upward.
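One way to sketch that scrolling buffer is a linked list of lines, each stored with the byte offset where it starts. Everything here (LineWindow, ScrollDown, ScrollUp) is a hypothetical name, and the sketch assumes UTF-8 text with '\n' line endings and a seekable stream. It reads byte by byte for clarity, which is slow; a real viewer would scan in blocks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sliding window over the lines of a seekable stream.
public sealed class LineWindow
{
    private readonly Stream _stream;
    private readonly int _capacity;
    // Key: byte offset where the line starts. Value: decoded text.
    private readonly LinkedList<KeyValuePair<long, string>> _lines =
        new LinkedList<KeyValuePair<long, string>>();
    private long _nextOffset; // first byte after the last line in the window

    // startOffset is assumed to be the start of a line.
    public LineWindow(Stream stream, int capacity, long startOffset)
    {
        _stream = stream;
        _capacity = capacity;
        _nextOffset = startOffset;
        for (int i = 0; i < capacity; i++)
            if (!ScrollDown()) break;
    }

    public IEnumerable<string> Lines
    {
        get { foreach (var entry in _lines) yield return entry.Value; }
    }

    // Appends the next line at the bottom and drops the top line if full.
    public bool ScrollDown()
    {
        string text; long next;
        if (!TryReadLine(_nextOffset, out text, out next)) return false;
        _lines.AddLast(new KeyValuePair<long, string>(_nextOffset, text));
        _nextOffset = next;
        if (_lines.Count > _capacity) _lines.RemoveFirst();
        return true;
    }

    // Prepends the previous line at the top and drops the bottom line if full.
    public bool ScrollUp()
    {
        if (_lines.Count == 0 || _lines.First.Value.Key == 0) return false;
        long prev = FindPreviousLineStart(_lines.First.Value.Key);
        string text; long next;
        TryReadLine(prev, out text, out next);
        _lines.AddFirst(new KeyValuePair<long, string>(prev, text));
        if (_lines.Count > _capacity)
        {
            _nextOffset = _lines.Last.Value.Key; // re-read this line on the next ScrollDown
            _lines.RemoveLast();
        }
        return true;
    }

    private bool TryReadLine(long offset, out string text, out long next)
    {
        text = null; next = offset;
        if (offset >= _stream.Length) return false;
        _stream.Seek(offset, SeekOrigin.Begin);
        var bytes = new List<byte>();
        int b;
        while ((b = _stream.ReadByte()) != -1 && b != '\n')
            bytes.Add((byte)b);
        next = _stream.Position; // just past the '\n', or at end of file
        text = Encoding.UTF8.GetString(bytes.ToArray()).TrimEnd('\r');
        return true;
    }

    private long FindPreviousLineStart(long lineStart)
    {
        // The byte at lineStart - 1 is the '\n' that ends the previous line,
        // so search backwards from the byte before it.
        for (long pos = lineStart - 2; pos >= 0; pos--)
        {
            _stream.Seek(pos, SeekOrigin.Begin);
            if (_stream.ReadByte() == '\n') return pos + 1;
        }
        return 0; // the previous line is the first line of the file
    }
}
```

Scanning raw bytes for '\n' is safe in UTF-8 because that byte value never occurs inside a multi-byte character. Only the visible lines stay in memory, so the window size does not depend on the size of the file.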

If you want to jump to the middle of your data, you could create a FileInfo object for the text file's path and use its Length property to get the size of the file in bytes. Since streams deal in bytes, this comes in handy when you want to start reading at a percentage of the file: multiply the length by the percentage to get the byte to seek to. You'll have to read data from there to find out where the line breaks are, and record the position of the last CR-LF you read so you can pick up at the next line when you retrieve data again.
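As a rough sketch of the percentage jump, the helper below turns a fraction of the stream length into the byte offset of the next line start. PercentSeek and LineStartAtFraction are invented names, and the code assumes '\n' terminates lines (so CR-LF files work too, with the '\r' left at the end of each line).

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of any library.
public static class PercentSeek
{
    // Returns the byte offset of the first line that starts at or after
    // the given fraction (0.0 to 1.0) of the stream.
    public static long LineStartAtFraction(Stream stream, double fraction)
    {
        if (fraction < 0.0 || fraction > 1.0)
            throw new ArgumentOutOfRangeException("fraction");

        long target = (long)(stream.Length * fraction);
        if (target == 0) return 0;

        // Start one byte early: if that byte is '\n', then target is
        // already the start of a line.
        stream.Seek(target - 1, SeekOrigin.Begin);
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            if (b == '\n') return stream.Position;
        }

        // No line break after target: we landed inside the last line.
        return stream.Length;
    }
}
```

This tells you where to start displaying, but not which line number that is. Getting the line number would require counting line breaks from the start of the file, or from some previously recorded offset.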

Here's what I would do to read a predefined count of bytes from a file.

using System;
using System.IO;
using System.Text;

// Wrapper class added so the snippet compiles; the name is arbitrary.
public class FileChunkReader
{
    public static long LastByteRead = 0; // keep it zero indexed

    // Reads up to chunkByteSize bytes from the start of the file and
    // returns the decoded text split into lines.
    public String[] GetFileChunk( String path, long chunkByteSize )
    {
        if( !File.Exists(path) )
            return new String[0];

        try {
            // "using" closes the stream even if an exception is thrown.
            using( FileStream fStream = new FileStream( path, FileMode.Open, FileAccess.Read ) )
            {
                // Never read more than the file holds (or than an array can hold).
                int StreamSize = (int)Math.Min( Math.Min(fStream.Length, chunkByteSize), Int32.MaxValue );
                byte[] FileBytes = new byte[ StreamSize ];
                int SuccessBytes = 0;

                // Read() may return fewer bytes than requested, so loop.
                while( SuccessBytes < StreamSize )
                {
                    int n = fStream.Read( FileBytes, SuccessBytes, StreamSize - SuccessBytes );
                    if( n == 0 ) break;
                    SuccessBytes += n;
                }

                if( SuccessBytes == 0 )
                    return new String[1] {""};

                LastByteRead = SuccessBytes - 1;

                // UTF-8 is variable width, so the chunk may end in the middle
                // of a character (or a line); the caller has to handle that.
                return Encoding.UTF8.GetString( FileBytes, 0, SuccessBytes ).Split('\n');
            }
        }

        catch( Exception ex ) {
            var errorException = "ERROR: " + ex.Message;
            Console.WriteLine( errorException );
            return new String[0];
        }
    }
}

Maybe that will get you in the right direction at least.

jlafay