tags:

views:

1388

answers:

3

I believe I have come up with a very efficient way to read very, very large files line-by-line. Please tell me if you know of a better/faster way or see room for improvement. I am trying to get better at coding, so any sort of advice you have would be nice. Hopefully this is something that other people might find useful, too, so it isn't just all me asking for help without giving anything back.

It appears to be something like 8 times faster than using Line Input from my tests.

'This function reads a file into a string.                        '
'I found this in the book Programming Excel with VBA and .NET.    '
Public Function QuickRead(FName As String) As String
    Dim I As Integer
    Dim res As String
    Dim l As Long

    I = FreeFile
    l = FileLen(FName)
    res = Space(l)
    Open FName For Binary Access Read As #I
    Get #I, , res
    Close I
    QuickRead = res
End Function

'This function works like the Line Input statement'
Public Sub QRLineInput( _
    ByRef strFileData As String, _
    ByRef lngFilePosition As Long, _
    ByRef strOutputString, _
    ByRef blnEOF As Boolean _
    )
    On Error GoTo LastLine
    strOutputString = Mid$(strFileData, lngFilePosition, _
        InStr(lngFilePosition, strFileData, vbNewLine) - lngFilePosition)
    lngFilePosition = InStr(lngFilePosition, strFileData, vbNewLine) + 2
    Exit Sub
LastLine:
    blnEOF = True
End Sub

Sub Test()
    Dim strFilePathName As String: strFilePathName = "C:\Fld\File.txt"
    Dim strFile As String
    Dim lngPos As Long
    Dim blnEOF As Boolean
    Dim strFileLine As String

    strFile = QuickRead(strFilePathName) & vbNewLine
    lngPos = 1

    Do Until blnEOF
        Call QRLineInput(strFile, lngPos, strFileLine, blnEOF)
    Loop
End Sub

Thanks for the advice!

+1  A: 

With that code you load the file in memory (as a big string) and then you read that string line by line.

By using Mid$() and InStr() you actually read the "file" twice but since it's in memory, there is no problem.
I don't know if VB's String has a length limit (probably not) but if the text files are hundreds of megabyte in size it's likely to see a performance drop, due to virtual memory usage.

Nick D
This is a very good point. I was rather naively exaggerating by using two very's. The size of the files that I am using are about five to ten megabytes, and never more than fifty.
Justin
+1  A: 

I would think , in a large file scenario using a stream would be far more efficient, because memory consumption would be very small.

But your algorithm could alternate between using a stream and loading the entire thing in memory based on the file size. I wouldn't be surprised if one is only better than the other under certain criteria.

Jeremy
This is also an excellent point, and I have found that it is especially true if you only need to read information from the start of the file; in this case using a stream would be much, much better. Also, it's good that you bring up the issue of memory because I am not especially conscious of the effects of memory usage when I program, but I guess that's just a consequence of my being a novice.
Justin
A: 

You can use Scripting.FileSystemObject to do that thing. From the Reference:

The ReadLine method allows a script to read individual lines in a text file. To use this method, open the text file, and then set up a Do Loop that continues until the AtEndOfStream property is True. (This simply means that you have reached the end of the file.) Within the Do Loop, call the ReadLine method, store the contents of the first line in a variable, and then perform some action. When the script loops around, it will automatically drop down a line and read the second line of the file into the variable. This will continue until each line has been read (or until the script specifically exits the loop).

And a quick example:

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\FSO\ServerList.txt", 1)
Do Until objFile.AtEndOfStream
 strLine = objFile.ReadLine
 MsgBox strLine
Loop
objFile.Close
Rodrigo
This is another interesting point. My (relatively limited) testing has shown that this was actually the slowest method of the three. Opening the files as a stream using the FSO took much more time than opening with an integer file handle, and it took about the same amount of time as reading the whole file into a string. When it came to actually reading line-by-line it was also slower... if I recall correctly, anyway; it's been a while since I did my tests and posted all of this.
Justin
Did you test only the file reading or the file reading and the concatenation? I've written apps that uses filesystemobject to load huge files (over 400MB) and never took too long (no more than a few seconds to load the whole file). Remember, string concatenation is always slow, unless you implement concatenation using arrays.
Rodrigo