Hey Everyone,

I'm having a hard time speeding up the processing of a very large text file (~100 MB or so). I've been careful to be diligent with the ReDim Preserve calls, and yet the function still takes 5 minutes or so to run. The text file is basically sub-reports which I'm trying to parse out. I only have access to the large file. What is a person to do? Is VBA just that slow? Here is the code; the "Report" object is a class I created. Most of the reports are just a couple hundred lines, so that's why I chose 1000 for the UBound:

Public Function GetPages(originalFilePath As String) As Collection

Dim myReport                As report
Dim reportPageCollection    As Collection
Dim startLine               As Long
Dim endLine                 As Long
Dim fso                     As FileSystemObject
Dim file                    As TextStream
Dim lineStr                 As String
Dim index                   As Long
Dim lines()                 As String

Set fso = New FileSystemObject
Set reportPageCollection = New Collection 'initialize the collection

Set file = fso.OpenTextFile(originalFilePath, ForReading)

ReDim lines(0 To 1000)
lineStr = file.ReadLine 'skip the first line so the loop doesn't add a blank report
lines(0) = lineStr
index = 1

Do Until file.AtEndOfLine 'loop through from the start line to find the end line

    lineStr = file.ReadLine

    If lineStr Like "1JOBNAME:*" Then 'start of the next report, so close out the current one

        'load this page into our report page collection for further processing
        Set myReport = New report
        myReport.setDataLines = lines() 'fill in the 'ReportPage' array

        reportPageCollection.Add myReport 'add our report to the collection

        'set up the array for the new report
        ReDim lines(0 To 1000)
        index = 0
        lines(index) = lineStr
        index = index + 1
    Else
        'store into the array, growing it in 1000-line chunks when full
        If index = UBound(lines) Then
            ReDim Preserve lines(0 To UBound(lines) + 1000)
        End If
        lines(index) = lineStr
        index = index + 1
    End If
Loop

file.Close
Set fso = Nothing
Set GetPages = reportPageCollection

End Function

Any help is appreciated. Thanks!

A: 

Is VBA just that slow?

Yes. Try XLW, a C++ wrapper for Excel.

Mark P Neyer
+3  A: 

I just grabbed a 73 MB, 1.2-million-line text file from my C:\ drive. It took 6 seconds to read through the whole thing, line by line, in Excel VBA (doing nothing but reading). So the speed problem isn't obviously file-I/O related.
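
The test loop was essentially the following sketch, timed with the built-in Timer function (the path and variable names are just illustrative; it assumes a reference to the Microsoft Scripting Runtime, as in the question):

Dim fso As New FileSystemObject
Dim ts As TextStream
Dim s As String
Dim t As Single

t = Timer
Set ts = fso.OpenTextFile("C:\bigfile.txt", ForReading)
Do Until ts.AtEndOfStream
    s = ts.ReadLine 'read and discard - measures pure file I/O
Loop
ts.Close
Debug.Print Timer - t & " seconds"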

A few observations:

  • I'm nervous about having a variable named "file" when File is a class within the Scripting Runtime;
  • Do Until file.AtEndOfLine stops almost immediately: you're at the end of a line as soon as you've read one. I think you want Do Until file.AtEndOfStream;
  • the rest of your code looks OK, although I'd move all the stuff about adding lines to arrays into a method on your report class;
  • is the file physically local, or are you reading from a network drive? That might account for the problem. If so, consider reading the whole thing into a string and splitting it (see the sketch after this list). 100MB isn't really that big: it took 9 seconds to do that with my 73MB file;
  • you don't need to create a collection variable: GetPages already wants to be that collection.
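
A minimal sketch of that read-everything-then-split approach, assuming the file fits comfortably in memory and uses vbCrLf line endings (ReadAllLines is just an illustrative name; adjust the delimiter if your line endings differ):

Public Function ReadAllLines(filePath As String) As String()

    Dim fso As New FileSystemObject
    Dim content As String

    'pull the whole file into one string in a single I/O call,
    'then split it into an array of lines in memory
    With fso.OpenTextFile(filePath, ForReading)
        content = .ReadAll
        .Close
    End With

    ReadAllLines = Split(content, vbCrLf)

End Function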

So your code might shrink to something like this:

Public Function GetPages(originalFilePath As String) As Collection

Dim myReport As report
Dim lineStr As String

Set GetPages = New Collection 'initialize the collection

With New FileSystemObject ' no need to store an object

    With .OpenTextFile(originalFilePath, ForReading)  ' ditto

        Set myReport = New report
        myReport.AddLine .ReadLine

        Do Until .AtEndOfStream

            lineStr = .ReadLine

            If lineStr Like "1JOBNAME:*" Then
                GetPages.Add myReport 'close out the current report...
                Set myReport = New report '...and start the next one
            End If

            myReport.AddLine lineStr ' all the array business happens here - much tidier

        Loop

        GetPages.Add myReport 'don't forget the last report in the file

    End With ' TextStream goes out of scope & closes
End With ' FileSystemObject goes out of scope, disappears

End Function
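
AddLine doesn't exist on your report class as posted (it exposes setDataLines), so here's a minimal sketch of what such a method might look like inside the class module, growing its internal array in 1000-line chunks the way your original code does (member names are just illustrative):

'in the report class module
Private m_lines() As String
Private m_count As Long

Public Sub AddLine(lineStr As String)
    'allocate on first use, then grow in 1000-line chunks
    If m_count = 0 Then
        ReDim m_lines(0 To 999)
    ElseIf m_count > UBound(m_lines) Then
        ReDim Preserve m_lines(0 To UBound(m_lines) + 1000)
    End If
    m_lines(m_count) = lineStr
    m_count = m_count + 1
End Sub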

Is there anything there that helps?

Mike Woodhouse
The file is out on a network drive, but I failed to mention that I copy it to the local drive first before I start reading it. I'll give this a try. Thanks!
Fink
Worked great! Took <10 sec for a 150 MB file. Not too sure why it was so slow before. Now to tweak the other functions.
Fink
A: 

There are a few tweaks you could make; the FSO object is known to be slower than VB's native I/O, but I don't see anything really heinous in here. Before we go micro-optimizing, let me ask a more basic question: would these files happen to be on a shared drive or an FTP site? If so, consider copying them down to a temp folder before processing them.
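
For comparison, a minimal sketch of the native I/O pattern (Line Input on a file opened For Input); the variable names are just illustrative:

Dim fileNum As Integer
Dim lineStr As String

fileNum = FreeFile 'grab a free file handle
Open originalFilePath For Input As #fileNum
Do While Not EOF(fileNum)
    Line Input #fileNum, lineStr 'reads one line, strips the line terminator
    'process lineStr here
Loop
Close #fileNum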

Oorang
When you say "known to be slower", have you any reference for that? I have found the opposite to be true.
Mike Woodhouse
It's a shared network drive, but I copy it to the local machine before processing.
Fink
@Mike I don't have a link handy for you, but it's common knowledge (just Google "FSO is slow"). That's not to say never use the FSO; it saves a lot of coding time and is perfectly acceptable for a wide variety of tasks. It's just that if you are doing anything very heavy-duty you will start to notice a difference. A classic example would be iterating all the files in all the directories on the hard drive. Try the FSO, then try doing it with the native functions (or Win32, for that matter). You will notice a substantial difference; a sketch of the native approach is below.
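
A minimal sketch of that native approach for a single folder, using Dir() in place of FileSystemObject.GetFolder(...).Files (the folder path is just an example):

Dim fileName As String

fileName = Dir("C:\SomeFolder\*.*") 'first matching file
Do While Len(fileName) > 0
    Debug.Print fileName 'process each file name here
    fileName = Dir 'next match
Loop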
Oorang