I am working on a project where I search through a large text file (large is relative; the file is about 1 GB) for a piece of data. I am looking for a token, and I want the dollar value immediately after that token. For example:
this is the token 9,999,999.99
So here is how I am approaching this problem. After a little analysis, it appears that the token is usually near the end of the file, so I thought I would start searching from the end. Here is the code I have so far (VB.NET):
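```vb
' Requires "Imports System.IO" at the top of the file.
Dim sToken As String = "This is a token"
Dim FileSize As Long = GetFileSize(sFileName_IN)
Dim BlockSize As Integer = CInt(Math.Max(1L, FileSize \ 1000))
' Extra characters read past each block so a token (and its value) that straddles
' a block boundary is still seen whole; 32 is a guess at the value's maximum length.
Dim Overlap As Integer = sToken.Length + 32
Dim buffer(BlockSize + Overlap - 1) As Char   ' VB array bounds are inclusive, hence -1
Dim Value As Decimal                           ' Decimal avoids binary rounding of currency
Dim found As Boolean = False
Dim BlockEnd As Long = FileSize                ' byte offset just past the current block

Using sr As New StreamReader(sFileName_IN)
    While Not found AndAlso BlockEnd > 0
        Dim Position As Long = Math.Max(0L, BlockEnd - BlockSize)
        sr.BaseStream.Seek(Position, SeekOrigin.Begin)
        sr.DiscardBufferedData()   ' without this the reader keeps returning stale buffered text
        ' Assumes a single-byte encoding (e.g. ASCII), so byte offsets equal character offsets.
        Dim i As Integer = sr.ReadBlock(buffer, 0, buffer.Length)
        Dim sBuffer As String = New String(buffer, 0, i)   ' only the characters actually read
        found = SearchBuffer(sBuffer, sToken, Value)
        BlockEnd = Position
    End While
End Using
```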
GetFileSize is a function that returns the file size in bytes. SearchBuffer is a function that searches a string for the token and, if it finds it, sets Value to the number that follows. I am not familiar with regular expressions, but I will explore them for that function.
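A possible sketch of the two helpers, using a regular expression for SearchBuffer. It assumes the value is written with "," as the thousands separator and "." as the decimal point (optionally preceded by "$"), that the token should match case-insensitively, and that Value is declared As Decimal rather than Double, since Decimal stores amounts like 9,999,999.99 exactly. The module name TokenSearchHelpers is hypothetical. Regex.Escape makes any special characters in the token match literally, and because the file is scanned from the end, the last match in the buffer is used.

```vb
Imports System.Globalization
Imports System.IO
Imports System.Text.RegularExpressions

Module TokenSearchHelpers

    ' File size in bytes, via the standard FileInfo.Length property.
    Function GetFileSize(ByVal fileName As String) As Long
        Return New FileInfo(fileName).Length
    End Function

    ' Finds the token followed by whitespace and a number such as 9,999,999.99 or $1234.50.
    ' Assumption: "," groups thousands and "." is the decimal point.
    ' Returns True and sets value from the LAST match in the buffer; returns False otherwise.
    Function SearchBuffer(ByVal buffer As String, ByVal token As String, ByRef value As Decimal) As Boolean
        ' Either comma-grouped digits (9,999,999) or a plain run of digits (1234),
        ' then an optional fractional part.
        Dim pattern As String = Regex.Escape(token) &
            "\s+\$?((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?)"
        Dim matches As MatchCollection = Regex.Matches(buffer, pattern, RegexOptions.IgnoreCase)
        If matches.Count = 0 Then Return False
        Dim digits As String = matches(matches.Count - 1).Groups(1).Value
        ' NumberStyles.Number allows the thousands separators; InvariantCulture fixes "," and ".".
        Return Decimal.TryParse(digits, NumberStyles.Number, CultureInfo.InvariantCulture, value)
    End Function

End Module
```

For example, SearchBuffer("... This is a token 9,999,999.99 ...", "This is a token", v) returns True and sets v to 9999999.99.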
Basically, I read a small chunk of the file and search it; if I don't find the token, I load the next chunk toward the start of the file, and so on.
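One gap in the chunk-by-chunk idea is that a token sitting across the boundary between two chunks is split in half and missed in both. A minimal sketch of one common remedy, overlapping the chunks, is below; it only computes the byte windows to read, working backward from the end. The names BlockWindowDemo and BlockWindows and the blockSize/overlap parameters are hypothetical, and overlap is assumed to be at least as long as the token plus its value.

```vb
Imports System
Imports System.Collections.Generic

Module BlockWindowDemo

    ' Returns (start, length) byte windows from the end of the file toward the start.
    ' Each window extends "overlap" bytes into the window searched before it, so a
    ' token (plus its value) that straddles a block boundary appears whole in one window.
    Function BlockWindows(ByVal fileSize As Long, ByVal blockSize As Long, ByVal overlap As Long) As List(Of KeyValuePair(Of Long, Long))
        If blockSize <= 0 Then Throw New ArgumentOutOfRangeException("blockSize")
        Dim windows As New List(Of KeyValuePair(Of Long, Long))
        Dim blockEnd As Long = fileSize   ' offset just past the current block
        While blockEnd > 0
            Dim start As Long = Math.Max(0L, blockEnd - blockSize)
            ' Read through the block plus the overlap, but never past the end of the file.
            Dim length As Long = Math.Min(blockEnd + overlap, fileSize) - start
            windows.Add(New KeyValuePair(Of Long, Long)(start, length))
            blockEnd = start
        End While
        Return windows
    End Function

End Module
```

For a 2,500-byte file with 1,000-byte blocks and a 50-byte overlap, BlockWindows(2500, 1000, 50) yields (1500, 1000), (500, 1050) and (0, 550): every byte is covered, and the 50 bytes after each boundary are searched twice.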
Am I on the right track, or is there a better way?