I need to extract data from a .dbf file and transform it into XML. I wrote a routine that does this just fine. However, we are now encountering very large .dbf files, 2 GB and larger, and this code throws an OutOfMemoryException on those files.

Public Function GetData() As DataTable
    Dim dt As New DataTable(Name)
    Dim sqlcommand As String = "Select * From MyTable"
    Dim cn As New OleDbConnection(myconnectionstring)

    Try
        cn.Open()
        Dim cmd As New OleDbCommand(sqlcommand, cn)
        dt.Load(cmd.ExecuteReader())
    Catch ex As Exception
        Throw ex
    Finally
        dt.Dispose()
        cn.Close()
        cn.Dispose()
    End Try
    Return dt
End Function

The thing is, if I run this same code on my computer through Visual Studio in debug mode against the same 2 GB .dbf file, no exception is thrown. It's almost like Visual Studio manages the memory differently than the app does on its own.

Is there any way to get around the memory issues? I've tried using a DataAdapter with similar results. Is the behavior I'm seeing with Visual Studio expected/by design?

A: 

If you want to process large files, consider using a DataReader without filling a DataSet; it won't load the whole table into memory but will go row by row. Alternatively, you can use SqlDataAdapter.Fill() as shown below; just don't forget to Dispose the adapter.

Dim conn As New SqlConnection(connection)
Dim adapter As New SqlDataAdapter()
Dim dataset As New DataSet()
adapter.SelectCommand = New SqlCommand(query, conn)
adapter.Fill(dataset)
adapter.Dispose()
conn.Dispose()

You don't really need to call .Close() on the connection, as it's called when you call .Dispose().

PS: You don't close the reader; that's probably part of the problem. And yes, Visual Studio may simply clean it up sooner than the GC does when the app runs on its own.
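As a minimal sketch (not from the answer above), here is the question's query with the reader wrapped in a Using block so it is closed deterministically. Note that this only addresses the undisposed reader; the whole table still ends up in the DataTable, so it does not by itself fix the memory problem.

Imports System.Data
Imports System.Data.OleDb

' Sketch: same query as in the question, but the reader, command and
' connection are disposed deterministically via Using blocks instead of
' being left for the garbage collector.
Public Function GetDataWithReaderClosed(ByVal connectionString As String) As DataTable
    Dim dt As New DataTable("MyTable")
    Using cn As New OleDbConnection(connectionString)
        cn.Open()
        Using cmd As New OleDbCommand("Select * From MyTable", cn)
            Using reader As OleDbDataReader = cmd.ExecuteReader()
                dt.Load(reader)
            End Using
        End Using
    End Using
    Return dt
End Function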

Vitaly
+1  A: 

There is no way you can load an entire 2 GB database into memory. You will need to load and process the database records in chunks.

To load the database in parts, you can use, for example, the TOP clause (or ROWNUM, on databases that support it) in the SELECT statement. Look at your database's documentation for the details.
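A rough sketch of that chunked approach (not from the answer; it assumes the OLE DB provider accepts TOP and that the table has a unique, sortable key column, here hypothetically called Id):

Imports System.Data.OleDb

' Sketch: pull the table in fixed-size chunks instead of all at once.
' Assumes the provider supports TOP and that "Id" is a unique, ordered key.
Public Sub ProcessInChunks(ByVal connectionString As String)
    Const chunkSize As Integer = 10000
    Dim lastId As Integer = 0
    Dim rowsRead As Integer

    Using cn As New OleDbConnection(connectionString)
        cn.Open()
        Do
            rowsRead = 0
            Dim sql As String = "Select Top " & chunkSize & " * From MyTable Where Id > " & lastId & " Order By Id"
            Using cmd As New OleDbCommand(sql, cn)
                Using reader As OleDbDataReader = cmd.ExecuteReader()
                    While reader.Read()
                        rowsRead += 1
                        lastId = Convert.ToInt32(reader("Id"))
                        ' Process the current record here.
                    End While
                End Using
            End Using
        Loop While rowsRead = chunkSize
    End Using
End Sub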

Konamiman
+3  A: 

A DataTable is held entirely in memory, so it will either fail or become very slow on large files, depending on the file size.

You'll need to use a DataReader to read the data record by record and an XmlWriter to create your XML file.

Something like this (Code not checked)

Public Sub WriteToXml(ByVal xmlFileName As String, ByVal connectionString As String)
    Dim writer As XmlWriter = XmlWriter.Create(xmlFileName)
    Dim commandText As String = "Select * From MyTable"
    Dim connection As New OleDbConnection(connectionString)

    Try
        connection.Open()
        Dim command As New OleDbCommand(commandText, connection)
        Dim reader As OleDbDataReader = command.ExecuteReader()

        While reader.Read()
            ' Write the current record's XML here
            writer.WriteRaw("xml")
        End While

        reader.Close()
    Catch ex As Exception
        Throw
    Finally
        writer.Close()
        connection.Close()
        connection.Dispose()
    End Try
End Sub
OG
A: 

Why not do something simple like:

// Note: CreateXmlWriter and ObjectFromDatabaseReader are the poster's own
// helpers, not framework classes; reader is an already-open data reader.
using (var writer = CreateXmlWriter(fileName))
{
  while (reader.Read())
  {
    var value = new ObjectFromDatabaseReader(reader);
    value.WriteXml(writer);
  }
}

This just streams the data one row at a time, converts each row to an object, and then writes it out as XML. It uses hardly any memory and should be very fast. A sketch of the same pattern without the helper classes follows below.
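CreateXmlWriter and ObjectFromDatabaseReader above are placeholders. Here is a rough VB sketch of the same streaming pattern without them, writing one element per record straight from an OleDbDataReader (it assumes the question's MyTable and connection string, and that the column names are valid XML element names):

Imports System.Data.OleDb
Imports System.Xml

' Sketch: stream rows straight from the reader into an XmlWriter,
' one element per record, so only the current row is ever in memory.
Public Sub ExportToXml(ByVal xmlFileName As String, ByVal connectionString As String)
    Using writer As XmlWriter = XmlWriter.Create(xmlFileName)
        writer.WriteStartElement("MyTable")
        Using cn As New OleDbConnection(connectionString)
            cn.Open()
            Using cmd As New OleDbCommand("Select * From MyTable", cn)
                Using reader As OleDbDataReader = cmd.ExecuteReader()
                    While reader.Read()
                        writer.WriteStartElement("Row")
                        For i As Integer = 0 To reader.FieldCount - 1
                            writer.WriteElementString(reader.GetName(i), reader(i).ToString())
                        Next
                        writer.WriteEndElement()
                    End While
                End Using
            End Using
        End Using
        writer.WriteEndElement()
    End Using
End Sub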

Nick R