views:

807

answers:

3

I have two datatables. DataTable dtRequired and DataTable dtResult.

I want to output a datatable that contains rows that were not present in dtResponse but were found in dtRequired.

Approach 1 We have been using the algorithm specified at the following url http://weblogs.sqlteam.com/davidm/archive/2004/01/19/739.aspx. And this algorithm figured to be one of the slower ones in our profiling.

Approach 2 So, I tried to replace the above algo with something thats described below. dtRequired is indexed on the columns I m using below to Find the row.

    if (dtResult.Rows.Count > 0)
    {
        lock (dtResult)
        {
            DataRow rowfound = null;
            for (int i = 0; i < dtResult.Rows.Count; i++)
            {
                DataRow row = dtResult.Rows[i];
                rowfound = dtRequired.Rows.Find(new object[] { row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7], row[8] });
                if (rowfound != null)
                {
                    dtRequired.Rows.Remove(rowfound);
                }
            }                        
        }
    }

The above piece however is taking longer than the time taken by Approach 1. Approach 2 takes ~3 secs for dtResult with 1250 rows and dtRequired with 4500 rows.

Is something wrong with the approach I mentioned above? Is there any better approach of achieving this?

+1  A: 

This is a link to a MSDN page that use LINQ to DataSet to obtain the rows that appears in both datatables. This example use Intersect. I think you could modify it using except instead. I don't know if the performance will be better or not.

Jonathan
I am on .NET 2.0.
A: 

private IEnumerator GetEnumerator( DataTable dtRequired, DataTable dtResponse ) { foreach( DataRow row in dtResponse.Rows ) { // use the columns of the primary key below if( dtResult.Rows.Contains( new object[] { row[0], row[2], row[4] } ) ) continue; else yield return row.ItemArray;

    }
}

private void GetComplement( DataTable dtRequired, DataTable dtResponse, out DataTable dtResult )
{
    DataTable dtResult = dtRequired.Clone();

    foreach( object[] items in GetEnumerator( dtRequired, dtResponse ) )
    {
        dtResult.Rows.Add( items );
    }

    return;
}
maxwellb
There may some syntactic closing to be done with IEnumerator<T> vs. IEnumerable<T>.
maxwellb
A: 
  1. You said that your looped Find() method is less efficient than Approach 1 http://weblogs.sqlteam.com/davidm/archive/2004/01/19/739.aspx.

  2. I've seen people talk about ADO.NET 3.5 and LINQ, presuming you have a production LINQ to or use an iterative method to populate some generic container.

  3. I wonder if creative usage of a HashTable would just happen to be faster, computationally (real world, not theory). In the case of Diff(tbl1,tb2) simply populate the hash with tbl2 then iteratively add the tbl1 members. For each success also add a copy of the member to the output (difference) array to be displayed/returned. For each failure, obviously, it already exists, so don't output/return that value.

Let me know, I'll rework my code if you confirm 3 is fastest. I'm comparing a DirectoryServices.FindAll() Collection to a SqlDataReader() and LINQ to Active Directory is in 3rd Party Beta, I guess. So I need a 'production' approved method here that is as efficient as possible @ 15,000 objects.