tags:

views:

105

answers:

1

I've got a xml that I'm parsing and trying to extract some data from. Let's say the resulting dataset, after parsing an input xml file, has (2) Tables.

Table #1 contains an IP Address and a primary key. Table #2 contains port numbers and a matching primary key.

I want to go through both tables and generate an object that contains an IP Address and matching port. Basically merging data from two tables that share the same primary key.

Right now, I'm using a foreach loop nested within another foreach loop. The outer one goes through each IP Address and the inner one goes through each port and matches the same primary key.

The result works, but it's O(n^2). Is there a faster way to do this?

btw, I'm using C#

+1  A: 

First, ensure that you set the PrimaryKey property of each DataTable to the appropriate column. Then, instead of the inner loop, use table.Rows.Find(primaryKeyValue) to pull out the appropriate row from the second DataTable. On both the NT and Compact frameworks, this will create and use a red-black tree index internally, giving you O(n log n) time.

To get to O(n), you need to create a Dictionary (implemented internally as a hash table) of the rows in the second table and perform your lookup on this. Ensure that you create the Dictionary with sufficient capacity that it will not need to be resized during insertion.

Jeffrey Hantin
Wouldn't the process of first creating this "Dictionary" be at least O(n) time? Then to use the dictionary is additional time. How is that just O(n) for the entire process?
Nick
You need to loop to build an index in either case -- in the case of `DataTable`, that loop is in the `System.Data` assembly instead of your code, so you don't see it. Also, big-O notation ignores coefficients, O(n) == O(2*n) -- and roughly speaking, two loops just means multiplying in a coefficient of 2. http://en.wikipedia.org/wiki/Big_O_notation
Jeffrey Hantin