The concept is simple: we are doing some auditing, comparing what came in with what actually happened during processing. I am looking for a better way to write a query that does side-by-side table comparisons when the columns differ slightly in name and potentially in type.

DB Layout:

Table (* is the join condition)

Log (Un-altered data record.)
- LogID
- RecordID*
- Name
- Date
- Address
- Products
- etc.

Audit (post processing record)
- CardID*
- CarName
- DeploymentDate
- ShippingAddress
- Options
- etc.

For example, this would work, if you look past how tedious it is to write and its performance issues.

The query just joins the left and right tables and selects the columns as strings, showing each field matched up.

select 
  cast(log.RecordID as varchar(40)) + '=' + cast(audit.CardID as varchar(40)),
  log.Name + '=' + audit.CarName,
  cast(log.Date as varchar(40)) + '=' + cast(audit.DeploymentDate as varchar(40)),
  log.Address + '=' + audit.ShippingAddress,
  log.Products + '=' + audit.Options
  --etc
from Audit audit, Log log
  where audit.CardID=log.RecordId

Which would output something like:

1=1 Test=TestName 11/09/2009=11/10/2009 null=My Address null=Wheels

This works but is extremely annoying to build. Another thing I thought of was to just alias the columns, union the two tables, and order them so they would be in list form. This would allow me to see the column comparisons. This comes with the obvious overhead of the union all.

i.e.:

Log 1 Test 11/09/2009 null, null
Audit 1 TestName 11/10/2009 My Address Wheels
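
The aliased union described above might look like the following minimal sketch. It assumes the column names from the layout above; the `Source` label column, the shared aliases, and the casts to varchar are illustrative choices rather than part of the actual schema, and the paired columns are assumed to have compatible types.

```sql
-- Sketch only: stack the Log and Audit rows under shared aliases so
-- matching records end up on adjacent lines.
-- 'Source' is a made-up label column; the casts assume the date
-- columns can be converted to varchar.
select 'Log'                          as Source,
       log.RecordID                   as ID,
       log.Name                       as [Name],
       cast(log.Date as varchar(40))  as [Date],
       log.Address                    as [Address],
       log.Products                   as Products
from Log log
union all
select 'Audit',
       audit.CardID,
       audit.CarName,
       cast(audit.DeploymentDate as varchar(40)),
       audit.ShippingAddress,
       audit.Options
from Audit audit
order by ID, Source desc  -- descending puts 'Log' before 'Audit' within each ID
```

Unlike the join, this form also returns records that exist in only one of the two tables, which may or may not be wanted for the audit.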

Any suggestions on a better way to audit this data?

Let me know what other questions you may have.

Additional notes: we are going to want to reduce the unimportant information, so in some cases we might null the column if the values are equal (but I know it's too slow).

  case when log.[Name]<>audit.[CarName] then (log.[Name] + '!=' + audit.[CarName]) else null end

or, if we are doing it the second way:

  nullif(log.[Name], audit.[CarName]) as [Name]
  ,nullif(audit.[CarName], log.[Name]) as [Name]
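
One caveat with the case expression above: the sample output contains nulls, and under SQL Server's default null handling `log.[Name] <> audit.[CarName]` is unknown when either side is null, so that row falls through to the else branch and the mismatch is hidden. Concatenating a null with `+` also yields null by default. A null-aware variant could be sketched as below; it assumes both columns are character types, and the `NameDiff` alias and the literal text 'null' are placeholders chosen for illustration.

```sql
-- Sketch only: report a difference even when one side is null.
select
  log.RecordID,
  case
    when log.[Name] = audit.[CarName] then null                    -- equal: suppress
    when log.[Name] is null and audit.[CarName] is null then null  -- both missing: suppress
    else coalesce(log.[Name], 'null') + '!=' + coalesce(audit.[CarName], 'null')
  end as NameDiff
from Log log
  inner join Audit audit on audit.CardID = log.RecordID
```

The same pattern would need to be repeated per column pair, with a cast to varchar wherever the column is not already a string.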
A: 

Would something like this work for you:

select 
  (Case when log.RecordID = audit.CardID THEN 1 else 0 end) as RecordIdEqual,
  (Case when log.Name = audit.CarName THEN 1 else 0 end) as NamesEqual,
  (Case when log.Date = audit.DeploymentDate THEN 1 else 0 end) as DatesEqual,
  (Case when log.Address = audit.ShippingAddress THEN 1 else 0 end) as AddressEqual,
  (Case when log.Products = audit.Options THEN 1 else 0 end) as ProductsEqual
  --etc
from Audit audit, Log log
  where audit.CardID=log.RecordId

This will give you a breakdown of what's equal, one flag per column pair. It seems like it might be easier than doing all the casting and having to interpret the resulting string.

Abe Miessler
I do the casting because if the records are equal we might need to know the value of the column. Assume column A is correct (and is the unique identifier) and B is incorrect; we would need to see the value of A. But I will probably use this strategy to filter out the values that aren't as important.
Nix
+1  A: 

I've found the routine given here by Jeff Smith to be helpful for doing table comparisons in the past. This might at least give you a good base to start from. The code given on that link is:

CREATE PROCEDURE CompareTables(@table1 varchar(100), 
    @table2 Varchar(100), @T1ColumnList varchar(1000),
    @T2ColumnList varchar(1000) = '')
AS

-- Table1, Table2 are the tables or views to compare.
-- T1ColumnList is the list of columns to compare, from table1.
-- Just list them comma-separated, like in a GROUP BY clause.
-- If T2ColumnList is not specified, it is assumed to be the same
-- as T1ColumnList.  Otherwise, list the columns of Table2 in
-- the same order as the columns in table1 that you wish to compare.
--
-- The result is all records from either table that do NOT match
-- the other table, along with which table the record is from.

declare @SQL varchar(8000);

IF @t2ColumnList = '' SET @T2ColumnList = @T1ColumnList

set @SQL = 'SELECT ''' + @table1 + ''' AS TableName, ' + @t1ColumnList +
 ' FROM ' + @Table1 + ' UNION ALL SELECT ''' + @table2 + ''' As TableName, ' +
 @t2ColumnList + ' FROM ' + @Table2

set @SQL = 'SELECT Max(TableName) as TableName, ' + @t1ColumnList +
 ' FROM (' + @SQL + ') A GROUP BY ' + @t1ColumnList + 
 ' HAVING COUNT(*) = 1'

exec ( @SQL)
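
As a rough illustration, a call against the tables from the question might look like this. The column lists are taken from the layout in the question and are assumptions about the real schema; the second list follows the same order as the first so that the columns pair up.

```sql
-- Hypothetical call using the Log/Audit layout from the question.
exec CompareTables
    'Log',
    'Audit',
    'RecordID, Name, [Date], Address, Products',
    'CardID, CarName, DeploymentDate, ShippingAddress, Options'
```

The result uses the first table's column names as headers, and a record that differs in any listed column shows up once per table. Because the procedure relies on UNION ALL, each pair of columns must have compatible types; where they do not, one option is to compare views that cast the columns first.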
Joe Stefanelli