ansaurus

Question

Answer 1

A:

What do you mean by correlate? Do you just want to find rows where those columns are equal? You can do that in T-SQL by joining the table to itself:

select distinct
    case when a.OrderNumber < b.OrderNumber then a.OrderNumber 
        else b.OrderNumber 
        end as FirstOrderNumber,
    case when a.OrderNumber < b.OrderNumber then b.OrderNumber 
        else a.OrderNumber 
        end as SecondOrderNumber
from
    MyTable a
    inner join MyTable b on
        a.CustomerName = b.CustomerName
        and a.CustomerAddress = b.CustomerAddress
        and a.CustomerCode = b.CustomerCode
        and a.OrderNumber <> b.OrderNumber -- don't pair a row with itself

This would return:

FirstOrderNumber  |  SecondOrderNumber
               1  |                  2
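To see the self-join work on concrete data, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module. SQLite stands in for SQL Server here (an assumption; the query uses only standard SQL), and the rows are made-up sample data. It includes the condition a.OrderNumber <> b.OrderNumber so a row is never paired with itself, plus an order by for stable output.

```python
import sqlite3

# Hypothetical sample data: orders 1 and 2 share all customer details,
# order 3 belongs to a different customer.
ROWS = [
    (1, "Acme", "1 Main St", "C01"),
    (2, "Acme", "1 Main St", "C01"),
    (3, "Bolt", "9 Side Rd", "C02"),
]

QUERY = """
select distinct
    case when a.OrderNumber < b.OrderNumber then a.OrderNumber
        else b.OrderNumber end as FirstOrderNumber,
    case when a.OrderNumber < b.OrderNumber then b.OrderNumber
        else a.OrderNumber end as SecondOrderNumber
from MyTable a
inner join MyTable b on
    a.CustomerName = b.CustomerName
    and a.CustomerAddress = b.CustomerAddress
    and a.CustomerCode = b.CustomerCode
    and a.OrderNumber <> b.OrderNumber  -- skip a row paired with itself
order by FirstOrderNumber, SecondOrderNumber
"""


def duplicate_order_pairs(rows):
    """Return (first, second) order-number pairs whose customer details match."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table MyTable (OrderNumber integer, CustomerName text, "
        "CustomerAddress text, CustomerCode text)"
    )
    conn.executemany("insert into MyTable values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(duplicate_order_pairs(ROWS))  # [(1, 2)]
```

The case expressions put the smaller order number first, so the join's (1, 2) and (2, 1) matches collapse into one row under distinct.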

Eric 2009-06-04 16:16:05

Answer 2

A:

Correlation in the statistical sense is defined for numeric values, and your values (names, addresses, codes) are not numeric.

This will give you the percentage of customers whose customerAddress is not uniquely determined by customerName:

SELECT  AVG(100.0 * inconsistent)
FROM    (
        -- 1 when a customerName appears with more than one distinct customerAddress
        SELECT  customerName, CASE WHEN COUNT(DISTINCT customerAddress) > 1 THEN 1 ELSE 0 END AS inconsistent
        FROM    orders
        GROUP BY
                customerName
        ) q

Substitute other pairs of columns for customerName and customerAddress to check for discrepancies between them.
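Here is a minimal sketch of that per-name check with the column pair as parameters, run in SQLite via Python's sqlite3 module as a stand-in for SQL Server (an assumption). The table contents are hypothetical. This version flags a name when it is associated with more than one distinct address, and it averages 100.0 * flag so the result is a percentage rather than an integer-truncated average.

```python
import sqlite3

# Hypothetical orders: "Acme" always ships to one address,
# "Bolt" appears with two different addresses.
ROWS = [
    ("Acme", "1 Main St"),
    ("Acme", "1 Main St"),
    ("Bolt", "9 Side Rd"),
    ("Bolt", "7 Other Ave"),
]


def percent_not_determined(conn, determinant, dependent):
    """Percent of distinct `determinant` values that map to more than one
    distinct `dependent` value. Column names are pasted into the SQL text,
    so they must come from trusted code, never from user input."""
    sql = f"""
        SELECT AVG(100.0 * inconsistent)
        FROM (
            SELECT {determinant},
                   CASE WHEN COUNT(DISTINCT {dependent}) > 1 THEN 1 ELSE 0 END
                       AS inconsistent
            FROM orders
            GROUP BY {determinant}
        ) q
    """
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customerName TEXT, customerAddress TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)

print(percent_not_determined(conn, "customerName", "customerAddress"))  # 50.0
print(percent_not_determined(conn, "customerAddress", "customerName"))  # 0.0
```

Swapping the arguments tests the dependency in the other direction, which is how the "substitute other columns" idea can be applied to every pair of columns.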

Quassnoi 2009-06-04 16:23:55

Answer 3

A:

There is a 'functional dependency' test built into the SQL Server Data Profiling Task (an SSIS component that ships with SQL Server 2008). It is described well in this blog post:

http://blogs.conchango.com/jamiethomson/archive/2008/03/03/ssis-data-profiling-task-part-7-functional-dependency.aspx

I have played a little with accessing the data profiler output through some (under-documented) .NET APIs, and it seems doable. However, since my requirement dealt with the distribution of column values, I ended up with something much simpler based on the output of DBCC SHOW_STATISTICS. I was quite impressed by what I saw of the profiler component and its output viewer.
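To illustrate what a functional dependency test measures, here is a minimal sketch of a dependency "strength" score: the share of rows that agree with the most common dependent value for their determinant value (1.0 means the dependency holds exactly). This is an assumption about what such a strength figure means, not the profiler's actual implementation, and the rows and column names are hypothetical.

```python
from collections import Counter, defaultdict


def dependency_strength(rows, determinant, dependent):
    """Fraction of rows consistent with determinant -> dependent, counting,
    for each determinant value, only the rows carrying its most frequent
    dependent value. An empty input is treated as fully consistent."""
    if not rows:
        return 1.0
    by_key = defaultdict(Counter)
    for row in rows:
        by_key[row[determinant]][row[dependent]] += 1
    agreeing = sum(counts.most_common(1)[0][1] for counts in by_key.values())
    return agreeing / len(rows)


# Hypothetical rows: one "Acme" order has a stray address.
ROWS = [
    {"customerName": "Acme", "customerAddress": "1 Main St"},
    {"customerName": "Acme", "customerAddress": "1 Main St"},
    {"customerName": "Acme", "customerAddress": "2 Main St"},
    {"customerName": "Bolt", "customerAddress": "9 Side Rd"},
]

print(dependency_strength(ROWS, "customerName", "customerAddress"))  # 0.75
print(dependency_strength(ROWS, "customerAddress", "customerName"))  # 1.0
```

Unlike the yes/no check per name, a strength score separates "almost always holds, with a few dirty rows" from "does not hold at all".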

Paul Harrington 2009-06-05 01:17:18

Thanks dude... I knew I had seen something like this before; I just couldn't remember where.

Chris B. Behrens 2009-06-05 14:22:46

Detecting Correlated Columns in Data