views:

171

answers:

2

Hello Everybody!!

I have a large table(60 columns, 1.5 million records) of denormalized data in MS SQL 2005 that was imported from an Access database. I've been tasked with normalizing and inserting this data into our data model.

I would like to create a query that used a grouping of, for example "customer_number", and returned a result set that only contains the columns that are non distinct for each customer_number. I don't know if it's even possible, but it would be of great help if it was.

Edit:if my table has 3 columns(cust_num,cust_name_cust_address) and 5 records

|cust_num|cust_name|cust_address
|01      |abc      |12 1st street
|02      |cbs      |1 Aroundthe Way
|01      |abc      |MLK BLVD
|03      |DMC      |Hollis Queens
|02      |cbs      |1 Aroundthe Way

the results from my desired query should just be the data from cust_num and cust_name because cust_address has different values for that grouping of cust_num. A cust_num has many addresses but only one cust_name.

Can someone point me in the right direction?

Jim

+2  A: 
WITH    q AS
        (
        SELECT  *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, … ORDER BY id) AS rn
        FROM    mytable
        )
SELECT  *
FROM    q
WHERE   rn = 2

This will select only the rows that have duplicates on all columns listed in the PARTITION BY clause (in fact, it will select the first duplicate if any).

Quassnoi
this is what I asked for but it doesn't solve my problem. I will edit my original quetion. I'd like to group my table by customer number and get all the column back that duplicate values for the grouped customer number. For example if only the customer_name was the same in very_large_table, I'd get a return set with only customer_number and customer_name, but if customer_address was the same for each customer, I'd get back all 3 columns.
jim
A: 

No, this can't be done

jim