ansaurus

Question

Answer 1

A:

Keep in mind that you will be potentially destroying data here. Just because a row has fewer columns filled doesn't mean that it's less accurate in the columns that are filled.

I've assumed that duplicates are determined by a column called "name". You'll need to adjust that to match your own definition of a duplicate. Also, since you didn't give any rules for breaking ties on "most populated", I just chose the row with the lowest id.

UPDATE
    T1
SET
    col_1 = T2.col_1,
    col_2 = T2.col_2,
    ...
FROM
    My_Table T1
INNER JOIN My_Table T2 ON
    T2.name = T1.name AND
    T2.id =
    (
        -- Pick the id of the most populated row for this name
        SELECT TOP 1
            T3.id
        FROM
            My_Table T3
        WHERE
            T3.name = T1.name
        ORDER BY
            -- Number of non-NULL columns, highest first
            CASE WHEN T3.col_1 IS NOT NULL THEN 1 ELSE 0 END +
            CASE WHEN T3.col_2 IS NOT NULL THEN 1 ELSE 0 END +
            ... DESC,
            -- Ties go to the lowest id
            T3.id ASC
    )
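To make the data-loss warning concrete, here is a minimal sketch that runs against an in-memory SQLite database from Python. It is not the T-SQL statement above: SQLite has no TOP, so the subquery uses LIMIT 1, and instead of UPDATE ... FROM it first snapshots the winning row per name into a temporary table (the name Best is made up for this example) and then copies from it. The two sample rows are hypothetical as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE My_Table (id INTEGER PRIMARY KEY, name TEXT, col_1 TEXT, col_2 TEXT)"
)
conn.executemany(
    "INSERT INTO My_Table VALUES (?, ?, ?, ?)",
    [
        (1, "bob", None, "x"),  # one column filled; lowest id, so it wins the tie
        (2, "bob", "y", None),  # also one column filled; 'y' exists only in this row
    ],
)

# Step 1: snapshot the most populated row per name (hypothetical helper table).
conn.execute("""
    CREATE TEMP TABLE Best AS
    SELECT T1.name, T1.col_1, T1.col_2
    FROM My_Table T1
    WHERE T1.id = (
        SELECT T3.id
        FROM My_Table T3
        WHERE T3.name = T1.name
        ORDER BY
            CASE WHEN T3.col_1 IS NOT NULL THEN 1 ELSE 0 END +
            CASE WHEN T3.col_2 IS NOT NULL THEN 1 ELSE 0 END DESC,
            T3.id ASC
        LIMIT 1
    )
""")

# Step 2: overwrite every duplicate with the values of its winning row.
conn.execute("""
    UPDATE My_Table
    SET col_1 = (SELECT B.col_1 FROM Best B WHERE B.name = My_Table.name),
        col_2 = (SELECT B.col_2 FROM Best B WHERE B.name = My_Table.name)
""")

after = conn.execute(
    "SELECT id, name, col_1, col_2 FROM My_Table ORDER BY id"
).fetchall()
print(after)
```

Both rows tie on one filled column, so row 1 wins on id and row 2's col_1 value 'y' is overwritten with NULL. That is the kind of loss to check for before running the update on real data.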

EDIT: I just reread your question and you mention, "From there I can select distinct records and get a useful set of records." If that's what you really want, then don't bother updating the other rows. Just select the ones that you want in the first place and leave everything else intact:

SELECT
    T1.id,
    T1.name,
    T1.col_1,
    T1.col_2,
    ...
FROM
    My_Table T1
WHERE
    T1.id =
    (
        SELECT TOP 1
            T2.id
        FROM
            My_Table T2
        WHERE
            T2.name = T1.name
        ORDER BY
            CASE WHEN T2.col_1 IS NOT NULL THEN 1 ELSE 0 END +
            CASE WHEN T2.col_2 IS NOT NULL THEN 1 ELSE 0 END +
            ... DESC,
            T2.id ASC
    )
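To see how the non-NULL count ranks the rows, here is a minimal sketch that runs the same SELECT against an in-memory SQLite database from Python. The sample rows are made up for illustration, and because SQLite has no TOP, the subquery uses LIMIT 1 instead. An outer ORDER BY is added only to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE My_Table (id INTEGER PRIMARY KEY, name TEXT, col_1 TEXT, col_2 TEXT)"
)
conn.executemany(
    "INSERT INTO My_Table (id, name, col_1, col_2) VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "a", None),   # one column filled
        (2, "alice", "a", "b"),    # two columns filled: most populated for alice
        (3, "bob", None, "x"),     # one column filled; ties with id 4, lower id wins
        (4, "bob", "y", None),     # one column filled
        (5, "carol", None, None),  # only row for carol, so it is kept even though empty
    ],
)

query = """
SELECT T1.id, T1.name, T1.col_1, T1.col_2
FROM My_Table T1
WHERE T1.id = (
    SELECT T2.id
    FROM My_Table T2
    WHERE T2.name = T1.name
    ORDER BY
        CASE WHEN T2.col_1 IS NOT NULL THEN 1 ELSE 0 END +
        CASE WHEN T2.col_2 IS NOT NULL THEN 1 ELSE 0 END DESC,
        T2.id ASC
    LIMIT 1
)
ORDER BY T1.id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One row per name comes back (ids 2, 3 and 5 here), and the table itself is left untouched.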

Tom H. 2010-07-02 15:14:49

The problem here, of course, is that this may not filter out the right results, but I see the logic behind your thinking.

Wardy 2010-07-08 15:07:21


How to select most populated record?
