views:

42

answers:

3

Hi all

I have the following problem. I have a table with a few hundred thousand records, which has the following identifiers (for simplicity)

MemberID      SchemeName        BenefitID        BenefitAmount
10            ABC               1                10000
10            ABC               1                2000
10            ABC               2                5000
10            A.B.C             3                11000

What I need to do is to convert this into a single record that looks like this:

MemberID      SchemeName        B1       B2      B3        
10            ABC               12000    5000    11000     

The problem of course being that I need to differentiate by SchemeName, and for most records this won't be a problem, but for some SchemeName wouldn't be captured properly. Now, I don't particularly care if the converted table uses "ABC" or "A.B.C" as scheme name, as long as it just uses 1 of them.

I'd love hear your suggestions.

Thanks
Karl

(Using SQL Server 2008)

A: 

It looks that PIVOTS can help

lakhlaniprashant.blogspot.com
Pivots would work great, but I still need to deal with the SchemeName issue
Karl
+2  A: 

based on the limited info in the original question, give this a try:

DECLARE @YourTable  table(MemberID int, SchemeName varchar(10), BenefitID int, BenefitAmount int)

INSERT INTO @YourTable VALUES (10,'ABC'  ,1,10000)
INSERT INTO @YourTable VALUES (10,'ABC'  ,1,2000)
INSERT INTO @YourTable VALUES (10,'ABC'  ,2,5000)
INSERT INTO @YourTable VALUES (10,'A.B.C',3,11000)
INSERT INTO @YourTable VALUES (11,'ABC'  ,1,10000)
INSERT INTO @YourTable VALUES (11,'ABC'  ,1,2000)
INSERT INTO @YourTable VALUES (11,'ABC'  ,2,5000)
INSERT INTO @YourTable VALUES (11,'A.B.C',3,11000)
INSERT INTO @YourTable VALUES (10,'mnp',3,11000)
INSERT INTO @YourTable VALUES (11,'mnp'  ,1,10000)
INSERT INTO @YourTable VALUES (11,'mnp'  ,1,2000)
INSERT INTO @YourTable VALUES (11,'mnp'  ,2,5000)
INSERT INTO @YourTable VALUES (11,'mnp',3,11000)

SELECT
    MemberID, REPLACE(SchemeName,'.','') AS SchemeName
        ,SUM(CASE WHEN BenefitID=1 THEN BenefitAmount ELSE 0 END) AS B1
        ,SUM(CASE WHEN BenefitID=2 THEN BenefitAmount ELSE 0 END) AS B2
        ,SUM(CASE WHEN BenefitID=3 THEN BenefitAmount ELSE 0 END) AS B3
    FROM @YourTable
    GROUP BY MemberID, REPLACE(SchemeName,'.','')
    ORDER BY MemberID, REPLACE(SchemeName,'.','')

OUTPUT:

MemberID    SchemeName  B1          B2          B3
----------- ----------- ----------- ----------- -----------
10          ABC         12000       5000        11000
10          mnp         0           0           11000
11          ABC         12000       5000        11000
11          mnp         12000       5000        11000

(4 row(s) affected)
KM
Ok, that will solve the problem if the I just needed to remove the punctuation marks. But that was just an arbitrary example. What if I had "ABC" and "The ABC scheme bleh" etc.
Karl
@Karl, if you have such vastly different SchemeNames you are out of luck, you will need to manually cleanse the data. I'm just guessing, but possibly you need to change the application to present the user with a Select Box of valid values to pick from and not free text to enter what they want.
KM
You'd have to pick one. Replace `REPLACE(SchemeName, '.', '')` with `MIN(SchemeName)` for instance.
Lieven
this answer may help with how to join/group/match different free text values: http://stackoverflow.com/questions/2342717/sql-server-what-to-do-to-make-collation-key-from-a-string-value/2342865#2342865
KM
A: 

The schemename issue is something that will have to be dealt with manually since the names can be so different. This indicates first and foremost a problem with how you are allowing data entry. You should not have these duplicate schemenames.

However since you do, I think the best thing is to create cross reference table that has two columns, something like recordedscheme and controlling scheme. Select distinct scheme name to create a list of possible schemenames and insert into the first column. Go through the list and determine what the schemename you want to use for each one is (most willbe the same as the schemename). Once you have this done, you can join to this table to get the query. This will work for the current dataset, however, you need to fix whatever is causeing the schemename to get duplicated beofre going further. YOu will also want to fix it so when a schemename is added, you table is populated with the new schemename in both columns. Then if it later turns out that a new one is a duplicate, all you have to do is write a quick update to the second column showing which one it really is and boom you are done.

The alternative is to actually update the schemenames that are bad in the data set to the correct one. Depending on how many records you have to update and in how many tables, this might be a performance issue.This too is only good for querying the data right now and doesn't address how to fix the data going forth.

HLGEM