tags:

views:

65

answers:

2

My company is working with 3 partners and each partner can have multiple brands. Each week, I get a dump of each brand's user list which I store in a MySQL database with a table for each brand. Each brand contains a list of users and some basic information (birthyear, zip code, gender). Some users can be signed up with different brands and each brand can have it's own set of data on a user.

For example, a user is signed up with Canvas and MNM. At Canvas, their profile looks like this:

ID                                  GENDER  BIRTHYEAR   POSTCODE    MODIFIED
94bafdb3e155d30349f1113a25c0714f    M       1973        2800        2009-01-01 09:01:01

and at MNM, like this:

ID                                  GENDER  BIRTHYEAR   POSTCODE    MODIFIED
94bafdb3e155d30349f1113a25c0714f            1973        1000        2009-09-09 09:01:01

I'd like to create a view (or table - I'm not sure which is best) that would combine the two records using the most recent version of the data, but also letting me know where the data came from.

So the above two records would combine to:

ID                                  GENDER  G_DATE              G_BRAND BIRTHYEAR   B_DATE              B_BRAND POSTCODE   P_DATE               P_BRAND
94bafdb3e155d30349f1113a25c0714f    M       2009-01-01 09:01:01 Canvas  1973        2009-09-09 09:01:01 MNM     1000       2009-09-09 09:01:01  MNM

I'm imagining some convoluted series of unions and sub queries, but I'm not even really sure where to begin.

I've created a view that merges all of the tables

CREATE VIEW view_combine AS
SELECT ID, GENDER, MODIFIED as G_DATE, 'Canvas' as G_BRAND, 
    BIRTHYEAR, MODIFIED as B_DATE, 'Canvas' as B_BRAND, 
    POSTCODE, MODIFIED as P_DATE, 'Canvas' as P_BRAND FROM canvas
UNION ALL
SELECT ID, GENDER, MODIFIED as G_DATE, 'Een' as G_BRAND, 
    BIRTHYEAR, MODIFIED as B_DATE, 'Een' as B_BRAND, 
    POSTCODE, MODIFIED as P_DATE, 'Een' as P_BRAND FROM een
UNION ALL
SELECT ID, GENDER, MODIFIED as G_DATE, 'MNM' as G_BRAND, 
    BIRTHYEAR, MODIFIED as B_DATE, 'MNM' as B_BRAND, 
    POSTCODE, MODIFIED as P_DATE, 'MNM' as P_BRAND FROM mnm

and then I'm trying to perform selections on that, but I don't think it's the right direction.

SELECT v1.hashkey, ge.gender, ge.g_date, ge.g_brand, 
    bi.birthyear, bi.b_date, bi.b_brand, 
    pc.postcode, pc.p_date, pc.p_brand
FROM view1 v1
JOIN ( 
    select g.hashkey, g.gender, g.g_date, g.g_brand 
    from view1 g 
    left join view1 g1 ON g.hashkey = g1.hashkey AND g.g_date < g1.g_date 
    WHERE g1.hashkey IS NULL
) ge ON ge.HASHKEY = v1.HASHKEY
JOIN ( 
    select b.hashkey, b.birthyear, b.b_date, b.b_brand 
    from view1 b 
    left join view1 b1 ON b.hashkey = b1.hashkey AND b.b_date < b1.b_date 
    WHERE b1.hashkey IS NULL
) bi ON bi.HASHKEY = v1.HASHKEY
JOIN ( 
    select p.hashkey, p.postcode, p.p_date, p.p_brand 
    from view1 p 
    left join view1 p1 ON p.hashkey = p1.hashkey AND p.p_date < p1.p_date 
    WHERE p1.hashkey IS NULL
) pc ON pc.HASHKEY = v1.HASHKEY
GROUP BY v1.hashkey
+1  A: 

I've managed to solve this. Essentially, I needed to select on the view and then sub-select on the view to get the fields I wanted. I found that ordering on the date within the sub-select returned the values I needed.

SELECT v1.hashkey, ge.gender, ge.g_date, ge.g_brand, 
    bi.birthyear, bi.b_date, bi.b_brand, 
    pc.postcode, pc.p_date, pc.p_brand
FROM view_combine v1
JOIN ( 
    select g.hashkey, g.gender, g.g_date, g.g_brand 
    from view_combine g 
    left join view_combine g1 ON g.hashkey = g1.hashkey AND g.g_date < g1.g_date and g1.gender is not null
    WHERE g1.hashkey IS NULL
    order by g.g_date
) ge ON ge.HASHKEY = v1.HASHKEY
JOIN ( 
    select b.hashkey, b.birthyear, b.b_date, b.b_brand 
    from view_combine b 
    left join view_combine b1 ON b.hashkey = b1.hashkey AND b.b_date < b1.b_date and b1.birthyear is not null
    WHERE b1.hashkey IS NULL
    order by b.b_date
) bi ON bi.HASHKEY = v1.HASHKEY
JOIN ( 
    select p.hashkey, p.postcode, p.p_date, p.p_brand 
    from view_combine p 
    left join view_combine p1 ON p.hashkey = p1.hashkey AND p.p_date < p1.p_date and p1.postcode is not null
    WHERE p1.hashkey IS NULL
    order by p.p_date
) pc ON pc.HASHKEY = v1.HASHKEY
GROUP BY v1.hashkey
cabuki
+1  A: 

I realize you've solved already, but as a secondary viewpoint, this is something that I would pre-process.

Given the data:
Partner 1 - UserA, Male, Null, 6300, 9/9/09
Partner 2 - UserA, Null, 1980, 2300, 9/10/09

When querying for UserA, you would most likely want a "Most Current Record":
UserA, Male, 1980, 2300

Using the following tables:

Partner

TypeCode
DisplayName

CurrentUser

UserId
Gender
GenderSourcePartner
BirthYear
BirthYearSourcePartner
PostalCode
PostalCodeSourcePartner

PartnerSourceData

PartnerTypeCode
UserId
Gender
BirthYear
PostalCode
ModifiedDate

Then, when I receive the partner source files, I'd process it line by line to update the current user table and append to the PartnerSourceData table (using it as a log.)

Jacob G
I appreciate the 2nd opinion. I know my solution is probably overcomplicated and I'll definitely look into yours. It seems like it could be a lot easier to manage.
cabuki