My company is working with 3 partners and each partner can have multiple brands. Each week, I get a dump of each brand's user list which I store in a MySQL database with a table for each brand. Each brand contains a list of users and some basic information (birthyear, zip code, gender). Some users can be signed up with different brands and each brand can have it's own set of data on a user.
For example, a user is signed up with Canvas and MNM. At Canvas, their profile looks like this:
ID GENDER BIRTHYEAR POSTCODE MODIFIED 94bafdb3e155d30349f1113a25c0714f M 1973 2800 2009-01-01 09:01:01
and at MNM, like this:
ID GENDER BIRTHYEAR POSTCODE MODIFIED 94bafdb3e155d30349f1113a25c0714f 1973 1000 2009-09-09 09:01:01
I'd like to create a view (or table - I'm not sure which is best) that would combine the two records using the most recent version of the data, but also letting me know where the data came from.
So the above two records would combine to:
ID GENDER G_DATE G_BRAND BIRTHYEAR B_DATE B_BRAND POSTCODE P_DATE P_BRAND 94bafdb3e155d30349f1113a25c0714f M 2009-01-01 09:01:01 Canvas 1973 2009-09-09 09:01:01 MNM 1000 2009-09-09 09:01:01 MNM
I'm imagining some convoluted series of unions and sub queries, but I'm not even really sure where to begin.
I've created a view that merges all of the tables
CREATE VIEW view_combine AS SELECT ID, GENDER, MODIFIED as G_DATE, 'Canvas' as G_BRAND, BIRTHYEAR, MODIFIED as B_DATE, 'Canvas' as B_BRAND, POSTCODE, MODIFIED as P_DATE, 'Canvas' as P_BRAND FROM canvas UNION ALL SELECT ID, GENDER, MODIFIED as G_DATE, 'Een' as G_BRAND, BIRTHYEAR, MODIFIED as B_DATE, 'Een' as B_BRAND, POSTCODE, MODIFIED as P_DATE, 'Een' as P_BRAND FROM een UNION ALL SELECT ID, GENDER, MODIFIED as G_DATE, 'MNM' as G_BRAND, BIRTHYEAR, MODIFIED as B_DATE, 'MNM' as B_BRAND, POSTCODE, MODIFIED as P_DATE, 'MNM' as P_BRAND FROM mnm
and then I'm trying to perform selections on that, but I don't think it's the right direction.
SELECT v1.hashkey, ge.gender, ge.g_date, ge.g_brand,
bi.birthyear, bi.b_date, bi.b_brand,
pc.postcode, pc.p_date, pc.p_brand
FROM view1 v1
JOIN (
select g.hashkey, g.gender, g.g_date, g.g_brand
from view1 g
left join view1 g1 ON g.hashkey = g1.hashkey AND g.g_date < g1.g_date
WHERE g1.hashkey IS NULL
) ge ON ge.HASHKEY = v1.HASHKEY
JOIN (
select b.hashkey, b.birthyear, b.b_date, b.b_brand
from view1 b
left join view1 b1 ON b.hashkey = b1.hashkey AND b.b_date < b1.b_date
WHERE b1.hashkey IS NULL
) bi ON bi.HASHKEY = v1.HASHKEY
JOIN (
select p.hashkey, p.postcode, p.p_date, p.p_brand
from view1 p
left join view1 p1 ON p.hashkey = p1.hashkey AND p.p_date < p1.p_date
WHERE p1.hashkey IS NULL
) pc ON pc.HASHKEY = v1.HASHKEY
GROUP BY v1.hashkey