views:

40

answers:

3

I'm in the planning stages of building a SQL Server DataMart for mail/email/SMS contact info and history. Each piece of data is located in a different external system. Because of this, email addresses do not have account numbers and SMS phone numbers do not have email addresses, etc. In other words, there isn't a shared primary key. Some data overlaps, but there isn't much I can do except keep the most complete version when duplicates arise.

Is there a best practice for building a DataMart with this data? Would it be an acceptable practice to create a key table with a column for each external key? Then, a unique primary ID can be assigned to tie this to other DataMart tables.

Looking for ideas/suggestions on approaches I may not have yet thought of.

Thanks.

+1  A: 

The email address or phone number itself sounds like a suitable business key. Typically a "staging" database is used to load the data from multiple sources and then assign surrogate keys and do other transformations.

Are you familiar with data warehouse methods and design patterns? If you don't have previous knowledge or experience then consider hiring some help. BI / data warehouse projects have a very high failure rate and mistakes can be expensive.

dportas
Ah, surrogate keys is probably what I'm after. Thanks. I've reviewed star schemas, ETL processes, cubes, etc. I certainly appreciate the advice. As the project is smaller in scale in a test environment initially, I just don't think it would be worthwhile at this point in time to hire outside help. We already have on-site staff we'll likely pass this to once conceptually complete. We're in the planning stages right now.
MattB
A: 

Well, with no other information to tie the disparate pieces together, your datamart is going to be pretty rudimentary. You'll be able to get the types of data (sms, email, mail), metrics for each type over time ("this week/month/quarter/year we averaged 42.5 sms texts per day, and 8000 emails per month! w00t!"). With just phone numbers and email addresses, your "other datamarts" will likely have to be phone company names, or internet domains. I guess you could link from that into some sort of geographical information (internet provider locations?), or maybe financial information for the companies. Kind of a blur if you don't already know which direction you want to head.

To be honest, this sounds like someone high-up is having a knee-jerk reaction to the "datamart" buzzword coupled with hearing something about how important communication metrics are, so they sent orders on down the chain to "get us some datamarts to run stats on all our e-mails!"

You need to figure out what it is that you or your employer is expecting to get out of this project, and then figure out if the data you're currently collecting gives you a trail to follow to that information. Right now it sounds like you're doing it backwards ("I have this data, what's it good for?"). It's entirely possible that you don't currently have the data you need, which means you'll need to buy it (who knows if you could) or start collecting it, in which case you won't have nice looking graphs and trend-lines for upper-management to look at for some time... falling right in line with the warning dportas gave you in his second paragraph ;)

pheadbaq
Our goals are to make contact information, contact history, and behavior more accessible in a central location rather than use a jumbled-together Excel spreadsheet with data from multiple systems. We currently have no way of easily tracking contact history. Thus, we're looking to bring this data together in a central environment that is much more accessible to everyone. The issue is that data is fragmented and not easily accessible. That's what we're hoping to change. It's smaller in scale and will only be used by 3 people initially.
MattB
Streamline your data collection as much as possible. Incoming e-mails should be on an e-mail server somewhere, so you should be able to pull info out of it. Phone calls are probably logged in a PBX system, and if it's a digital system you should be able to get info out of it. SMS... not sure how you're going to log that info unless you've got a system sitting between the two people texting. If you do, it should have info that you could pull out. All three cases would require someone know how to get into each system's data, and figure out how to get it to the data mart.
pheadbaq
Each system may store a huge history of info, or just a month's worth, there's no way to know unless you know the system. So assuming it doesn't store a never-ending log, you have to figure out a schedule to regularly get the data out of each system. Then you have to have a suitable datamart design for that extracted data to go into, so that you can produce the metrics you're looking for. BTW If you currently don't record contact history, then you're talking about pushing a new app in front of your users, the trick is to get them to actually use it, correctly. "The Devil's In the Details"
pheadbaq
The good news is that we do have access to these things and know the systems we're dealing with. For contact information, I'm speaking more from a marketing campaign perspective. Being in the department I'm in, we do have access to that data. You're correct, though, that we'll need to ensure campaign recipient data is logged as part of our process. My question was mainly technical from a best practice standpoint when tying contact data together. Those are some of the details we want to really think through prior to putting it in front of IT if we go that route.
MattB
A: 

Found more information here:

http://en.wikipedia.org/wiki/Extract,_transform,_load#Dealing_with_keys

MattB