Update: I'm not looking for someone to come up with a schema for me. I guess I'm looking for input on whether some normalized DB design would work, when I can't clearly define all the relationships of the entities I'm working with, or if something more along the lines of a data warehouse would be something I wanted to look at further (realizing here I know enough about data warehouses to be dangerous - and that's about it.)
I've been tasked at work with 'streamlining' a reporting process for a small call center. Most of my background is in web application, and I'd consider myself an intermediate PHPer (self taught, no college - I'll take a moment of silence for the collective gasp to subside). So this was a bit if a different project from my norm - though there still needs to be a web-based interface, so it's at least a little like home.
The reporting process as it stands involves getting printed reports from an ACD system that needs to be manually entered into Crystal Reports. Additionally, Crystal is used to run reports from a ticketing system to find things like resolution rate for calls taken, etc. The task I've been given is to allow the uploading of electronic CSV files which should be parsed, then loaded into a database. Once loaded into said database, reports should be able to be generated and emailed just by clicking a link on a website (basically).
I usually start out projects by looking at the data I have, and building a database to model that data. I'm no DB rockstar, so the databases are usually pretty simple, but I do try to normalize to at lease 3NF. With this project though, I quickly saw that it would be difficult to reliably determine a relationship - since scripting that I have no control over determines how a lot of this data is related, and the reports I get are not conducive to sniffing out that relationship. So I started looking on the web. I've worn google out. I've read a bunch of questions and answers on SO; many of which have more acronyms in them than I care to look up.
So I come to you, SO, for help. Assuming I've given enough information, can anyone tell me if what I'm looking at would be best served by me running off and learning some more about data warehousing (also, if so, where, who, what should I be reading/doing), or would a pretty denormalized SQL database probably work?
Keep in mind that at most there will be about 300 - 400 rows of data entered a day - and most of that data are simple INTs. This is a very small database.
The business just wants to reduce the amount of manpower used to create the reports. They're not seeking to change the reports.
I hope I've given enough information, if not, I'll do my best to be more specific, based on comments/questions I receive back.
I started down the road of doing a 3NF schema, and ended up with several tables. One for Agents (id, name, email, extension), Agent Groups (id, group name) and Applications (id, application name).
It broke down a bit when I realized that when an Application receives a call, it can go to any number of groups based on any number of criteria, and I have no way of getting that information (I'm not sure anyone does). So I started to think that there wasn't a need to relate these 3 things together at all.
With that in mind, there were going to be 4 more tables, AgentProfiles, AgentEvaluations, AgentGroupSummary, and ApplicationSummary. Each of these tables would have a column that corresponds to the data in the report I'm getting. Additionally, there would be an FK that points back to the Application, Agent or Agent Group associated with this 'line' of data.
I think I started to panic at this point, and I guess I'm looking for some input from people outside the quagmire of this project on how to proceed. I don't want to denormalize to the point that future maintenance becomes a nightmare - I seem to be stuck; afraid I'll over/under design and screw myself in the long run.
Edit:
I resisted the urge to go too much in-depth about the data I was working with for fear of creating a giant wall of text. I'll explain the data that I'll be getting, but the CSV files are so malformed I can't really provide an example of the reports I'll be getting. I can (and do, a little further down) provide an example of the data that would be going into the DB.
Basically, the reports I get are measurements of a call analyst's stats. How many calls they take in an hour, in a day, how long it takes them to answer a call, the length of time they're talking, etc. In the reports, each analyst is called an Agent. Each Agent belongs to an Agent Group, and each Agent Group is associated with an Application.
Once I have the data into the DB, I'll need to make pretty reports that can be exported to management, and also to agents, on a daily basis.
There are 2 reports that deal specifically with Agents - an Agent Profile report, and an Agent Evaluation report. I'll give examples of one of the reports. The rest of the reports aren't exactly conducive to being distilled into text without 40 minutes of typing.
AGNTNAME,07:18:56,03:29:36,26,265,74,0,339,11
- Agent Evaluation Report
- Agent name
- Length of time (HH:MM:SS) that an agent was signed in to their phone
- Length of time (HH:MM:SS) that an agent was ready to take calls
- How many calls they took (INT)
- Then there are several calculated averages, these are calculated by whatever generates the reports. Unless otherwise noted, these are integers expressing a total time in seconds (think 180 vs 00:03:00)
- talk time (time a call starts until an agent disconnects)
- work time (time spent unavailable after taking a call, until available to take calls again)
- hold time (time a caller was on hold with a particular agent)
- handling time (talk time + work time)
- calls per hour (this is an imaginary #, meaning that it's extrapolated based on the amount of calls an agent takes against the amount of time they were logged in. I'm not sure what the formula is that is used to get this number)
The Agent Profile Report breaks an agents day down into distinct login in periods. Anytime an agent becomes unavailable, a new login period is generated, along with a new line of data structured similarly to the Agent Evaluation Report, but with more fields that measure more and more analytical points. Most of which are averages, or manufactured averages (meaning they aren't true averages of actual numbers, but averages of computed numbers based on some secret-sauce criterion).
Agents are also group into logical subsets based on skill, these are called Agent Groups. Each Agent Group belongs to 1 or more Applications. You can think of an Application as a call queue ("Press 1 for password resets", "Press 2 for Microsoft Office help"; etc). However, each Application has a script that determines how a call gets routed to up to 10 Agent Groups that are associated with an Application.
This is where determining relationships gets hairy, because there's nothing in the reports that tells me "call X was routed to Agent Group Y because of criteria Z". So I end up with 3 objects that are hard to relate together reliably.
- Agents
- Agent Groups
- Applications
An Agent belongs to 1 or more Agent Groups. An Agent belongs to 0 Applications (directly - they get associated through Agent Groups).
An Agent Group can have 1 or more Agents. An Agent Group belongs to 1 or more Applications.
An Application has between 1 and 10 Agent Groups. An Application has 0 Agents (again, directly).
Because I'll be required to keep historical data, I'll need a way to weed out stats for agents that no longer exist, so I'm not emailing stats to nonexistent email addresses.
Hope the extra information helps.