views:

52

answers:

3

I always struggle with dealing with normalised data, and how I display it. Maybe its because I don't fully understand the normalisation rules, like how to get it fully into Boyce-Codd. Performance is not really an issue at this stage, though maintainability of the schema is.

user

ID  Name
1   Alice
2   Bob
3   Charlie

skills

ID   Name
1    Karate
2    Marksmen
3    Cook

event

ID   Name
1    Island
2    Volcano

user-m2m-skill

MemberID  SkillID
1         1
1         2
2         1
2         3
3         1

user-m2m-event

MemberID  EventID
1         1
1         2
2         1
3         2

How do I get this information out of the database? I'd like to display a table like this, where I've got the total count of each skill:

Skills at event

Event    Karate Marksmen Cook
Island   2      1        1
Volcano  2      1        0

It is unlikely that the skills table will change very much. This means I could do a set of subqueries like this (obviously shortened and incorrect syntax)

SELECT event.name,
  (SELECT COUNT(*) FROM ... WHERE skill = 'Karate'), 
  (SELECT COUNT(*) FROM ... WHERE skill = 'Marksmen') FROM event

And that's what I've been doing, putting it into a view. But its a bit horrible, no? I have to edit the view every time I add a new skill.

The other way to to process it client side. So I just get back something like this:

Event   Skill    Count
Island  Karate   2
Island  Marksmen 1
Island  Cook     1
Volcano Karate   2
Volcano Marksmen 1

And I loop through the results, reformatting it. But I hate that even more. Isn't the database supposed to do data?

So: what am I doing wrong? Am I expecting too much? Which is the lesser evil?

(As b3ta would say, apologies for length of post and for bad markup. :( )

A: 

Your second example, with "Event", "Skill", and "Count" as headers, is what you should expect from dynamically generated results from normalized data. Databases are not designed to format data for display (this isn't an Excel spreadsheet), they're designed to store data and return the meaning of that data. It's up to your code to display it in a nice fashion.

Amber
Okay, gotcha. I suppose I'm just pathologically adverse to loops, always think - hey, I can do this in O(n)!But in a grand scheme, trying to get the database to do all the work is a false economy I suppose. I'll need to bite the bullet and create some nice classes to do this for me.
Martin
A: 

Who's b3ta?

As far as the database "doing data" that doesn't mean that client code will be free of all parsing and processing. And normalization in practice shouldn't b a goal in itself. You should also take into account ease of querying and performance.

Pierreten
b3ta.com, sorry. It's a... erm... I have no idea what it is. Community site?And you are very right - I always strive for the most maintainable code, so I don't want to go overkill on clever tricks.
Martin
+1  A: 

This is a typical pivot query, because you are looking to convert data in rows into columns.

   SELECT e.name,
          MAX(CASE WHEN x.skill_name = 'Karate' THEN x.num_skill ELSE 0) END AS Karate,
          MAX(CASE WHEN x.skill_name = 'Marksmen' THEN x.num_skill ELSE 0 END) AS Marksmen
     FROM EVENT e
LEFT JOIN (SELECT um.eventid,
                  s.name AS skill_name,
                  COUNT(*) 'num_skill'
             FROM SKILLS s
             JOIN USER-M2M-SKILL us ON us.skillid = s.id
             JOIN USER-M2M-EVENT um ON um.memberid = us.memberid
         GROUP BY um.eventid, s.name) x ON x.eventid = e.id
 GROUP BY e.name

Followup question:

...what does this have that a load of sub queries doesn't?

SELECTs as statements within the SELECT clause. IE:

SELECT x.name,
       (SELECT COUNT(*) FROM TABLE)

...means that a separate query is run for every skill. If the queries were correllated - if they were tied together by an ID to make sure records sync'd with an event, then the count would be running for every event.

Conclusion to Followup

The approach is terribly inefficient. It is better to fetch the necessary values once, as I provided in my answer.

Addendum

Regarding updating the query - it is possible to minimize the maintenance by implementing the query with dynamic SQL.

OMG Ponies
Showing off your l33t SQL skills? This made me jealous for some reason.
Pierreten
Okay, this looks cool. But what does this have that a load of sub queries doesn't? I'd guess it'd be much faster, but I'd still have to update this SQL each time I added a skill.
Martin
Excellent, thanks for the explanation. I think I'm gonna go this way, and create a stored procedure. I think that is the most maintainable approach.
Martin