views:

86

answers:

4

This is a simple and common scenario at work, and I'd appreciate some input.

Say I am generating a report for the owners of a pet show, and they want to know which of their customers have bought how many of each pet. In this scenario my only tools are SQL and something that outputs my query to a spreadsheet.

As the shop owner, I might expect reports in the form:

Customer  Dog  Cat  Rabbit
1         2    3    0
2         0    1    1
3         1    2    0
4         0    0    1

And if one day I decided to stock Goldfish then the report should now come out as.

Customer  Dog  Cat  Rabbit Goldfish
1         2    3    0      0
2         0    1    1      0
3         1    2    0      0
4         0    0    1      0
5         0    0    0      1

But as you probably know, to have a query which works this way would involve some form of dynamic code generation and would be harder to do.

The simplest query would work along the lines of: Cross join Customers and Pets, Outer join Sales, Group, etc. and generate:

Customer Pet      Quantity
1        Dog      2
1        Cat      3
1        Rabbit   0
1        Goldfish 0
2        Dog      0
2        Cat      1
2        Rabbit   1
...etc

a) How would I explain to the shop owners that the report they want is 'harder' to generate? I'm not trying to say it's harder to read, but it is harder to write.

b) What is the name of the concept I am trying to explain to the customer (to aid with my Googling)?

+2  A: 

The name of the concept is 'cross-tab' and can be accomplished in several ways.

MS Access has proprietary extensions to SQL to make this happen. SQL pre-2k5 has a CASE trick and 2k5 and later has PIVOT, but I think you still need to know what the columns will be.

n8wrl
Yep, the CASE trick will work for things like Quarters, but not for something like this.
nearly_lunchtime
Why not? Seems like 'quarters' and 'dog/cat/rabbit' are the same thing.
n8wrl
quarters are always four. animals sold aren't always three
Vinko Vrsalovic
Well, it will work, as long as I keep modifying the query. The definition of Quarters won't change, but my shop could decide to stock a new kind of pet, in which case I have to go and modify the case query and add another column
nearly_lunchtime
+1  A: 

Some databases indeed support some way of creating cross tables, but I think most need to know the columns in advance, so you'd have to modify the SQL (and get a database that supports such an extension).

Another alternative is to create a program that will postprocess the second "easy" table to get your clients the cross table as output. This is probably easier and more generic than having to modify SQL or dynamically generate it.

And about a way to explain the problem... you could show them in an Excel how many steps are needed to get the desired result:

  • Source data (your second listing).
  • Select values from the pets column
  • Place each pet type found on a new column
  • Count values per each type per client
  • Fill the values

and then say that SQL gives you only the source data, so it's of course more work.

Vinko Vrsalovic
+1  A: 

This concept is called pivoting

SQL assumes that your data is represented in terms of relations with fixed structure.

Like, equality is a binary relation, "customer has this many pets of this type" is a ternary relation and so on.

When you see this resultset:

Customer Pet      Quantity
1        Dog      2
1        Cat      3
1        Rabbit   0
1        Goldfish 0
2        Dog      0
2        Cat      1
2        Rabbit   1

, it's actually a relation defined by all possible combinations of domain values being in this relation.

Like, a customer 1 (domain customers id's) has exactly 2 (domain positive numbers) pets of genus dog (domain pets).

We don't see rows like these in the resultset:

Customer Pet      Quantity
1        Dog      3
Pete     Wife     0.67

, because the first row is false (customer 1 doesn't have 3 items of dog, but 2), and the second row values are out of their domain scopes.

SQL paradigma implies that your relations are defined when you issue a query and each row returned defines the relation completely.

SQL Server 2005+ can map rows into columns (that is what you want), but you should know the number of columns when designing the query (not running).

As a rule, the reports you are trying to build are built with reporting software which knows how to translate relational SQL resultsets into nice looking human readable reports.

Quassnoi
Isn't pivoting the SQL Server specific term, cross tables being the generic one?
Vinko Vrsalovic
+1  A: 

I have always called this pivoting, but that may not be the formal name.

Whatever it's called you can do almost all of this in plain SQL.

SELECT customer, count(*), sum(CASE WHEN pet='dog' THEN 1 ELSE 0 END) as dog, sum(case WHEN pet='cat' THEN 1 ELSE 0 END) as cast FROM customers join pets

Obviously what's missing is the dynamic columns. I don't know if this is possible in straight SQL, but it's certainly possible in a stored procedure to generate the query dynamically after first querying for a list of pets. The query is built into a string then that string is used to create a prepared statement.

AngerClown