tags:

views:

55

answers:

4

I've got a set of data that looks something like this (VERY simplified):

productId    Qty   dateOrdered
---------    ---   -----------
       1       2    10/10/2008
       1       1    11/10/2008
       1       2    10/10/2009
       2       3    10/12/2009
       1       1    10/15/2009
       2       2    11/15/2009

Out of this, we're trying to create a query to get something like:

productId  Year  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
---------  ----  --- --- --- --- --- --- --- --- --- --- --- ---
        1  2008    0   0   0   0   0   0   0   0   0   2   1   0
        1  2009    0   0   0   0   0   0   0   0   0   3   0   0
        2  2009    0   0   0   0   0   0   0   0   0   3   2   0

The way I'm doing this now, I'm doing 12 selects, one for each month, and putting those in temp tables. I then do a giant join. Everything works, but this guy is dog slow.

I know this isn't much to go on, but knowing that I barely qualify as a tyro in the db world, I'm wondering if there is a better high level approach to this that I might try. (I'm guessing there is.)

(I'm using MS Sql Server, so answers that are specific to that DB are fine.)

(I'm just starting to look at "PIVOT" as a possible help, but I don't know anything about it yet, so if someone wants to comment about that, that might be helpful as well.)

A: 

That's a tough one ...

One of the built-in functions to convert rows/data to columns is the PIVOT function in MS SQL server.

I don't have the specifics, this is just from the top of my head.

HaukurHaf
+2  A: 

You may want to check the following article:

Daniel Vassallo
Good link, plus good early advice in the link to try to push this kind of processing out to the data consumer. I'll try investigating that as an option as well.
Beska
+1  A: 

You can use either a Union of your queries rather than temp tables or use the pivot option.

Here's a forum discussion on it:

Sql Server Forums - Show the row-wise data as column-wise

Turnkey
UNION...an obvious idea that I somehow missed. Not sure if that would help, but it might be worth a shot.
Beska
Yes, it should be more efficient than the temps method. Just set 0 as the value for the select column for the months not participating in the month being queried in each query.
Turnkey
A: 

This qualifies as a presentation concern.
Presentation and SQL don't always mix well.

Isolating your presentation logic in the application layer will:

  1. save you maintenance time—change your application code, but keep your SQL intact;
  2. enable you to more quickly adapt to ephemeral client requirements;
  3. give you more satisfaction than fiddling with a cross-tab or pivot-table that maddeningly does almost exactly what you want.

Below is an example of how you might do this in Python (you can use the excellent pyodbc module to connect to SQL Server):

from collections import defaultdict
from datetime import date

dd = defaultdict(int)

# input 
rows = [(1,2,date(2008,10,10)), (1,1,date(2008,11,10)), 
        (1,2,date(2009,10,10)), (2,3,date(2009,10,12)), 
        (1,1,date(2009,10,15)), (2,2,date(2009,11,15))]

for row in rows:
    # row[0] == productId
    # row[1] == Qty
    # row[2] == dateOrdered
    # pyodbc enables referring to column names by name
    dd[(row[2].year, row[2].month, row[0])] += row[1]

presentation_rows = sorted(set((i[0], i[2]) for i in dd.keys()))

for i in presentation_rows:
  print i[1], i[0], 
  for j in range(0,13):
    try:
      print dd[i[0], j, i[1]], 
    except IndexError:
      print 0,
  print

# output
# 1 2008 0 0 0 0 0 0 0 0 0 0 2 1 0
# 1 2009 0 0 0 0 0 0 0 0 0 0 3 0 0
# 2 2009 0 0 0 0 0 0 0 0 0 0 3 2 0
Adam Bernier