tags:

views:

490

answers:

3

I have a tricky situation in trying to get information from multiple queries into a single row.

Consider the following table:

CpuUage:
    Time    time
    Group   char(10)
    Subsys  char(4)
    Jobs    int
    Cpu     int

holding the following data:

Time   Group   Subsys  Jobs  Cpu
-----  ------  ------  ----  ---
00:00  group1  NORM       1  101 (grp1-norm) A1
01:00  group1  SYS7       3  102 (grp1-sys7) A2
01:00  group1  NORM       5  104 (grp1-norm) A1
02:00  group1  NORM       7  108 (grp1-norm) A1
02:00  group2  NORM       9  116 (grp2-norm) B1
02:00  group3  SYS7      11  132 (grp3-sys7) C2
03:00  group1  SYS7      13  164 (grp1-sys7) A2
03:00  group1  IGNR      99  228 (grp1-ignr) --

The markers on the right (e.g., A1) are the sections in the report below that each row is supposed to affect.

I need a query that can return a single row for each user group but with one proviso. The values for Jobs and Cpu have to go into different columns depending on the subsystem ID and I'm only interested in the SYS7 and NORM subsystem IDs.

So, for example, we need the following (the A/B/1/2 bits are a cross-reference back to the rows above):

           <------ 1 ------>  <------ 2 ------>
   Group   NormJobs  NormCpu  Sys7Jobs  Sys7Cpu
   ------  --------  -------  --------  -------
A: group1        13      313        16      266
B: group2         9      116         0        0
C: group3         0        0        11      164

Our old reporting solution could run multiple queries (with a union all), then post-process the rows to combine those with the same group name, so that:

Group   NormJobs  NormCpu  Sys7Jobs  Sys7Cpu
------  --------  -------  --------  -------
group1        13      313         0        0
group1         0        0        16      266

were merged together, along the lines of:

select groupname, sum(jobs), sum(cpu), 0, 0 from tbl
    where subsys = 'NORM'
    group by groupname
union all
select groupname, 0, 0, sum(jobs), sum(cpu) from tbl
    where subsys = 'SYS7'
    group by groupname

Unfortunately, our new solution does not allow post-processing and it all has to be done in the SQL query.

Keeping in mind that there may be groups with SYS7 rows, NORM rows, both or neither, what's the best way to achieve this?

I thought about sub-querying the table from within an outer select but that may have performance ramifications.

In addition, that would be a pain since I'd have to make the outer query include NORM and SYS7 subsystems then run a subquery for every field (I can't make the outer query just for NORM jobs since the presence of a group with only SYS7 rows wouldn't be caught that way).

Can you bods weave any of your left-middle-inner-offside-join magic to come up with a viable solution?

I'd prefer a vendor-agnostic solution but, if you must go vendor-specific, the platform is DB2. Still, other platforms may at least give me an ideas of what to try so I'm happy to see them.

+3  A: 

I don't understand the problem with sub-querying, it seems like it should be just as fast:

select
    sub.gn as groupname,
    sum(sub.nj) as NormJobs, sum(sun.nc) as NormCpu,
    sum(sub.sj) as Sys7Jobs, sum(sub.sc) as Sys7Cpu
  from (
      select
          groupname as gn,
          sum(jobs) as nj, sum(cpu) as nc,
          0 as sj, 0 as sc
        from tbl
          where subsys = 'NORM'
          group by groupname
        union all select
            groupname as gn,
            0 as nj, 0 as nc,
            sum(jobs) as sj, sum(cpu) as sc
          from tbl
          where subsys = 'SYS7'
          group by groupname
    ) as sub
    group by sub.gn
    order by 1
BlueRaja - Danny Pflughoeft
I was concerned about having to do a subquery for every group in the main query, but your solution seems to have sidestepped this. It looks like it would only be the two subqueries regardless of how many groups there were. Thanks for your input, I'll definitely give this a shot.
paxdiablo
+1 and accept. This one did the trick with only two swipes at the database and is fast enough for my needs. I've edited it though to match DB2's mote stringent requirements and to modify the union a little.
paxdiablo
+4  A: 

It's a typical pivot query - here's how you'd do it with CASE statements:

  SELECT t.group,
         SUM(CASE 
           WHEN t.subsys = 'NORM' THEN t.jobs 
           ELSE NULL
         END CASE) AS NormJobs,
         SUM(CASE 
           WHEN t.subsys = 'NORM' THEN t.cpu
           ELSE NULL
         END CASE) AS NormCpu,
         SUM(CASE 
           WHEN t.subsys = 'SYS7' THEN t.jobs 
           ELSE NULL
         END CASE) AS Sys7Jobs,
         SUM(CASE 
           WHEN t.subsys = 'SYS7' THEN t.cpu
           ELSE NULL
         END CASE) AS Sys7Cpu
    FROM CPUUSAGE t
GROUP BY t.group

Unfortunately, DB2's CASE statements need to end with END CASE, when Oracle/SQL Server/MySQL/Postgres doesn't. Well, PLSQL supports END CASE...

There's also the PIVOT syntax, which is also supported on Oracle 11g, and SQL Server 2005+.

OMG Ponies
This is definitely a better solution than mine; I was not aware of this 'pivot,' thank you for sharing!
BlueRaja - Danny Pflughoeft
Your SUM should encapsulate the entire CASE because subsys is not part of the GROUP. See http://stackoverflow.com/questions/1997519/how-do-i-combine-two-queries-union-all-into-one-row/1997581#1997581
Craig Young
@Craig: Corrected, thanx.
OMG Ponies
I'm just a little worried about all that per-row-function stuff - I'll need to benchmark it but it seems to me that doing a case on every row is likely to drag down performance. Let me go away and check it out. Thanks.
paxdiablo
+1 for helping me out but @BlueRaja's solution worked fine with only two trips to the DB so I didn't need to test the other solutions. As with @CraigY's answer, I still want to test this out to figure out whether per-row functions can perform as well as a couple of subqueries. Thanks for your help.
paxdiablo
+3  A: 

This is a pivot table query. (Search on that if you require further info.)

The query structure you want is something along the lines of the following:

SELECT  groupname,
        SUM(CASE WHEN subsys = 'NORM' THEN jobs ELSE 0 END) AS NormJobs,
        SUM(CASE WHEN subsys = 'NORM' THEN cpu ELSE 0 END) AS NormCpu,
        SUM(CASE WHEN subsys = 'SYS7' THEN jobs ELSE 0 END) AS Sys7Jobs,
        SUM(CASE WHEN subsys = 'SYS7' THEN cpu ELSE 0 END) AS Sys7Cpu,
        SUM(CASE WHEN subsys NOT IN ('NORM', 'SYS7') THEN jobs ELSE 0 END) AS OtherJobs,
        SUM(CASE WHEN subsys NOT IN ('NORM', 'SYS7') THEN cpu ELSE 0 END) AS OtherCpu
FROM    ???
GROUP BY groupname
Craig Young
+1: For correct SUM CASE format.
OMG Ponies
As with @OMG's answer, I'm worried about all that per-row-function stuff. I'll have to check it out. I don't think the last two sums would be necessary since I don't care about "OtherXxx" information, just the NORM and SYS7. Thanks, I need to go do some performance testing.
paxdiablo
+1 for helping me out but @BlueRaja's solution worked fine with only two swipes at the DB so I didn't need to test the other solutions. I will get around to testing it shortly since I want to find out once and for all whether these per-row functions are actually fast enough. Thanks for your help.
paxdiablo