tags:

views:

102

answers:

3

I have a sparsely populated matrix that's the result of a series of left joins. I'd like to collapse it down to a single row (see below). The only solution I've seen to this is a GROUP BY on the PK and a MAX() on ColA, ColB etc. Performance is a huge issue here, so I'd like to know if anyone has a better solution. The columns ColA, ColB etc are rows that have been pivoted to columns. My understanding is that I can't use a PIVOT because the columns are coming from rows (1 to n) and could change at any given time.

Join that produces the SPM (no, my tables/cols aren't really named that):

    SELECT    
   mainTable.custNbr  
   , mainTable.custPartNbr       
 , [lkup colA].usr_def_attr as [colA]
 , [lkup colB].usr_def_attr as [colB]
 , [lkup colC].usr_def_attr as [colC]
 , [lkup colD].usr_def_attr as [colD]
 , [lkup colE].usr_def_attr as [colE]
  FROM db2.dbo.table2 as mainTable     
 LEFT JOIN db1.dbo.lookup as [colA]
  ON mainTable.lookupValue = [colA].lkup_id   
  and mainTable.cmply_typ_lkup_id = 166697 
 LEFT JOIN db1.dbo.lookup as [lkup colB]
  ON mainTable.lookupValue = [lkup colB].lkup_id   
  and mainTable.cmply_typ_lkup_id = 166700 
 LEFT JOIN db1.dbo.lookup as [lkup colC]
  ON mainTable.lookupValue = [lkup colC].lkup_id   
  and mainTable.cmply_typ_lkup_id = 166699 
 LEFT JOIN db1.dbo.lookup as [lkup colD]
  ON mainTable.lookupValue = [lkup colD].lkup_id   
  and mainTable.cmply_typ_lkup_id = 166696 
 LEFT JOIN db1.dbo.lookup as [lkup colE]
  ON mainTable.lookupValue = [lkup colE].lkup_id   
  and mainTable.cmply_typ_lkup_id = 166698

Result:

PKCol   ColA ColB ColC ColD ColE
204045  NULL NULL NULL NULL 23
204045  NULL NULL NULL 35 NULL
204045  NULL NULL 35      NULL NULL
204045  NULL 23 NULL NULL NULL
204045  23 NULL NULL NULL NULL

Desired result:

PKCol   ColA ColB ColC ColD ColE
20405   23 23 35      35 23
A: 

Sparse matricies are common in linear algebra, numerical methods, and modeling of physics problems. If a matrix is diagonal, it's easy to represent it using a single vector. A bandwidth approach might require storing both row and column indicies for a non-zero element.

Googling for "sparse matrix storage" brought back lots of hits, including this. Maybe it can stimulate some ideas.

duffymo
I didn't downvote you. I think if people are gonna downvote they should *have a better answer*!
jcollum
No worries. Thanks for the feedback.
duffymo
+2  A: 

The GROUPY BY + MAX solution isn't a bad one. Since it's going to be scanning over the same number of records whether or not you're doing aggregates.

I'd be curious to know what the time difference with and without grouping is.

Allain Lalonde
+1. I misunderstood the question at first, but now that I got it figured out, I believe it's hard to beat this solution (unless, perhaps, if you write a procedure to do it).
Pukku
By a couple of milliseconds (compilation) you're correct.
Allain Lalonde
A: 

I rewrote this as a PIVOT and saw a performance increase of about 30%. It wasn't easy to do, had to read this post very carefully. PIVOTs are weird.

jcollum