views:

387

answers:

5

I am trying to create a pivot table type view in postgresql and am nearly there! Here is the basic query:

select 
acc2tax_node.acc, tax_node.name, tax_node.rank 
from 
tax_node, acc2tax_node 
where 
tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531';

And the data:

   acc    |          name           |     rank     
----------+-------------------------+--------------
 AJ012531 | Paromalostomum fusculum | species
 AJ012531 | Paromalostomum          | genus
 AJ012531 | Macrostomidae           | family
 AJ012531 | Macrostomida            | order
 AJ012531 | Macrostomorpha          | no rank
 AJ012531 | Turbellaria             | class
 AJ012531 | Platyhelminthes         | phylum
 AJ012531 | Acoelomata              | no rank
 AJ012531 | Bilateria               | no rank
 AJ012531 | Eumetazoa               | no rank
 AJ012531 | Metazoa                 | kingdom
 AJ012531 | Fungi/Metazoa group     | no rank
 AJ012531 | Eukaryota               | superkingdom
 AJ012531 | cellular organisms      | no rank

What I am trying to get is the following:

acc      | species                  | phylum
AJ012531 | Paromalostomum fusculum  | Platyhelminthes

I am trying to do this with CASE WHEN, so I've got as far as the following:

select 
acc2tax_node.acc, 
CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END as species, 
CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END as phylum 
from 
tax_node, acc2tax_node 
where 
tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531';

Which gives me the output:

   acc    |         species         |     phylum      
----------+-------------------------+-----------------
 AJ012531 | Paromalostomum fusculum | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | Platyhelminthes
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 
 AJ012531 |                         | 

Now I know that I have to group by acc at some point, so I try

select 
acc2tax_node.acc, 
CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END as sp, 
CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END as ph 
from 
tax_node, acc2tax_node 
where 
tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531' 
group by acc2tax_node.acc;

But I get the dreaded

ERROR:  column "tax_node.rank" must appear in the GROUP BY clause or be used in an aggregate function

All the previous examples I've been able to find use something like SUM() around the CASE statements, so I guess that is the aggregate function. I have tried using FIRST():

select 
acc2tax_node.acc, 
FIRST(CASE tax_node.rank WHEN 'species' THEN tax_node.name ELSE NULL END) as sp, 
FIRST(CASE tax_node.rank WHEN 'phylum' THEN tax_node.name ELSE NULL END) as ph 
from tax_node, acc2tax_node where tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531' group by acc2tax_node.acc;

but get the error:

ERROR:  function first(character varying) does not exist

Can anyone offer any hints?

A: 
SELECT  atn.acc, ts.name AS species, tp.name AS phylum
FROM    acc2tax_node atn
LEFT JOIN
        tax_node ts
ON      ts.taxid = atn.taxid
        AND ts.rank = 'species'
LEFT JOIN
        tax_node tp
ON      tp.taxid = atn.taxid
        AND tp.rank = 'phylum'
WHERE   atn.acc = 'AJ012531 '
Quassnoi
This query gives the same result:` acc | species | phylum ``----------+-------------------------+-----------------`` AJ012531 | Paromalostomum fusculum | `` AJ012531 | | `` AJ012531 | | `` AJ012531 | | `` AJ012531 | | `` AJ012531 | | `` AJ012531 | | Platyhelminthes`` AJ012531 | | `` ...etc etc`
mojones
Please excuse my terrible attempts at formatting :-)
mojones
A: 

Further info as requested (in a reply rather than a comment for nice formatting):

SELECT * FROM acc2tax_node WHERE acc = 'AJ012531';

   acc    | taxid  
----------+--------
 AJ012531 |  66400
 AJ012531 |  66399
 AJ012531 |  39216
 AJ012531 |  39215
 AJ012531 | 166235
 AJ012531 | 166384
 AJ012531 |   6157
 AJ012531 |  33214
 AJ012531 |  33213
 AJ012531 |   6072
 AJ012531 |  33208
 AJ012531 |  33154
 AJ012531 |   2759
 AJ012531 | 131567
mojones
A: 

PostgreSQL does have a couple of functions for pivot queries, see this article at Postgresonline. You can find these functions in the contrib.

Frank Heikens
Yes, I suspect that the correct way to do this is with scrosstab. I'd still like to figure out what I'm doing wrong here for my own education.
mojones
+1  A: 

Use MAX() or MIN(), not FIRST(). In this scenario, you will have all NULLs in the column per each group value except for, at most, one with a not null value. By definition, this is both the MIN and the MAX of that set of values (all nulls are excluded).

Matthew Wood
That works perfectly, thank you. For some reason I assumed that MAX() would not work because I was using string values.
mojones
A: 

Execute:

SELECT report.* FROM crosstab( select acc2tax_node.acc, tax_node.name, tax_node.rank from tax_node, acc2tax_node where tax_node.taxid=acc2tax_node.taxid and acc2tax_node.acc='AJ012531'; ) AS report(species text, enus text, family text, ...)

The G Man