tags:

views:

74

answers:

3

Hi, here we are comparing gene table records like: first will take gene a and compare this with all genes a,b,c for example aa,ab,ac same will take b then ba,bb,bc so on.......

so here matching results are a nd b is 2 because matching records are 589,822 common gene terms for b c count is 1 because matching record 586 and for all other combinations it should be zero.

goterm  gene auto
--------------------
589     a    1
822     a    2
478     a    3
586     b    4
589     b    5
600     c    6
586     c    7
822     b    8   

Query:

select count(*), 
       x.gene,
       x.ng 
 from (select t.gene,
              v.gene as ng 
        from (select distinct gene 
                from gene) as t 
  cross join (select distinct gene from gene) as v) as x 
   left join (select (g.gene),(n.gene) as ng from gene g 
        join gene n on n.goterm=g.goterm where g.auto<n.auto ) as y on y.gene = x.ng 
                                                                   and y.ng = x.gene
group by x.gene,x.ng

Finally the output of the above query is:

count  gene    gene
1      a       a
2      b       a
1      c       a
1      a       b
1      b       b
1      c       b
1      a       c
1      b       c
1      c       c

But the output must be:

count  gene    gene
0      a       a
2      b       a  
0      c       a
0      a       b
0      b       b
1      c       b
0      a       c
0      b       c
0      c       c
A: 

Not sure how exactly your logic works (especially the compare on auto), but this query produces the result you're looking for:

declare @t table (goterm  int, gene char(1), auto int identity)
insert @t
          select 589,     'a'
union select all 822,     'a'
union select all 478,     'a'
union select all 586,     'b'
union select all 589,     'b'
union select all 600,     'c'
union select all 586,     'c'
union select all 822,     'b'

select  t1.gene
,       t2.gene
,       (
        select  COUNT(*)
        from    @t t3
        join    @t t4
        on      t3.goterm = t4.goterm
                and t3.auto > t4.auto
        where   t3.gene = t1.gene
                and t4.gene = t2.gene
        ) as Total
from    (
        select  distinct gene 
        from    @t
        ) t1
cross join
        (
        select  distinct gene 
        from    @t
        ) t2
order by
        t2.gene
,       t1.gene

This prints:

gene    gene    Total
a       a       0
b       a       2
c       a       0
a       b       0
b       b       0
c       b       1
a       c       0
b       c       0
c       c       0

The t1 and t2 are there to create a matrix of gene combinations. The Total query looks up the number of hits for that combination. The t3.auto > t4.auto condition confuses me, so you might want to double check that.

Andomar
Having `auto` in the join like that prevents the 586 b,c pair from being counted as c,b as well.
Simon
@Simon: You mean like in the desired answer in the question?
Andomar
Oh, I had them backwards. Still, the point of it is to prevent that pair from being counted twice.
Simon
A: 
select combo1.gene, combo2.gene, count (gene2.goterm)
from (select distinct gene from gene) combo1
cross join (select distinct gene from gene) combo2
join gene gene1 on combo1.gene = gene1.gene
left join gene gene2 on combo2.gene = gene2.gene
    and gene1.goterm = gene2.goterm and gene1.auto < gene2.auto
group by combo1.gene, combo2.gene
order by combo1.gene, combo2.gene

Edit: realized I didn't need DISTINCT after all.

Simon
thank you so much....this worked
A: 

Try changing count(*) to count(y.gene) in your query.

Mark Bannister