views: 253
answers: 2

I have a table (x) with 9 million rows. One of its columns, cosub, contains repeating duplicate values. Another table (y) has 59k rows plus additional columns with details related to cosub, so it acts as a lookup table. How can I join the two tables so that the query returns all 9 million rows of x together with the additional cosub details from y? Example:

Table x         Table y
id  cosub       cosub  div

1   A           B      6
2   B           A      5
3   C           C      7
4   A           A      5
5   B           B      6
6   C           A      5
...             ...

The result of the query should look like this (selecting all 9 million rows from table x):

1 A 5
2 B 6
3 C 7
4 A 5
5 B 6
6 C 7

+1  A: 
SELECT DISTINCT X.id, X.cosub, Y.div
FROM X
LEFT OUTER JOIN Y ON Y.cosub = X.cosub
-- WHERE xxxx     (optional filter condition)
-- ORDER BY xxxx  (optional ordering clause)
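A quick way to sanity-check this query is to run it against the sample data from the question. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in; the question does not say which DBMS is in use, so SQLite is only an assumption. An `ORDER BY` is added so the output order is predictable.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption:
# the actual DBMS is not stated, but this join is standard SQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (id INTEGER PRIMARY KEY, cosub TEXT);
    CREATE TABLE Y (cosub TEXT, div INTEGER);
""")
# Sample rows copied from the question.
conn.executemany("INSERT INTO X VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "A"), (5, "B"), (6, "C")])
conn.executemany("INSERT INTO Y VALUES (?, ?)",
                 [("B", 6), ("A", 5), ("C", 7), ("A", 5), ("B", 6), ("A", 5)])

# The query from the answer, with ORDER BY added for a deterministic order.
rows = conn.execute("""
    SELECT DISTINCT X.id, X.cosub, Y.div
    FROM X
    LEFT OUTER JOIN Y ON Y.cosub = X.cosub
    ORDER BY X.id
""").fetchall()

# Prints one row per line: 1 A 5, 2 B 6, 3 C 7, 4 A 5, 5 B 6, 6 C 7
for row in rows:
    print(*row)
```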

I'm not 100% sure you need DISTINCT (it would be good to avoid it); it depends on whether the small table has duplicates. The text of the question seems to imply there are no such duplicates, but the example given does show duplicates...
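To see why DISTINCT matters when Y repeats cosub values, the sketch below runs the join with and without it (same assumptions: Python's `sqlite3` as a stand-in database, example data from the question). Each X row is repeated once per matching Y row, so the 6 sample rows of X become 12 joined rows until DISTINCT collapses the identical repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (id INTEGER PRIMARY KEY, cosub TEXT);
    CREATE TABLE Y (cosub TEXT, div INTEGER);
""")
conn.executemany("INSERT INTO X VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "A"), (5, "B"), (6, "C")])
# Y from the example: A appears 3 times, B twice, C once.
conn.executemany("INSERT INTO Y VALUES (?, ?)",
                 [("B", 6), ("A", 5), ("C", 7), ("A", 5), ("B", 6), ("A", 5)])

join_sql = ("SELECT {} X.id, X.cosub, Y.div "
            "FROM X LEFT OUTER JOIN Y ON Y.cosub = X.cosub")

# Without DISTINCT, each X row is repeated once per matching Y row.
plain = conn.execute(join_sql.format("")).fetchall()
# With DISTINCT, the identical repeats collapse to one row per X.id.
deduped = conn.execute(join_sql.format("DISTINCT")).fetchall()

print(len(plain), len(deduped))  # prints: 12 6
```

On a 9-million-row table that duplicate-removal step is real work for the database, which is why avoiding DISTINCT is preferable when Y can be cleaned instead.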

Also beware that if table Y has multiple div values for a given cosub (i.e. several records with conflicting div values for the same cosub), the above query will return several rows for each matching X record, one per distinct div value, with the data from table X repeated on each.
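The conflicting-values case can be reproduced with a tiny data set. In the sketch below the second div value for cosub A (9) is hypothetical, invented only to create the conflict, and `sqlite3` again stands in for the real database. DISTINCT cannot merge the two rows for id 1 because they differ in div.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (id INTEGER PRIMARY KEY, cosub TEXT);
    CREATE TABLE Y (cosub TEXT, div INTEGER);
""")
conn.executemany("INSERT INTO X VALUES (?, ?)", [(1, "A"), (2, "B")])
# Hypothetical conflict: cosub A has two different div values (5 and 9).
conn.executemany("INSERT INTO Y VALUES (?, ?)", [("A", 5), ("A", 9), ("B", 6)])

rows = conn.execute("""
    SELECT DISTINCT X.id, X.cosub, Y.div
    FROM X
    LEFT OUTER JOIN Y ON Y.cosub = X.cosub
    ORDER BY X.id, Y.div
""").fetchall()

print(rows)  # [(1, 'A', 5), (1, 'A', 9), (2, 'B', 6)]
```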

Finally, the proposed snippet uses a LEFT OUTER JOIN, which allows the result to include records with only the data from table X (and NULL values in place of the fields normally coming from Y) when a given X record has a cosub value not found in Y. The alternative is to use a plain JOIN, which EXCLUDES any such record from the results (i.e. the result then only includes records from X whose cosub exists in Y).
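The difference between the two join types shows up as soon as X contains a cosub that is missing from Y. In this sketch the row `(7, 'D')` is hypothetical, added only to create an unmatched value; Python's `sqlite3` stands in for the real database. Y has no duplicates here, so DISTINCT is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (id INTEGER PRIMARY KEY, cosub TEXT);
    CREATE TABLE Y (cosub TEXT, div INTEGER);
""")
# Hypothetical row (7, 'D'): cosub D has no entry in Y.
conn.executemany("INSERT INTO X VALUES (?, ?)", [(1, "A"), (2, "B"), (7, "D")])
conn.executemany("INSERT INTO Y VALUES (?, ?)", [("A", 5), ("B", 6)])

query = """
    SELECT X.id, X.cosub, Y.div
    FROM X
    {} JOIN Y ON Y.cosub = X.cosub
    ORDER BY X.id
"""
left = conn.execute(query.format("LEFT OUTER")).fetchall()  # keeps unmatched X rows
inner = conn.execute(query.format("INNER")).fetchall()      # drops them

print(left)   # [(1, 'A', 5), (2, 'B', 6), (7, 'D', None)]
print(inner)  # [(1, 'A', 5), (2, 'B', 6)]
```

SQLite returns SQL NULL as Python `None`, which is the "null values in place of the fields from Y" case described above.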

mjv
We don't need DISTINCT for table X, because id is unique (the primary key); only cosub has repeating duplicates. Table Y has duplicates in cosub, so I will have to use DISTINCT for it.
@nazer555 I suggest you remove these duplicates in Y before proceeding, allowing you to avoid the "DISTINCT" in the query which could significantly slow things down.
mjv
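Removing the duplicates from Y up front could look like the sketch below, which copies the distinct (cosub, div) pairs into a new table and then joins without DISTINCT. The table name `Y_clean` is hypothetical, and `sqlite3` is a stand-in for the real DBMS. This only yields one row per X record if each cosub ends up with a single div value; conflicting values would still multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (id INTEGER PRIMARY KEY, cosub TEXT);
    CREATE TABLE Y (cosub TEXT, div INTEGER);
""")
conn.executemany("INSERT INTO X VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "A"), (5, "B"), (6, "C")])
conn.executemany("INSERT INTO Y VALUES (?, ?)",
                 [("B", 6), ("A", 5), ("C", 7), ("A", 5), ("B", 6), ("A", 5)])

# Hypothetical cleaned copy of Y: one row per distinct (cosub, div) pair.
conn.execute("CREATE TABLE Y_clean AS SELECT DISTINCT cosub, div FROM Y")
clean_count = conn.execute("SELECT COUNT(*) FROM Y_clean").fetchone()[0]

# The join no longer needs DISTINCT because Y_clean has no repeated pairs.
rows = conn.execute("""
    SELECT X.id, X.cosub, Y_clean.div
    FROM X
    LEFT OUTER JOIN Y_clean ON Y_clean.cosub = X.cosub
    ORDER BY X.id
""").fetchall()

print(clean_count)  # 3
print(rows)
```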
All the cosub values in X have a matching value in the cosub column of table Y. However, each cosub value appears with different div values, and that makes the join very difficult.
What criteria would you like to join on, then? Would you like as many rows as there are distinct div values for a given cosub? Would you like to take the smallest div value for a given cosub? The biggest one? Or maybe their average? Would you like the first n possible values of div? All of this is possible; you just need to define what the requirement is. The snippet given produces the output shown in the example, but maybe the example fails to show cases of cosubs with multiple div values, along with the desired output for them.
mjv
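The smallest, biggest and average options from the comment above can all be computed by aggregating Y in a derived table before joining, so every X row still appears exactly once. A sketch, using `sqlite3` as a stand-in and a hypothetical conflicting value (cosub A has div 5 and 9); the column aliases `min_div`, `max_div`, `avg_div` are illustrative names only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (id INTEGER PRIMARY KEY, cosub TEXT);
    CREATE TABLE Y (cosub TEXT, div INTEGER);
""")
conn.executemany("INSERT INTO X VALUES (?, ?)", [(1, "A"), (2, "B")])
# Hypothetical conflict: cosub A has div 5 and 9.
conn.executemany("INSERT INTO Y VALUES (?, ?)", [("A", 5), ("A", 9), ("B", 6)])

# Collapse Y to one row per cosub first, then join: no DISTINCT needed.
rows = conn.execute("""
    SELECT X.id, X.cosub, agg.min_div, agg.max_div, agg.avg_div
    FROM X
    LEFT OUTER JOIN (
        SELECT cosub,
               MIN(div) AS min_div,
               MAX(div) AS max_div,
               AVG(div) AS avg_div
        FROM Y
        GROUP BY cosub
    ) AS agg ON agg.cosub = X.cosub
    ORDER BY X.id
""").fetchall()

print(rows)  # [(1, 'A', 5, 9, 7.0), (2, 'B', 6, 6, 6.0)]
```

In practice only the one aggregate the client settles on would be selected.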
I'll be getting back to the client tomorrow to clarify this.
+1  A: 

Use:

   SELECT DISTINCT
          x.id,
          x.cosub,
          y.div
     FROM TABLE_x x
LEFT JOIN (SELECT t.cosub,
                  t.div
                  -- , other columns
             FROM TABLE_Y t) y ON y.cosub = x.cosub

From the comments I've read, you need to pre-process the records in TABLE_Y to get the correct div (and other) values before returning a result set.
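One way to do that pre-processing, along the lines of the comment suggesting a DISTINCT in the inner select, is to deduplicate inside the derived table. With one div per cosub and a unique `id`, the outer DISTINCT is then no longer needed. A sketch with `sqlite3` standing in for the real database and the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_x (id INTEGER PRIMARY KEY, cosub TEXT);
    CREATE TABLE TABLE_Y (cosub TEXT, div INTEGER);
""")
conn.executemany("INSERT INTO TABLE_x VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "A"), (5, "B"), (6, "C")])
conn.executemany("INSERT INTO TABLE_Y VALUES (?, ?)",
                 [("B", 6), ("A", 5), ("C", 7), ("A", 5), ("B", 6), ("A", 5)])

# DISTINCT runs on the 59k-row lookup table instead of the 9M-row result.
rows = conn.execute("""
    SELECT x.id, x.cosub, y.div
    FROM TABLE_x x
    LEFT JOIN (SELECT DISTINCT t.cosub, t.div
               FROM TABLE_Y t) y ON y.cosub = x.cosub
    ORDER BY x.id
""").fetchall()

print(rows)
```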

OMG Ponies
This is almost right, but you need to add a DISTINCT to the second (inner) SELECT.
Jon Wilson
Updated to add `DISTINCT`
OMG Ponies