views:

555

answers:

6

I'm creating a really complex dynamic sql, it's got to return one row per user, but now I have to join against a one to many table. I do an outer join to make sure I get at least one row back (and can check for null to see if there's data in that table) but I have to make sure I only get one row back from this outer join part if there's multiple rows in this second table for this user. So far I've come up with this: (sybase)

select a.user_id 
from table1 a, table2 b 
where a.user_id = b.user_id 
and a.sub_id = (select min(c.sub_id) 
                from table2 c 
                where b.sub_id = c.sub_id)

The subquery finds the min value in the one to many table for that particular user.

This works but I fear nastiness from doing correlated subqueries when table 1 and 2 get very large. Is there a better way? I'm trying to dream up a way to get joins to do it, but I'm not seeing it. Also saying "where rowcount=1" or "top 1" doesn't help me, because I'm not trying to fix the above query, I'm ADDING the above to an already complex query.

A: 

Maybe your example is too simplified, but I'd use a group by:

SELECT
  a.user_id 
FROM 
  table1 a
    LEFT OUTER JOIN table2 b ON (a.user_id = b.user_id)
GROUP BY
  a.user_id

I fear the only other way would be using nested queries:

The difference between this query and your example is a 'sub table' is only generated once, however in your example you generate a 'sub table' for each row in table1 (but may depend on the compiler, so you might want to use query analyser to check performance).

SELECT
  a.user_id,
  b.sub_id
FROM 
  table1 a
    LEFT OUTER JOIN (
      SELECT
        user_id,
        min(sub_id) as sub_id,
      FROM
        table2
      GROUP BY
        user_id
    ) b ON (a.user_id = b.user_id)

Also, if your query is getting quite complex I'd use temporary tables to simplify the code, it might cost a little more in processing time, but will make your queries much easier to maintain.

A Temp Table example would be:

SELECT
  user_id
INTO
  #table1
FROM 
  table1
WHERE
  .....

SELECT
  a.user_id,
  min(b.sub_id) as sub_id,
INTO
  #table2
FROM
  #table1 a
    INNER JOIN table2 b ON (a.user_id = b.user_id)
GROUP BY
  a.user_id

SELECT
  a.*,
  b.sub_id
from
  #table1 a
    LEFT OUTER JOIN #table2 b ON (a.user_id = b.user_id)
Ady
I can't turn the entire query into an aggregate query.
stu
Maybe you should post your actual query so people can give you helpful answers. This was a good answer based on what you posted.
Dave Costa
A: 

How about:

select a.user_id 
from table1 a
where exists (select null from table2 b 
              where a.user_id = b.user_id 
             )
Tony Andrews
+1  A: 

In MySql you can ensure that any query returns at most X rows using

select *
from foo
where bar = 1
limit X;

Unfortunately, I'm fairly sure this is a MySQL-specific extension to SQL. However, a Google search for something like "mysql sybase limit" might turn up an equivalent for Sybase.

Don
A: 

First of all, I believe the query you are trying to write as your example is:

select a.user_id 
from table1 a, table2 b 
where a.user_id = b.user_id 
and b.sub_id = (select min(c.sub_id) 
                from table2 c 
                where b.user_id = c.user_id)

Except you wanted an outer join (which I think someone edited out the Oracle syntax).

select a.user_id 
from table1 a
left outer join table2 b on a.user_id = b.user_id 
where b.sub_id = (select min(c.sub_id) 
                from table2 c 
                where b.user_id = c.user_id)
James Curran
Yeah, it kinda sucks when somebody edits your question and changes the meaning.
stu
+1  A: 

A few quick points:

  1. You need to have definitive business rules. If the query returns more than one row then you need to think about why (beyond just "it's a 1:many relationship - WHY is it a 1:many relationship?). You should come up with the business solution rather than just use "min" because it gives you 1 row. The business solution might simply be "take the first one", in which case min might be the answer, but you need to make sure that's a conscious decision.
  2. You should really try to use the ANSI syntax for joins. Not just because it's standard, but because the syntax that you have isn't really doing what you think it's doing (it's not an outer join) and some things are simply impossible to do with the syntax that you have.

Assuming that you end up using the MIN solution, here's one possible solution without the subquery. You should test it with various other solutions to make sure that they are equivalent in outcome and to see which performs the best.

SELECT
     a.user_id, b.*
FROM
     dbo.Table_1 a
LEFT OUTER JOIN dbo.Table_2 b ON b.user_id = a.user_id AND b.sub_id = a.sub_id
LEFT OUTER JOIN dbo.Table_2 c ON c.user_id = a.user_id AND c.sub_id < b.sub_id
WHERE
     c.user_id IS NULL

You'll need to test this to see if it's really giving what you want and you might need to tweak it, but the basic idea is to use the second LEFT OUTER JOIN to ensure that there are no rows that exist with a lower sub_id than the one found in the first LEFT OUTER JOIN (if any is found). You can adjust the criteria in the second LEFT OUTER JOIN depending on the final business rules.

Tom H.
Now, see, that's what I was looking for. Thanks.As for business rules, here's the business: the database and schema are 8+ years old. My new application that needs to get one row from a one to many table isn't going to be the one that is going to get the schema and all other applications changed.
stu
A: 

Well, you already have a query that works. If you are concerned about the speed you could

  • Add a field to table2 which identifies which sub_id is the 'first one' or

  • Keep track of table2's primary key in table1, or in another table

neonski